Created attachment 110758
RU and EL samples
Writer cannot open the Word 95 password-protected documents if the password contains national characters.
The passwords for attached documents are "пароль" and "παρασύνθημα".
Hey Urmas -
Are you getting a password incorrect error? I get this error and assume that it's the error you're seeing.
ubuntu 14.04 x64
LibreOffice 184.108.40.206 (updating version as version reflects oldest confirmed version)
Urmas: for the test, could you attach another Word 95 file with an ASCII password that opens fine in LO?
ok no need to attach it.
There's a problem here:
OUString sUniPassword = QueryPasswordForMedium( rMedium );
OString sPassword(OUStringToOString(sUniPassword,
"pWwFib->chseTables" is 0, so "WW8Fib::GetFIBCharset(pWwFib->chseTables)", which calls "rtl_getTextEncodingFromWindowsCharset", returns "RTL_TEXTENCODING_MS_1252" instead of "RTL_TEXTENCODING_MS_1251" (which the Russian Word 95 file needs).
Ok, the Russian doc contains "secret" and the Greek one "Document text."
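To illustrate why the charset returned by GetFIBCharset matters for the password: the Unicode password is converted to a single-byte string before it is checked, so the code page decides which bytes come out. Here is a minimal standalone sketch, not LibreOffice code. The helper names toCp1251, toCp1252 and encode are made up for illustration, only a small part of each code page is covered, and replacing unmappable characters with '?' is an assumption about what the converter does by default.

```cpp
#include <string>

// Hypothetical helper: encode one code point as Windows-1251.
// Only ASCII and the basic Cyrillic letters U+0410..U+044F are handled.
constexpr unsigned char toCp1251(char32_t c)
{
    return c < 0x80 ? static_cast<unsigned char>(c)
         : (c >= 0x0410 && c <= 0x044F)
               ? static_cast<unsigned char>(0xC0 + (c - 0x0410))
               : static_cast<unsigned char>('?'); // assumed replacement char
}

// Hypothetical helper: encode one code point as Windows-1252.
// Only ASCII and U+00A0..U+00FF are handled; Cyrillic has no mapping here.
constexpr unsigned char toCp1252(char32_t c)
{
    return (c < 0x80 || (c >= 0xA0 && c <= 0xFF))
               ? static_cast<unsigned char>(c)
               : static_cast<unsigned char>('?'); // assumed replacement char
}

// Apply a per-character encoder to a whole UTF-32 string.
template <typename Fn>
std::string encode(const std::u32string& s, Fn perChar)
{
    std::string out;
    for (char32_t c : s)
        out.push_back(static_cast<char>(perChar(c)));
    return out;
}
```

With these helpers, encode(U"пароль", toCp1251) yields the bytes EF E0 F0 EE EB FC, whereas encode(U"пароль", toCp1252) yields six '?' characters, so a key derived from the second string cannot match the one stored in the file.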
The problem is that I don't know why "chseTables" is 0. It is read in the WW8Fib constructor, see http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8scan.cxx#5178
BTW, I also noticed that the WW8Fib constructor, which reads the stream, differs a bit from the code that writes the stream:
e.g. for "lid" and "pnNext", "ReadInt16" is used in the constructor but "Set_UInt16" is used when writing, but that's another story.
I'm a bit stuck then :-(
Caolan/Miklos: any idea here?
I forgot to mention that I could open the documents when I forced the chseTables value passed to GetFIBCharset in gdb to 204 for Russian and 161 for Greek, so that rtl_getTextEncodingFromWindowsCharset (http://opengrok.libreoffice.org/xref/core/sal/textenc/tencinfo.cxx#168) retrieves the right rtl_TextEncoding.
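For reference, the charset values used in that gdb experiment are the standard Windows charset identifiers. The following is a small sketch of the lookup idea only, not the actual table in tencinfo.cxx. The function name windowsCharsetToCodePage is hypothetical, and only the three values discussed in this report are covered.

```cpp
// Map a Windows charset identifier to a Windows code page number.
// Returns 0 for charsets this sketch does not cover.
constexpr int windowsCharsetToCodePage(int charset)
{
    return charset == 0   ? 1252  // ANSI_CHARSET    -> Windows-1252 (Western)
         : charset == 161 ? 1253  // GREEK_CHARSET   -> Windows-1253 (Greek)
         : charset == 204 ? 1251  // RUSSIAN_CHARSET -> Windows-1251 (Cyrillic)
         : 0;                     // not covered here
}
```

So a stored value of 0 selects the Western code page, which is why the Russian and Greek passwords fail unless 204 or 161 is forced.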
It is worth noting that MS Word is able to open both files regardless of the system locale, so the password codepage is extracted from the file itself.
Leafing through the Word (.doc) Binary File Format documentation, see http://msdn.microsoft.com/en-us/library/office/cc313153%28v=office.14%29.aspx, I don't find any "chse" or "chseTables" reference, whereas the fields from wIdent up to aVer8Bits1 seem ok.
Urmas: I'm trying to understand the doc a bit, but LO retrieves values (like "WW6") which would mean that both docs are from Winword6 (the version just before Winword95). Of course, this could be due to another bug, or I may just have missed something (which is very plausible :-)), but just to be sure, can you confirm they're Winword95 files?
FIB version 104 means Word 95.
My remaining theory at this point is that if they are 0, then it's a fallback to taking the lid/locale at offset 6 and picking a matching encoding from that, i.e. the Russian one has 0x0419 (LANGUAGE_RUSSIAN) and the other one has 0x0408 (LANGUAGE_GREEK).
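That theory can be sketched as follows. This is only an illustration of the fallback idea, not the code of any actual patch. The function names are hypothetical, only the two language ids from this report are handled, and defaulting to Windows-1252 for any other lid is an assumption.

```cpp
#include <cstdint>

// Code page implied by an explicit Windows charset value (partial list).
constexpr int codePageFromCharset(int chse)
{
    return chse == 161 ? 1253   // GREEK_CHARSET
         : chse == 204 ? 1251   // RUSSIAN_CHARSET
         : 1252;                // assumed default for everything else
}

// Code page implied by the language id stored in the FIB (partial list).
constexpr int codePageFromLid(std::uint16_t lid)
{
    return lid == 0x0419 ? 1251   // LANGUAGE_RUSSIAN
         : lid == 0x0408 ? 1253   // LANGUAGE_GREEK
         : 1252;                  // assumed default for everything else
}

// Fallback: a chse of 0 means "not set", so use the lid instead.
constexpr int passwordCodePage(int chse, std::uint16_t lid)
{
    return chse != 0 ? codePageFromCharset(chse) : codePageFromLid(lid);
}
```

For the two attached samples, passwordCodePage(0, 0x0419) gives 1251 and passwordCodePage(0, 0x0408) gives 1253, which are the code pages that made the passwords work in the gdb experiment.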
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":
Resolves: fdo#87248 assume 0 chse means use encoding that matches lid
It will be available in 4.5.0.
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Well, that "works" for these examples anyway