Bug 87248 - FILEOPEN Word 95 non-ASCII password not recognized
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: 4.3.4.1 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Caolán McNamara
URL:
Whiteboard: target:4.5.0
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-11 22:05 UTC by Urmas
Modified: 2014-12-15 13:27 UTC

See Also:
Crash report or crash signature:


Attachments
RU and EL samples (3.67 KB, application/x-zip)
2014-12-11 22:05 UTC, Urmas
Description Urmas 2014-12-11 22:05:09 UTC
Created attachment 110758
RU and EL samples

Writer cannot open password-protected Word 95 documents if the password contains non-ASCII (national) characters.
The passwords for the attached documents are "пароль" and "παρασύνθημα".
Comment 1 Joel Madero 2014-12-12 05:10:56 UTC
Hey Urmas - 

Are you getting a password incorrect error? I get this error and assume that it's the error you're seeing.

Confirmed on:
ubuntu 14.04 x64
LibreOffice 4.3.4.1 (updating version as version reflects oldest confirmed version)

New
Comment 2 Urmas 2014-12-12 05:44:21 UTC
Obviously.
Comment 3 Julien Nabet 2014-12-14 10:24:17 UTC
Urmas: for the test, could you attach another Word 95 file with an ASCII password that works fine in LO?
Comment 4 Julien Nabet 2014-12-14 10:36:08 UTC
OK, no need to attach it.
There's a problem here:
    OUString sUniPassword = QueryPasswordForMedium( rMedium );

    OString sPassword(OUStringToOString(sUniPassword,
        WW8Fib::GetFIBCharset(pWwFib->chseTables)));
http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8par.cxx#5568

"pWwFib->chseTables" is 0, so "WW8Fib::GetFIBCharset(pWwFib->chseTables)", which calls "rtl_getTextEncodingFromWindowsCharset", returns "RTL_TEXTENCODING_MS_1252" instead of "RTL_TEXTENCODING_MS_1251" (for the Word 95 file).
Comment 5 Julien Nabet 2014-12-14 11:55:14 UTC
OK, the Russian doc contains "secret" and the Greek one "Document text."

The problem is that I don't know why "chseTables", which is read in the WW8Fib ctor, is 0; see http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8scan.cxx#5178
BTW, I also noticed that the WW8Fib ctor, which reads the stream, differs a bit from the code that writes the stream here:
http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8scan.cxx#5633
e.g. for "lid" and "pnNext", "ReadInt16" is used in the ctor but "Set_UInt16" is used when writing, but that's another story.
 
I'm a bit stuck then :-(

Caolan/Miklos: any idea here?
Comment 6 Julien Nabet 2014-12-14 12:07:11 UTC
I forgot to mention that I could open the documents when, in gdb, I forced the chseTables value passed to GetFIBCharset to 204 for Russian and 161 for Greek, so that rtl_getTextEncodingFromWindowsCharset (http://opengrok.libreoffice.org/xref/core/sal/textenc/tencinfo.cxx#168) retrieves the right rtl_TextEncoding.
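
To make the charset values in this comment concrete, here is a minimal sketch of a Windows-charset-to-code-page lookup. It is not the LibreOffice implementation: codePageFromWindowsCharset is a hypothetical helper, it covers only the three values discussed in this bug (0, 161, 204), and returning 0 for anything else is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, NOT the LibreOffice API: map a Windows charset
// identifier to a Windows code page number. Only the three values from
// this bug are covered; 0 is returned for everything else (assumption).
constexpr int codePageFromWindowsCharset(std::uint16_t chse)
{
    switch (chse)
    {
        case 0:   return 1252; // ANSI_CHARSET, Western European
        case 161: return 1253; // GREEK_CHARSET
        case 204: return 1251; // RUSSIAN_CHARSET
        default:  return 0;    // not covered by this sketch
    }
}

// Example checks: a chse of 0 selects 1252, which cannot represent
// Cyrillic or Greek password characters.
inline void checkCharsetExamples()
{
    assert(codePageFromWindowsCharset(0) == 1252);
    assert(codePageFromWindowsCharset(204) == 1251);
    assert(codePageFromWindowsCharset(161) == 1253);
}
```

With chse forced to 204 or 161, the password is converted with the code page the file was presumably written with, which matches what was observed in gdb.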
Comment 7 Urmas 2014-12-14 12:19:21 UTC
It is worth noting that MS Word is able to open both files regardless of the system locale, so the password codepage is extracted from the file itself.
Comment 8 Julien Nabet 2014-12-14 13:36:33 UTC
Leafing through the Word (.doc) Binary File Format documentation (see http://msdn.microsoft.com/en-us/library/office/cc313153%28v=office.14%29.aspx), I can't find any reference to "chse" or "chseTables", whereas the fields from wIdent to aVer8Bits1 seem OK.
Comment 9 Julien Nabet 2014-12-14 21:19:30 UTC
Urmas: I'm trying to understand the doc a bit, but LO retrieves values (like "WW6") which would mean that both docs are from Winword6 (the version just before Winword95). Of course, this could be due to another bug, or I may just have missed something (which is very plausible :-)), but just to be sure, can you confirm they're Winword95 files?
Comment 10 Urmas 2014-12-15 00:51:10 UTC
FIB version 104 means Word 95.
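
As a rough illustration of that check: the FIB version (nFib) is the 16-bit little-endian field that follows wIdent, at byte offset 2 of the header. The helper names below are hypothetical, and the only version value used is the 104 stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian 16-bit value at byte offset `off`.
inline std::uint16_t readU16LE(const unsigned char* p, std::size_t off)
{
    return static_cast<std::uint16_t>(p[off] | (p[off + 1] << 8));
}

// Hypothetical helper: true if the FIB header carries nFib == 104,
// the value this thread identifies as Word 95.
inline bool isWord95Fib(const unsigned char* fibHeader)
{
    return readU16LE(fibHeader, 2) == 104;
}

// Example check on a made-up 8-byte header whose nFib bytes are 0x68 0x00.
inline void checkFibVersionExample()
{
    const unsigned char header[8] = {0, 0, 0x68, 0x00, 0, 0, 0, 0};
    assert(isWord95Fib(header));
}
```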
Comment 11 Caolán McNamara 2014-12-15 11:45:40 UTC
My remaining theory at this point is that if they are 0, then it's a fallback to taking the lid/locale at offset 6 and picking a matching encoding from that, i.e. the Russian one has 0x0419 (LANGUAGE_RUSSIAN) and the other one has 0x0408 (LANGUAGE_GREEK).
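
The fallback theory can be sketched roughly as follows. This is an illustration under stated assumptions, not the committed patch: all function names are hypothetical, only the two lids and three charset values mentioned in this bug are mapped, and defaulting unknown lids to 1252 is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read a little-endian 16-bit value at byte offset `off`.
inline std::uint16_t readU16LE(const unsigned char* p, std::size_t off)
{
    return static_cast<std::uint16_t>(p[off] | (p[off + 1] << 8));
}

// Hypothetical: Windows charset id -> code page (0 = not covered here).
constexpr int codePageFromCharset(std::uint16_t chse)
{
    switch (chse)
    {
        case 0:   return 1252; // ANSI_CHARSET
        case 161: return 1253; // GREEK_CHARSET
        case 204: return 1251; // RUSSIAN_CHARSET
        default:  return 0;
    }
}

// Hypothetical: language id -> code page, for the two lids in this bug.
constexpr int codePageFromLid(std::uint16_t lid)
{
    switch (lid)
    {
        case 0x0419: return 1251; // LANGUAGE_RUSSIAN
        case 0x0408: return 1253; // LANGUAGE_GREEK
        default:     return 1252; // assumption: Western default
    }
}

// If chse is 0, fall back to the lid stored at offset 6 of the FIB header.
inline int passwordCodePage(const unsigned char* fibHeader, std::uint16_t chse)
{
    if (chse != 0)
        return codePageFromCharset(chse);
    return codePageFromLid(readU16LE(fibHeader, 6));
}

// Example check: a header with lid 0x0419 and chse 0 selects 1251.
inline void checkLidFallbackExample()
{
    const unsigned char ru[8] = {0, 0, 0, 0, 0, 0, 0x19, 0x04};
    assert(passwordCodePage(ru, 0) == 1251);
}
```

A non-zero chse still takes precedence in this sketch, so files that record their charset explicitly would behave as before.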
Comment 12 Commit Notification 2014-12-15 12:31:39 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c96e8a174f915e46b0f0814271e53938d8f07373

Resolves: fdo#87248 assume 0 chse means use encoding that matches lid

It will be available in 4.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 13 Caolán McNamara 2014-12-15 13:27:01 UTC
Well, that "works" for these examples anyway