87248 – FILEOPEN Word 95 non-ASCII password not recognized

Summary: FILEOPEN Word 95 non-ASCII password not recognized

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	4.3.4.1 release
Hardware:	All All

Importance:	medium normal
Assignee:	Caolán McNamara

URL:
Whiteboard:	target:4.5.0
Keywords:

Depends on:
Blocks:

Reported:	2014-12-11 22:05 UTC by Urmas
Modified:	2014-12-15 13:27 UTC
CC List:	4 users

See Also:
Crash report or crash signature:

Attachments
RU and EL samples (3.67 KB, application/x-zip) 2014-12-11 22:05 UTC, Urmas

Description Urmas 2014-12-11 22:05:09 UTC

Created attachment 110758
RU and EL samples

Writer cannot open password-protected Word 95 documents if the password contains non-ASCII (national) characters.
The passwords for attached documents are "пароль" and "παρασύνθημα".

Comment 1 Joel Madero 2014-12-12 05:10:56 UTC

Hey Urmas - 

Are you getting a password incorrect error? I get this error and assume that it's the error you're seeing.

Confirmed on:
ubuntu 14.04 x64
LibreOffice 4.3.4.1 (updating version as version reflects oldest confirmed version)

New

Comment 2 Urmas 2014-12-12 05:44:21 UTC

Obviously.

Comment 3 Julien Nabet 2014-12-14 10:24:17 UTC

Urmas: as a test, could you attach another Word 95 file with an ASCII password that opens fine in LO?

Comment 4 Julien Nabet 2014-12-14 10:36:08 UTC

OK, no need to attach it.
There's a problem here:
    OUString sUniPassword = QueryPasswordForMedium( rMedium );

    OString sPassword(OUStringToOString(sUniPassword,
        WW8Fib::GetFIBCharset(pWwFib->chseTables)));
http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8par.cxx#5568

"pWwFib->chseTables" is 0, so "WW8Fib::GetFIBCharset(pWwFib->chseTables)", which calls "rtl_getTextEncodingFromWindowsCharset", returns "RTL_TEXTENCODING_MS_1252" instead of "RTL_TEXTENCODING_MS_1251" (which the Russian Word 95 file needs).
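
To illustrate why the wrong encoding breaks the password check, here is a minimal standalone sketch. It does not use the LibreOffice API: toCp1251, toCp1252 and encodePassword are hypothetical helpers covering only ASCII plus a small part of each code page, and the '?' substitution for unrepresentable characters is an assumption about how a lossy conversion behaves.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not LibreOffice code: encode one code point as
// Windows-1251. Only ASCII and the contiguous Cyrillic block
// U+0410..U+044F (mapped to 0xC0..0xFF) are handled; the rest becomes '?'.
inline char toCp1251(char32_t c)
{
    if (c < 0x80)
        return static_cast<char>(c);
    if (c >= 0x0410 && c <= 0x044F)
        return static_cast<char>(c - 0x0410 + 0xC0);
    return '?';
}

// Hypothetical helper: Windows-1252 restricted to ASCII and U+00A0..U+00FF.
// Cyrillic and Greek letters are not representable and become '?'.
inline char toCp1252(char32_t c)
{
    if (c < 0x80 || (c >= 0xA0 && c <= 0xFF))
        return static_cast<char>(c);
    return '?';
}

// Convert a Unicode password to the byte string that would be checked
// against the document, using the given per-character encoder.
inline std::string encodePassword(const std::u32string& password,
                                  char (*encode)(char32_t))
{
    assert(encode != nullptr);
    std::string bytes;
    for (char32_t c : password)
        bytes.push_back(encode(c));
    return bytes;
}
```

With the Russian sample password, the Windows-1251 encoder yields six distinct bytes, while the Windows-1252 encoder yields six identical placeholder bytes, so the bytes compared against the document can never match.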

Comment 5 Julien Nabet 2014-12-14 11:55:14 UTC

OK, the Russian doc contains "secret" and the Greek one "Document text."

The problem is that I don't know why "chseTables", which is read in the WW8Fib ctor, is 0; see http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8scan.cxx#5178
BTW, I also noticed that the WW8Fib ctor, which reads the stream, differs a bit from the code that writes the stream here:
http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8scan.cxx#5633
e.g. for "lid" and "pnNext", "ReadInt16" is used in the ctor but "Set_UInt16" is used when writing, but that's another story.
 
I'm a bit stuck then :-(

Caolan/Miklos: any idea here?

Comment 6 Julien Nabet 2014-12-14 12:07:11 UTC

I forgot to mention that I could open the documents when forcing, in gdb, the chseTables value passed to GetFIBCharset to 204 for Russian and 161 for Greek, so that rtl_getTextEncodingFromWindowsCharset (http://opengrok.libreoffice.org/xref/core/sal/textenc/tencinfo.cxx#168) returns the right rtl_TextEncoding.
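
The charset values mentioned in this thread can be summarized in a small sketch. This is not the code from tencinfo.cxx: codePageFromWindowsCharset is a hypothetical stand-in that returns a Windows code page number and covers only the three charset values discussed here (0, 161 and 204); returning 0 for anything else is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the charset lookup, limited to the values
// that appear in this bug. Returns a Windows code page number, or 0 for
// charsets this sketch does not cover.
inline int codePageFromWindowsCharset(std::uint8_t chse)
{
    switch (chse)
    {
        case 0:   return 1252; // ANSI_CHARSET: Western European
        case 161: return 1253; // GREEK_CHARSET
        case 204: return 1251; // RUSSIAN_CHARSET
        default:  return 0;    // not covered here
    }
}

// Usage sketch: the stored value 0 selects Western, while the values
// forced in gdb select the Cyrillic and Greek code pages.
inline void checkCharsetsFromThisBug()
{
    assert(codePageFromWindowsCharset(0) == 1252);
    assert(codePageFromWindowsCharset(204) == 1251);
    assert(codePageFromWindowsCharset(161) == 1253);
}
```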

Comment 7 Urmas 2014-12-14 12:19:21 UTC

It is worth noting that MS Word is able to open both files regardless of system locale, so the password codepage is extracted from the file itself.

Comment 8 Julien Nabet 2014-12-14 13:36:33 UTC

Leafing through the Word (.doc) Binary File Format specification (see http://msdn.microsoft.com/en-us/library/office/cc313153%28v=office.14%29.aspx), I can't find any reference to "chse" or "chseTables", whereas the fields from wIdent to aVer8Bits1 seem OK.

Comment 9 Julien Nabet 2014-12-14 21:19:30 UTC

Urmas: I'm trying to understand the doc a bit, but LO retrieves values (like "WW6") which would mean that both docs are from Winword 6 (the version just before Winword 95). Of course, this could be due to another bug, or I may have missed something (which is very plausible :-)), but just to be sure, can you confirm they're Winword 95 files?

Comment 10 Urmas 2014-12-15 00:51:10 UTC

FIB version 104 means Word 95.

Comment 11 Caolán McNamara 2014-12-15 11:45:40 UTC

My remaining theory at this point is that if they are 0, then it's a fallback to taking the lid/locale at offset 6 and picking a matching encoding from that, i.e. the Russian one has 0x0419 (LANGUAGE_RUSSIAN) and the other one has 0x0408 (LANGUAGE_GREEK).
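
The fallback theory can be sketched as follows. This is only an illustration under stated assumptions, not the committed patch: codePageFromLid and passwordCodePage are hypothetical names, only the two language IDs and two charset values from this bug are covered, and defaulting to code page 1252 for any other lid is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pick a Windows code page from the FIB language ID.
// Only the two lids seen in the attached samples are mapped; defaulting
// to 1252 for everything else is an assumption of this sketch.
inline int codePageFromLid(std::uint16_t lid)
{
    switch (lid)
    {
        case 0x0419: return 1251; // LANGUAGE_RUSSIAN
        case 0x0408: return 1253; // LANGUAGE_GREEK
        default:     return 1252; // assumed Western fallback
    }
}

// Hypothetical helper: use an explicit charset when one is stored,
// otherwise (chse == 0, or a value unknown to this sketch) consult lid.
inline int passwordCodePage(std::uint8_t chse, std::uint16_t lid)
{
    if (chse == 204)
        return 1251; // RUSSIAN_CHARSET
    if (chse == 161)
        return 1253; // GREEK_CHARSET
    return codePageFromLid(lid);
}

// Usage sketch: both samples store chse == 0, so lid decides.
inline void checkSamplesFromThisBug()
{
    assert(passwordCodePage(0, 0x0419) == 1251);
    assert(passwordCodePage(0, 0x0408) == 1253);
}
```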

Comment 12 Commit Notification 2014-12-15 12:31:39 UTC

Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=c96e8a174f915e46b0f0814271e53938d8f07373

Resolves: fdo#87248 assume 0 chse means use encoding that matches lid

It will be available in 4.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 13 Caolán McNamara 2014-12-15 13:27:01 UTC

Well, that "works" for these examples anyway