Bug 51898 - Writer instead of russian show obscure characters in RTF document from Garant legal service
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.4 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-07-09 08:06 UTC by Timon
Modified: 2015-05-18 12:58 UTC
CC List: 5 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
Example RTF document from Garant legal service (47.29 KB, application/rtf)
2012-07-09 08:06 UTC, Timon
Example 11 LibreOffice.jpg - how it looks like in LibreOffice 3.5.4.2 (158.45 KB, image/jpeg)
2012-07-09 08:46 UTC, Timon
Another test document (1.64 KB, application/msword)
2012-09-16 04:31 UTC, Urmas

Description Timon 2012-07-09 08:06:49 UTC
Created attachment 64006
Example RTF document from Garant legal service

LibreOffice 3.5.4.2 (Final) Russian UI/Locale
Build-ID: 165a79a-7059095-e13bb37-fef39a4-9503d18
on Russian Windows XP Professional SP3 (32bit)

Writer opens the RTF document from the Garant legal service, but instead of Russian text we see obscure characters. Changing the font does not correct the situation. The text is unreadable.

The original can be found at http://www.garant.ru/hotlaw/volga/402635/
(direct document URL on that page: http://www.garant.ru/files/5/3/402635/402635.rtf)

Apache OpenOffice 3.4.0, Lotus Symphony 3.0.1, and Microsoft Office 2003 and 2007 open this file without any problems; all text is displayed in Russian.
Comment 1 Timon 2012-07-09 08:46:24 UTC
Created attachment 64009
Example 11 LibreOffice.jpg - how it looks like in LibreOffice 3.5.4.2
Comment 2 Urmas 2012-07-09 11:22:08 UTC
Another victim of an incorrect default charset.
Comment 3 Valek Filippov 2012-07-15 14:43:43 UTC
Not reproducible with 3.6rc1 on Linux.

Timon, could you try 3.5.5 or 3.6rc1?
Comment 4 Timon 2012-07-15 17:14:04 UTC
(In reply to comment #3)
> Not reproducible with 3.6rc1 on linux.
> 
> Timon, could you try 3.5.5 or 3.6rc1?

In 3.5.5 on Windows XP SP3 I still see obscure characters; the bug is not fixed.
Comment 5 Valek Filippov 2012-07-15 17:46:04 UTC
OK. I've installed 3.5.5 and can reproduce it with this version.
Therefore it was fixed after 3.5.5 (or the fix was probably never pushed to the 3-5 branch at all).

Miklos, would you be able to identify whether this issue is fixed in 3.5 after 3.5.5?
Comment 6 Roman Eisele 2012-08-22 16:44:13 UTC
Hm, I can still reproduce this bug with LibreOffice 3.6.1.1 (Build ID: 4db6344), German langpack installed, on MacOS X 10.6.8 (Intel): when I open the document, it still looks like Timon’s screenshot. I double-checked this; it’s really true!

Therefore, if this bug appears to be fixed for Valek, I see two explanations:
a) The bug does not appear on Linux, only on Windows and MacOS X.
b) The bug depends on the locale used:
   I have a German locale here, but which locale do you use, Valek? Maybe Russian,
   and therefore a different default character set?

Therefore I am setting the status of this bug to NEW.
Comment 7 Urmas 2012-08-23 02:08:33 UTC
Yes, apparently it depends on system default codepage.
Comment 8 Roman Eisele 2012-08-23 10:08:08 UTC
(In reply to comment #7)
> Yes, apparently it depends on system default codepage.

Seems so, and in addition, I can confirm that this bug is NOT a regression with the new RTF filter: on my machine (MacOS X 10.6.8 (Intel), German UI),
 * LibreOffice 3.3.0
 * LibreOffice 3.4.0
 * LibreOffice 3.4.6
 * Apache OpenOffice 3.4.0
ALL open the document in question with the wrong text encoding (strange accented Latin characters instead of Russian/Cyrillic characters).
Comment 9 Urmas 2012-08-23 11:33:47 UTC
But there is still a general problem: the filters MUST ensure that Times New Roman CYR et al. always use charset 204, REGARDLESS of what the file says.
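The override proposed above could look roughly like the following. This is a minimal sketch only, assuming the font name and its declared charset have already been parsed out of the font table; the function name `effective_fcharset` and the " CYR" suffix test are hypothetical, not LibreOffice code. The value 204 is the Windows RUSSIAN_CHARSET id written as \fcharset204 in RTF.

```python
from typing import Optional

RUSSIAN_CHARSET = 204  # Windows charset id, written as \fcharset204 in RTF


def effective_fcharset(font_name: str, declared: Optional[int]) -> Optional[int]:
    """Return the charset to use for one font-table entry.

    Hypothetical heuristic: a font whose name ends in "CYR" (for example
    "Times New Roman CYR") is forced to RUSSIAN_CHARSET regardless of the
    fcharset value declared in the file; other fonts keep the declared value.
    """
    if font_name.strip().upper().endswith(" CYR"):
        return RUSSIAN_CHARSET
    return declared


print(effective_fcharset("Times New Roman CYR", 0))  # 204
print(effective_fcharset("Arial", 0))                # 0
```

The trade-off is that the parser starts attaching meaning to font names, which the next comment argues it should not need to do when the document already declares its codepage.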
Comment 10 Mike Kaganski 2012-09-15 11:56:59 UTC
It depends on the Options->Languages->Locale setting. If it is set to Russian, the document opens correctly (regardless of the "Default language for documents: Western" setting). Otherwise, it shows garbage.

And this is despite the fact that this specific document (unlike, say, Bug 48023) contains all the information needed to interpret it correctly: its \ansicpg control word is set to 1251; \deflang and \deflangfe are 1049; and even the font name contains "Cyr" (though the latter should not mean anything special to the parser).
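To illustrate that the needed settings are recoverable from the header alone, here is a naive regex scan for the three control words mentioned above. It is a sketch, not a real RTF tokenizer: it assumes the header is available as a string and would also match these words inside nested groups or \bin data. The function name `header_settings` is hypothetical.

```python
import re


def header_settings(rtf: str) -> dict:
    """Collect the document-level \\ansicpg, \\deflang and \\deflangfe values.

    Each control word matches only when immediately followed by its numeric
    parameter, so the "deflang" pattern does not fire on "\\deflangfe".
    """
    settings = {}
    for word in ("ansicpg", "deflang", "deflangfe"):
        match = re.search(r"\\" + word + r"(-?\d+)", rtf)
        if match:
            settings[word] = int(match.group(1))
    return settings


header = r"{\rtf1\ansi\ansicpg1251\deff0\deflang1049\deflangfe1049{\fonttbl}}"
print(header_settings(header))
# {'ansicpg': 1251, 'deflang': 1049, 'deflangfe': 1049}
```

Here 1251 is the Windows Cyrillic codepage and 1049 is the Russian language id, so a reader honouring these values has what it needs without consulting the UI locale.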

This problem seems to be caused by the document being invalid: it contains raw 8-bit characters. The standard requires all non-7-bit text to be encoded either as \'xx or as Unicode \uxxx. Still, the standard requires reader software to be prepared to see 8-bit bytes (in binary blocks).
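For the properly escaped case, decoding \'xx sequences with the document codepage is straightforward. A minimal sketch, assuming the codepage has already been read from \ansicpg; it ignores escaped backslashes and \uN, and the helper name `decode_hex_escapes` is hypothetical.

```python
import re

HEX_RUN = re.compile(r"(?:\\'[0-9a-fA-F]{2})+")
HEX_BYTE = re.compile(r"\\'([0-9a-fA-F]{2})")


def decode_hex_escapes(text: str, codepage: int) -> str:
    """Replace each run of \\'xx escapes with text decoded in the given codepage.

    Whole runs are decoded at once so that double-byte codepages, where one
    character spans two escapes, would also work.
    """
    def replace_run(match: re.Match) -> str:
        data = bytes(int(h, 16) for h in HEX_BYTE.findall(match.group(0)))
        return data.decode(f"cp{codepage}")

    return HEX_RUN.sub(replace_run, text)


print(decode_hex_escapes(r"\'cf\'f0\'e8\'e2\'e5\'f2", 1251))  # Привет
```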

It seems the LO RTF importer only uses the RTF-defined locale settings when it sees properly encoded non-7-bit characters; every plain character is converted to Unicode internally using the LO locale.
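The effect described above can be reproduced in isolation: the same raw bytes decode to accented Latin letters under a Western codepage and to Cyrillic under the document's \ansicpg value. This sketch assumes a Western locale maps to codepage 1252; the sample word is chosen for illustration and is not taken from the Garant document, and `decode_plain_bytes` is a hypothetical helper.

```python
def decode_plain_bytes(data: bytes, codepage: int) -> str:
    """Decode raw (unescaped) 8-bit text bytes with a Windows codepage."""
    return data.decode(f"cp{codepage}")


# Raw cp1251 bytes standing in for unescaped Russian text in the RTF body.
raw = "Привет".encode("cp1251")

# Assumed: a Western (e.g. German) locale whose codepage is 1252.
print(decode_plain_bytes(raw, 1252))  # Ïðèâåò
# Using the document's \ansicpg value (1251) gives the intended text.
print(decode_plain_bytes(raw, 1251))  # Привет
```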
Comment 11 Urmas 2012-09-16 04:31:21 UTC
Created attachment 67223
Another test document

That's wrong. The bug is caused by improper font handling when \fcharset is missing or invalid.

If the application handles it correctly, the attached document should display the string "test".

NB: Because of a bug in WordPad, font35 needs an explicit \fcharset0 for WordPad to open the document correctly.
Comment 12 Mike Kaganski 2012-09-16 11:48:46 UTC
Well, then it looks like \ansicpg is ignored entirely? I didn't think that was still the case... It's a serious flaw. A font only needs to define its own charset if it differs from the document's default codepage...
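The fallback rule stated above can be sketched as a small lookup: use the font's own \fcharset when it is present and known, otherwise fall back to the document's \ansicpg. This is an illustrative sketch under that assumption, not the LibreOffice importer's logic; the table is a partial list of Windows charset ids, and `font_codepage` is a hypothetical name.

```python
from typing import Optional

# Windows charset ids (as used by \fcharsetN) mapped to codepages; partial list.
FCHARSET_TO_CODEPAGE = {0: 1252, 161: 1253, 204: 1251, 238: 1250}
DEFAULT_CHARSET = 1  # means "use the default"; carries no codepage of its own


def font_codepage(fcharset: Optional[int], ansicpg: int) -> int:
    """Pick the codepage for text set in a given font.

    The font's own charset wins only when it is present and known; a missing,
    DEFAULT_CHARSET or unknown value falls back to the document's ansicpg.
    """
    if fcharset is None or fcharset == DEFAULT_CHARSET:
        return ansicpg
    return FCHARSET_TO_CODEPAGE.get(fcharset, ansicpg)


print(font_codepage(None, 1251))  # 1251: no \fcharset, use \ansicpg
print(font_codepage(204, 1252))   # 1251: font overrides the document default
```

Under this rule the Garant document would decode correctly in any UI locale, because a font without a usable charset inherits \ansicpg1251 rather than the application's locale codepage.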
Comment 13 tommy27 2014-04-23 05:44:43 UTC
*** Bug 77770 has been marked as a duplicate of this bug. ***
Comment 14 tommy27 2014-04-23 05:46:16 UTC
(In reply to comment #0)
> Created attachment 64006
> Example RTF document from Garant legal service
> 

The bug is still reproducible with the test file in LibO 4.2.3.3, Italian UI/Locale,
under Win7 x64.
Comment 15 Urmas 2014-04-26 16:54:22 UTC
\ansicpg is a legacy Wordpad thing and should not be relied upon.
Comment 16 Gordo 2015-05-18 12:58:35 UTC
Could not reproduce.

Windows Vista 64
Version: 4.4.3.2
Build ID: 88805f81e9fe61362df02b9941de8e38a9b5fd16

4.3.6.2

Bug present in 4.2.8.2.

Changed to RESOLVED WORKSFORME.