Created attachment 64006
Example RTF document from Garant legal service
LibreOffice 126.96.36.199 (Final) Russian UI/Locale
on Russian Windows XP Professional SP3 (32bit)
Writer opens an RTF document from the Garant legal service, but instead of Russian text we see garbled characters. Changing the font does not help. The text is impossible to read.
Original may be found at http://www.garant.ru/hotlaw/volga/402635/
(direct document URL at this page http://www.garant.ru/files/5/3/402635/402635.rtf)
Apache OpenOffice 3.4.0, Lotus Symphony 3.0.1, and Microsoft Office 2003 and 2007 open this file without any problems; all text is displayed in Russian.
Created attachment 64009
Example 11 LibreOffice.jpg: how the document looks in LibreOffice 188.8.131.52
Another victim of incorrect default charset.
Not reproducible with 3.6rc1 on linux.
Timon, could you try 3.5.5 or 3.6rc1?
(In reply to comment #3)
> Not reproducible with 3.6rc1 on linux.
> Timon, could you try 3.5.5 or 3.6rc1?
With 3.5.5 on Windows XP SP3 I still see garbled characters; the bug is not fixed.
Ok. I've installed 3.5.5 and can reproduce it with this version.
So it was fixed after 3.5.5 (or perhaps the fix was never pushed to the 3-5 branch at all).
Miklos, would you be able to identify whether this issue is fixed in 3.5 after 3.5.5?
Hm, I can still reproduce this bug with LibreOffice 184.108.40.206 (Build ID: 4db6344), German langpack installed, on MacOS X 10.6.8 (Intel): when I open the file, it looks just like Timon’s screenshot. I double-checked this; it’s really true!
So, if this bug appears fixed for Valek, I see two possible explanations:
a) The bug does not appear on Linux, only on Windows and MacOS X.
b) The bug depends on the locale used. I have a German locale here; which locale do you use, Valek? Maybe Russian, and therefore a different default character set?
Setting the status of this bug to NEW.
Yes, apparently it depends on system default codepage.
(In reply to comment #7)
> Yes, apparently it depends on system default codepage.
Seems so, and in addition, I can confirm that this bug is NOT a regression with the new RTF filter: on my machine (MacOS X 10.6.8 (Intel), German UI),
* LibreOffice 3.3.0
* LibreOffice 3.4.0
* LibreOffice 3.4.6
* Apache OpenOffice 3.4.0
open the document in question ALL with wrong text encoding (strange accented Latin characters instead of Russian/Cyrillic characters).
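The "strange accented Latin characters" are what you get when Windows-1251 bytes are decoded as Windows-1252. A minimal Python illustration, assuming cp1252 stands in for whatever Western default codepage the importer falls back to; the sample word and the helper name `decode_with` are hypothetical, not taken from the attachment:

```python
def decode_with(codepage: str, raw: bytes) -> str:
    """Decode raw 8-bit bytes with the given codepage."""
    return raw.decode(codepage)

# Hypothetical Cyrillic sample, stored as the file declares (\ansicpg1251).
original = "Привет"
raw = original.encode("cp1251")

print(decode_with("cp1251", raw))  # → Привет  (document codepage)
print(decode_with("cp1252", raw))  # → Ïðèâåò  (Western default: mojibake)
```

Every Cyrillic letter lands on an accented Latin letter because both codepages put their extra characters in the 0x80-0xFF range; only the interpretation changes.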
But there is still a general problem: the filters MUST ensure that Times New Roman CYR et al. always use charset 204 (Cyrillic), REGARDLESS of what the file says.
It depends on the Options->Languages->Locale setting. If it is set to Russian, the document opens correctly (regardless of the "Default language for documents: Western" setting). Otherwise, it shows garbage.
And this happens even though this specific document (unlike, say, Bug 48023) contains all the information needed to interpret it correctly: its \ansicpg control word is set to 1251; \deflang and \deflangfe are 1049; and even the font name contains "Cyr" (though the latter should not mean anything special to the parser).
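To make the point about the header concrete, here is a minimal sketch (hypothetical helper `header_settings`) that pulls the numeric parameters of \ansicpg, \deflang and \deflangfe out of an RTF byte string. Assumptions: it scans the whole file and keeps the first occurrence of each word, and it ignores escaped backslashes; a real importer would restrict this to the header group. The sample header is made up, not copied from the attachment.

```python
import re

# An RTF control word: backslash, lowercase ASCII letters, optional signed
# integer parameter.
CONTROL_WORD = re.compile(rb"\\([a-z]+)(-?\d+)?")

def header_settings(rtf: bytes) -> dict:
    """Return the first numeric parameter of each document-level word of interest."""
    wanted = {"ansicpg", "deflang", "deflangfe"}
    found = {}
    for m in CONTROL_WORD.finditer(rtf):
        name = m.group(1).decode("ascii")
        param = m.group(2)
        if name in wanted and param is not None and name not in found:
            found[name] = int(param)
    return found

# Hypothetical header resembling the one described above.
sample = rb"{\rtf1\ansi\ansicpg1251\deff0\deflang1049\deflangfe1049{\fonttbl{\f0\froman Times New Roman Cyr;}}}"
print(header_settings(sample))
# → {'ansicpg': 1251, 'deflang': 1049, 'deflangfe': 1049}
```

With \ansicpg1251 available this early, nothing in the file itself forces the importer to fall back to the UI locale.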
This problem seems to be caused by the fact that this document is invalid: it contains raw 8-bit characters. The standard requires all non-7-bit text to be encoded either as \'xx or as Unicode \uxxx. Still, the standard also requires reader software to be prepared to see 8-bit bytes (in binary blocks).
It seems the LO RTF importer uses the RTF-defined locale settings only when it sees properly encoded non-7-bit characters; every plain (unescaped) character is converted to Unicode using the LO locale instead.
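A minimal sketch of the alternative behaviour: decode both \'xx escapes and raw 8-bit bytes with the same document codepage, instead of routing raw bytes through the UI locale. Assumptions: the hypothetical helper `decode_text_run` receives a text run whose other control words have already been stripped; this is not the LO importer's actual code.

```python
def decode_text_run(run: bytes, codepage: str) -> str:
    """Decode a text run where non-ASCII characters appear either as \\'xx
    hex escapes or (non-conformantly) as raw 8-bit bytes. Both forms are
    interpreted in the same document codepage."""
    out = bytearray()
    i = 0
    while i < len(run):
        if run[i:i + 2] == b"\\'" and i + 4 <= len(run):
            out.append(int(run[i + 2:i + 4], 16))  # \'xx -> one byte
            i += 4
        else:
            out.append(run[i])                      # raw byte, kept as-is
            i += 1
    return out.decode(codepage)

escaped = rb"\'cf\'f0\'e8\'e2\'e5\'f2"    # conformant form
raw = "Привет".encode("cp1251")           # form found in the attachment
print(decode_text_run(escaped, "cp1251"))  # → Привет
print(decode_text_run(raw, "cp1251"))      # → Привет
```

The design point is simply that the codepage comes from the document (\ansicpg or the font's \fcharset), never from the application locale, whichever way the bytes were written.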
Created attachment 67223
Another test document
That's wrong. The bug is caused by improper font handling when \fcharset is missing or invalid.
If the application handles this correctly, the attached document should display the string "test".
NB: Because of a bug in WordPad, the document needs an explicit \fcharset0 on font35 for WordPad to open it correctly.
Well, then it looks like \ansicpg is ignored entirely? I didn't think that was still the case... That's a serious flaw. A font only needs to define its own charset if it differs from the document's default codepage...
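The fallback order argued for above can be sketched as follows. This is an assumption about desired behaviour, not LibreOffice's implementation: the helper `font_codepage` and its `fallback` parameter are hypothetical, and the table lists only a subset of the Windows charset constants with their ANSI codepages.

```python
# Subset of Windows charset numbers (as used by \fcharset) -> ANSI codepage.
FCHARSET_TO_CODEPAGE = {
    0: 1252,    # ANSI_CHARSET
    161: 1253,  # GREEK_CHARSET
    162: 1254,  # TURKISH_CHARSET
    177: 1255,  # HEBREW_CHARSET
    178: 1256,  # ARABIC_CHARSET
    186: 1257,  # BALTIC_CHARSET
    204: 1251,  # RUSSIAN_CHARSET
    238: 1250,  # EASTEUROPE_CHARSET
}

def font_codepage(fcharset, ansicpg, fallback=1252):
    """Pick the codepage for text in a font.
    1. A recognised \\fcharset wins.
    2. A missing or unrecognised \\fcharset falls back to the document's \\ansicpg.
    3. Only when \\ansicpg is absent too, use the application fallback
       (today effectively derived from the UI locale)."""
    if fcharset in FCHARSET_TO_CODEPAGE:
        return FCHARSET_TO_CODEPAGE[fcharset]
    if ansicpg is not None:
        return ansicpg
    return fallback

# Font without \fcharset in a \ansicpg1251 document:
cp = font_codepage(None, 1251)
print("Привет".encode("cp1251").decode(f"cp{cp}"))  # → Привет
```

Under this rule the UI locale only matters for files that declare neither a font charset nor a document codepage.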
*** Bug 77770 has been marked as a duplicate of this bug. ***
(In reply to comment #0)
> Created attachment 64006 [details]
> Example RTF document from Garant legal service
Bug still reproducible with the test file in LibO 220.127.116.11, Italian UI/Locale.
\ansicpg is a legacy Wordpad thing and should not be relied upon.
Could not reproduce.
Windows Vista 64
Build ID: 88805f81e9fe61362df02b9941de8e38a9b5fd16
Bug present in 18.104.22.168.
Changed to RESOLVED WORKSFORME.