Created attachment 59657 [details] Test file showing this behaviour
When an RTF document contains an \ansicpgN control word in the header just after the \ansi control word, a reader should use this code page to perform ANSI-to-Unicode conversion wherever another code page isn't specified for a text run and Unicode RTF isn't used [1]. When a font definition contains an \fcharsetN control word, it overrides the top-level setting, and when there is a \cpgN, it overrides both the top-level setting and \fcharsetN [2].
Now, when opening an RTF document that doesn't contain any code page/charset data, LO defaults to Latin-1 (see Bug 48023). If a document contains \ansicpgN, or its fonts have \cpgN, LO ignores this information and still uses Latin-1. Only \fcharsetN is taken into account.
The attachment is the test document from Bug 48023, with the missing language information added manually. There is now \ansicpg1251 in the header, as well as \fcharset204 in one font and \cpg1251 in another. As can be seen, only the text using the first font is displayed properly.
As for documents that don't contain language information at all (and there are a great many such documents out there, generated by various non-MS software), I believe that LO should use the user's language, and provide a means of specifying another one on opening, such as a checkbox in the Open dialog saying "Specify missing charset" that does something similar to the Text Encoded filter.
--
1. Word 2007: Rich Text Format (RTF) Specification, version 1.9.1 (http://www.microsoft.com/download/en/details.aspx?id=10725), page 12: Character Set
2. Ibid., pages 17-20.
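For illustration only, the precedence described above (\cpgN over \fcharsetN over \ansicpgN, then a default) could be sketched in Python as follows. This is not the LibreOffice implementation: `resolve_codepage`, `decode_run` and the `fallback` parameter are hypothetical names, and the \fcharsetN table is deliberately partial, so values missing from it simply fall through to the next level.

```python
from typing import Optional

# Partial \fcharsetN -> Windows code page table; not exhaustive.
FCHARSET_TO_CODEPAGE = {
    128: 932,   # Shift JIS
    129: 949,   # Hangul
    134: 936,   # GB2312
    136: 950,   # Big5
    161: 1253,  # Greek
    162: 1254,  # Turkish
    177: 1255,  # Hebrew
    178: 1256,  # Arabic
    186: 1257,  # Baltic
    204: 1251,  # Russian
    222: 874,   # Thai
    238: 1250,  # Eastern European
}


def resolve_codepage(ansicpg: Optional[int] = None,
                     fcharset: Optional[int] = None,
                     cpg: Optional[int] = None,
                     fallback: int = 1252) -> int:
    """Pick the code page for a text run (hypothetical helper).

    Precedence: font-level \\cpgN, then font-level \\fcharsetN,
    then header-level \\ansicpgN, then the caller-supplied fallback.
    """
    if cpg is not None:
        return cpg
    if fcharset is not None and fcharset in FCHARSET_TO_CODEPAGE:
        return FCHARSET_TO_CODEPAGE[fcharset]
    if ansicpg is not None:
        return ansicpg
    # Assumption: the fallback stands in for a Latin-1 or locale-based default.
    return fallback


def decode_run(raw: bytes, codepage: int) -> str:
    """Decode the 8-bit bytes of a run using Python's cpNNNN codecs."""
    return raw.decode(f"cp{codepage}")
```

With the values from the test document, `resolve_codepage(ansicpg=1251)`, `resolve_codepage(fcharset=204)` and `resolve_codepage(cpg=1251)` would all select code page 1251, so all three pieces of text would decode the same way.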
Also note that some RTF software incorrectly stores the font character set as ANSI_CHARSET for some national fonts. At the very least, the standard Windows fonts ({Times New Roman|Arial|Courier New}[ CE| Cyr], Japanese, Chinese (Simplified and Traditional), Korean and Thai) should have that field corrected to ensure proper import. The same problem may exist for Arabic/Hebrew documents, which may contain legacy charset values. Ideally, there should be a way for the user to provide a font mapping table that defines the proper charset for any custom fonts people may have used.
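A rough sketch of such a correction, in Python, might look like the following. Everything here is an assumption made for illustration: `corrected_charset`, `SUFFIX_TO_CHARSET` and `user_overrides` are hypothetical names, only the " CE" and " Cyr" suffixes mentioned above are covered, and the East Asian and Thai fonts would need their own name-based entries.

```python
from typing import Dict, Optional

ANSI_CHARSET = 0  # \fcharset0

# Font-name suffix -> charset it most likely implies (partial, illustrative).
SUFFIX_TO_CHARSET = {
    " CE": 238,   # Eastern European
    " Cyr": 204,  # Russian
}


def corrected_charset(font_name: str,
                      declared_charset: int,
                      user_overrides: Optional[Dict[str, int]] = None) -> int:
    """Return the charset to use for a font table entry (hypothetical helper)."""
    # A user-supplied mapping table takes priority over everything else.
    if user_overrides and font_name in user_overrides:
        return user_overrides[font_name]
    # Only second-guess fonts that claim to be plain ANSI.
    if declared_charset != ANSI_CHARSET:
        return declared_charset
    lowered = font_name.lower()
    for suffix, charset in SUFFIX_TO_CHARSET.items():
        # Case-insensitive match, so "Arial CYR" and "Arial Cyr" both qualify.
        if lowered.endswith(suffix.lower()):
            return charset
    return declared_charset
```

For example, `corrected_charset("Arial Cyr", 0)` would give 204, while a custom font could be handled with `corrected_charset("MyFont", 0, {"MyFont": 204})`.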
@Mike: could you please attach a PDF export of your test file showing how it should look when opened in LibreOffice? Best regards. JBF
Created attachment 61096 [details] This is how it renders now - only one piece is shown correctly
Created attachment 61097 [details] This is how it should be. I'm not sure if assigning it to me is the right thing to do...
(In reply to comment #4)
> Created attachment 61097 [details]
> This is how it should be.
Thank you very much for the data.
Hi Miklos, another code page problem in RTF import. Please feel free to reassign if you can't handle this bug. Best regards. JBF
Mike, thanks for the detailed report. Funny, your test document in Word matches your "how it renders now" PDF, at least here, with an English UI. ;-) Since 3.5.2 we have had a locale-dependent default (so your test document already opens fine if the locale is set to Russian), and \ansicpg has been implemented as well. And you're right: with the \cpg implementation, the LibreOffice result matches the "how it should be" PDF, even with an English UI. I'll push that patch in a bit. Miklos
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f6a24ace5ad12e79f0cc90709a290a30e3758781 fdo#48446 implement RTF_CPG
Resolved in master. Review requests for -3-6 and -3-5: https://gerrit.libreoffice.org/386 https://gerrit.libreoffice.org/387
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-3-6": http://cgit.freedesktop.org/libreoffice/core/commit/?id=8054472f666c87d6437dcea064c3cef379916245&g=libreoffice-3-6 fdo#48446 implement RTF_CPG It will be available in LibreOffice 3.6.1.
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=98e895db332446b3fe2fc901a6cf9cff64d2b1b8&g=libreoffice-3-5 fdo#48446 implement RTF_CPG It will be available in LibreOffice 3.5.7.
Migrating Whiteboard tags to Keywords: (filter:rtf) Replace rtf_filter -> filter:rtf. [NinjaEdit]