Created attachment 50996 [details] Example-text as odt and rtf Using the German version of LibreOffice 3.4.3, the following error occurred with several texts while converting from ODT to RTF via "Save As": if the ODT text includes German umlauts (äöüÄÖÜ) or the letter ß (= sz), these letters are formatted wrongly in the RTF text. The font changes, e.g. from Times New Roman or Arial to SimSun. (In other texts it changes to Univers; so far no rule is identifiable.)
Created attachment 54565 [details] ODT-RTF-Konverter-Test2.7z No problem for me under Ubuntu 10.04 x86_64 with LO 3.4.4 and LO 3.5.0 beta-1 @Schneider: please could you try again with current release (3.4.4) and, if possible, with LO 3.5.0 beta-1 ? Best regards. JBF
I repeated the test with LibreOffice 3.4.4 and 3.5.0 beta-1 (both the German version, on a PC with Windows XP Pro). Sorry, but the results are the very same each time as with LibreOffice 3.4.3. The German umlauts äöüÄÖÜ, the letter ß (= sz) and French characters with diacritics (e.g. "çéî") changed font from "Arial" to "SimSun" when I imported the RTF files with Microsoft Word (version XP, 2002).

It seems to be an unfortunate combination: LibreOffice produces complicated RTF code with an error in the language coding. The RTF import converters of Microsoft Word XP (2002) and Word 2003 cannot handle this complicated code and show the text in a Chinese font.

Extract from the RTF file generated by LibreOffice 3.4.4 (complicated and erroneous code):
===========================================
German Umlaute: \'e4\'f6\'fc \'c4\'d6\'dc }
\par \pard\plain \s0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af4\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031{\rtlch \ltrch\loch\loch\f2 sz = \'df }
\par \pard\plain \s0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af4\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031{\rtlch \ltrch\loch\loch\f2 French letters: }{\rtlch \ltrch\loch\loch\f3 \'ab}{\rtlch \ltrch\loch\loch\f2 Deux caf\'e9 s\'91il vour pla}{\rtlch \ltrch\loch\loch\f3 \'ee}{\rtlch \ltrch\loch\loch\f2 t!}{\rtlch \ltrch\loch\loch\f3 \'bb dit }{\rtlch \ltrch\loch\loch\f2 Fran}{\rtlch \ltrch\loch\loch\f3 \'e7}{\rtlch \ltrch\loch\loch\f2 oir.
===========================================
The RTF language code "\langfe2052" above means "Chinese (simplified)". This is not intended by the test text. Most RTF import converters seem to ignore the Chinese language code and show the RTF text without any errors.
(Importing the exported RTF files with any version of LibreOffice Writer, with Microsoft WordPad [version 5.1] and with SoftMaker TextMaker Viewer [version 2010] resulted in correct fonts.) The Microsoft RTF import converters seem to take the RTF code literally and fail to show the intended font.

Microsoft Word XP itself produces a very different RTF file (from a doc-file source). In the relevant part it is far simpler, with no language-code switch at all.

Extract of the very same part of the text from the RTF file generated by Microsoft Word XP (simple code):
===========================================
German Umlaute: \'e4 \'f6\'fc \'c4\'d6\'dc
\par sz = \'df
\par French letters: \'abDeux caf\'e9 s\lquote il vour pla\'eet!\'bb dit Fran\'e7oir.
\par
===========================================
See the attached 7z file. Thank you for exploring the bug.

Schneider

On 18 Dec 2011 at 15:16, bugzilla-daemon@freedesktop.org wrote:
> https://bugs.freedesktop.org/show_bug.cgi?id=40735
>
> Jean-Baptiste Faure <jbf.faure@orange.fr> changed:
>
>     What |Removed |Added
> ----------------------------------------------------------------------------
>   Status |NEW     |NEEDINFO
>       CC |        |jbf.faure@orange.fr
>
> --- Comment #1 from Jean-Baptiste Faure <jbf.faure@orange.fr> 2011-12-18 07:16:36 PST ---
> No problem for me under Ubuntu 10.04 x86_64 with LO 3.4.4 and LO 3.5.0 beta-1
>
> @Schneider: please could you try again with current release (3.4.4) and, if
> possible, with LO 3.5.0 beta-1 ?
>
> Best regards. JBF

Schneider, Uni-Dortmund, Fak 15, AV-Labor
E-Mail: HIAT@post.Uni-Dortmund.de
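For anyone who wants to check which language codes an exported file carries, here is a minimal Python sketch. It only scans for \lang and \langfe control words and looks them up in a small hand-written table; the function name `list_language_codes` and the table are illustrative assumptions, not part of LibreOffice or of any RTF library, and the table covers only the three codes seen in the extract above.

```python
import re

# Small illustrative subset of Windows language identifiers (LCIDs);
# only the codes that appear in the LibreOffice extract are listed.
LCID_NAMES = {
    1031: "German (Germany)",
    1081: "Hindi (India)",
    2052: "Chinese (Simplified, PRC)",
}

# Matches \langN and \langfeN, where N is a decimal parameter.
LANG_RE = re.compile(r"\\(lang(?:fe)?)(\d+)")


def list_language_codes(rtf):
    """Return (control word, LCID, name) for each \\lang / \\langfe found."""
    found = []
    for word, number in LANG_RE.findall(rtf):
        lcid = int(number)
        found.append((word, lcid, LCID_NAMES.get(lcid, "unknown")))
    return found


# Character-formatting part of one paragraph from the extract.
sample = r"\hich\af4\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031"
for entry in list_language_codes(sample):
    print(entry)
```

On the sample this reports \langfe2052 first, followed by \lang1081 and \lang1031, which matches the observation that a Chinese language code is attached to purely German text.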
Sorry, I did not understand that the problem occurred when you opened the RTF file produced by LO in MS Word. The RTF file is OK in LO; I will try with MS Word once I have found a copy. Side note: your file opened in AbiWord 2.8.2 looks the same as in LO. Cedric: please have a look at this bug, it may contain interesting information for the new RTF filter. Feel free to reassign if you want. Best regards. JBF
Created attachment 55555 [details] LibO file opened with Word 2007 I confirm the error. The RTF output is erroneous and complicated.
Sorry, I forgot to mention: tested with LibO 3.4.4. LibO 3.4 Beta2 seems fine.
Correction: LibO 3.5 Beta2 seems fine.
I am very sorry, bad day! I loaded by mistake the ODT file, not the RTF file. The error is in both LibO 3.4.4 and LibO 3.5 Beta2. I looked at the RTF code too, to be sure.
Created attachment 55565 [details] manually corrected files which were attached here

Hello, perhaps I can give a hint or even a solution, but I am in no way an expert, so please look at it and compare my code with the original code. I manually corrected the files which were attached here (in ODT-RTF-Konverter-Test*.7z) and tried them with Word 2007, LibO 3.4.4 and LibO 3.5 Beta2. They display correctly.

The problems seem to be:

1) The style is defined in the style sheet, e.g. \s0. Then it is used, and basically the whole definition is repeated for each paragraph. I don't know if this is for some compatibility reason, but simply \s0 should be OK. If all paragraphs in the text are the same standard paragraphs, it can even be omitted. I don't know what causes the problem, because, as I said, the style is apparently redefined with the same values.

2) Possibly erroneous use of \loch and \hich. In the second file, I simply lost track of things and used Unicode where necessary:
{\f2\fs24 French letters: \'abDeux caf\'e9 s\u8216 ?il vour pla\'eet!\'bb dit Fran\'e7oir. }
The original code, which uses \loch and \hich, is rather horrible. I think even Word 97 can interpret Unicode encoding; is it really necessary for compatibility reasons not to use Unicode? The source would be much more readable.

I hope I am more helpful today than I was yesterday. Thanks.
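To make it easier to compare the hex form (\'91) with the Unicode form (\u8216 ?), here is a rough Python sketch that decodes both kinds of escape back to characters. It assumes that hex escapes are in Windows code page 1252 and that every \uN is followed by a single "?" fallback character, as in the corrected file; `decode_escapes` is a made-up helper name, and a real RTF reader has to handle far more than this.

```python
import re

# Either a hex escape \'xx or a Unicode escape \uN, optionally followed
# by the delimiting space and a "?" fallback character.
ESCAPE_RE = re.compile(r"\\'([0-9a-fA-F]{2})|\\u(-?\d+) ?\??")


def decode_escapes(fragment, codepage="cp1252"):
    """Replace \\'xx and \\uN escapes in an RTF fragment with characters."""
    def replace(match):
        if match.group(1) is not None:
            # Hex escape: one byte in the assumed code page.
            return bytes([int(match.group(1), 16)]).decode(codepage)
        value = int(match.group(2))
        if value < 0:
            # RTF writes \uN as a signed 16-bit number.
            value += 65536
        return chr(value)
    return ESCAPE_RE.sub(replace, fragment)


print(decode_escapes(r"Deux caf\'e9 s\'91il vour pla\'eet!"))
print(decode_escapes(r"Deux caf\'e9 s\u8216 ?il vour pla\'eet!"))
```

Both lines decode to the same text, so under these assumptions the two notations describe the same characters and differ only in how the file spells them.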
Created attachment 55680 [details] 5 very simple rtf files which demonstrate the umlaut formatting problem and the solution (hand-coded by myself)

1. exampJapWord2003RTFSpecError.rtf (file from the MS Word 2003 RTF Specification, page 140, with added umlauts to demonstrate the problem)
2. exampJapWord2003RTFSpecOK.rtf (file corrected to work with umlauts)
3. exampleRtfError.rtf (problem: umlauts and Greek text not properly formatted)
4. exampleRtfOK.rtf (solution with \hich \loch)
5. exampleRtfOKCoded.rtf (solution with \hich only)

Solution 1:
{\loch\f0 umlauts and greek: } {\hich\f0 \'e4 \'fc \'f6 \'df \u917 ?\u965 ?\u967 ?\u945 ?\u961 ?\u953 ?\u963 ?\u964 ?\u974 ?} {\loch\f2 - Arial, no umlauts}
Characters in the range 0-127 have to be formatted with \loch, characters >127 with \hich (the range 128-255 can be encoded in hex or Unicode; characters >255 have to be encoded in Unicode).

Solution 2:
{\hich <the whole text, even the characters in the range 0-127, that is, even blanks, must be coded in hex or Unicode>}
Drawback: the text (Basic Latin) isn't human-readable anymore.

Both solutions were successfully tested with LibO 3.4.4, LibO 3.5 Beta3, MS Word 2003 and MS Word 2007 on Windows XP and Vista 64.

The MS Word 2003 RTF Specification (= RTF 1.8), section "Associated Character Properties", is rather vague, but it seems that \loch works only in the range 0-127 and \hich works in the range >127. (The documentation says that \hich works only from 128-255, but it seems to work with Unicode too.) RTF 1.5, RTF 1.7, RTF 1.9: basically the same text.

Solution 3: don't use any associated character properties if not necessary, and produce much simpler RTF code; otherwise use solution 1. This would probably be complicated to code.
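As an illustration of the rule behind Solution 1, here is a minimal Python sketch that splits a string into \loch runs (characters 0-127) and \hich runs (everything else). It is only a sketch of the rule as described in this comment, not the LibreOffice exporter; `escape_char` and `encode_runs` are hypothetical names, the font switch is passed in as plain text, and hex escapes are used only for 160-255, where Unicode and code page 1252 agree.

```python
from itertools import groupby


def escape_char(ch):
    """Escape one character for use inside an RTF group."""
    code = ord(ch)
    if ch in "\\{}":
        return "\\" + ch
    if code < 128:
        return ch
    if 160 <= code < 256:
        # Assumption: in this range Unicode matches code page 1252.
        return "\\'%02x" % code
    # Everything else as \uN with a "?" fallback, one escape per
    # UTF-16 code unit, written as a signed 16-bit number.
    units = ch.encode("utf-16-le")
    parts = []
    for i in range(0, len(units), 2):
        unit = int.from_bytes(units[i:i + 2], "little")
        if unit > 32767:
            unit -= 65536
        parts.append("\\u%d ?" % unit)
    return "".join(parts)


def encode_runs(text, font="\\f0"):
    """Wrap low (0-127) runs in \\loch and all other runs in \\hich."""
    runs = []
    for is_high, chars in groupby(text, key=lambda c: ord(c) > 127):
        switch = "\\hich" if is_high else "\\loch"
        body = "".join(escape_char(c) for c in chars)
        runs.append("{%s%s %s}" % (switch, font, body))
    return "".join(runs)


print(encode_runs("umlauts: \u00e4\u00fc\u00f6\u00df \u0395\u03c5"))
```

Unlike the hand-coded example, this sketch applies the rule strictly, so a blank between two umlauts starts a new \loch run. That makes the output longer but keeps every run on one side of the 127 boundary.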
@Rainer Maybe this bug is locale-specific. Please try to reproduce it.

Steps to reproduce:
1. Open the odt file from the initial attachment
2. Save as rtf
3. Open the rtf in MS Office and check the umlauts

Thanks
\loch and \hich are needed for hieroglyphic support. Also the bug is present in 3.5.0.
That's a lot of stuff.

[Reproducible] with reporter's first sample "ODT-RFT-Konverter-Test.odt" and "LibreOffice Daly based on 3.4.2 RC - WIN7 Home Premium (64bit) German UI [OOO340m1 (Build:201) from libreoffice-3-4~2011-07-22_15.35.00_LibO_3.4.2rc1_Win_x86_install_multi.exe]" from 2011-07-23

[Reproducible] with "LibreOffice 3.5.1.1 German UI/Locale [Build-ID: 45a2874-aa8c38d-dff3b9c-def3dbd-62463c8]" on German WIN7 Home Premium (64bit)

But I CANNOT reproduce with my own texts: even when I copy the contents of "ODT-RFT-Konverter-Test.odt" and paste them (Paste Special, as plain text) into a new WRITER document in 3.4.2 or 3.5.1RC, everything looks fine in export.rtf.

@s-joyemusequna@vf.uni-konstanz.de, @Urmas: I agree with sasha's thoughts; I would prefer to find out why this happens only under particular circumstances before we discuss a solution. Do you have any idea why the problem is reproducible for me with the reporter's sample, but not with self-typed texts?

@Miklós I believe you might be the more appropriate expert for this problem. Do you already have an idea what the reasons might be, or do you need additional research, maybe starting from the roots, e.g. trying a parallel server installation of 3.4.5 with its own user profile (<https://wiki.documentfoundation.org/Installing_in_parallel>), ...
@Rainer Bielefeld It does not happen only under particular circumstances. It happens consistently in all cases that I tried, with all self-typed texts (tested with Vista 64, LibO 3.4.5). But: the problem is only visible if you open the RTF file with MS Word (tested with Word 2003 and Word 2007). It opens fine with all versions of LibreOffice, and with AbiWord 2.9.2. With WordPad it looks OK (but it is not: if you select all the text that is supposed to be Arial and format it as Arial, you see the same error as with Word). The RTF code simply is not OK, but LibreOffice and AbiWord accept it as correct.
(In reply to comment #13)
> It does not happen only under particular circumstances.

Your settings, your profile and so on are also a "particular circumstance". If I find the time I will try with WIN XP tomorrow.

> But: the problem is only visible, if you open the rtf file with MS Word (tested
> with Word 2003 and Word 2007).

I should have mentioned: because of the comments, I of course checked every exported .rtf with MS WORD Viewer.

> It opens fine with all versions of LibreOffice, and with AbiWord 2.9.2. With
> WordPad it looks OK (but it is not - if you mark all the text supposed to be
> Arial and format it with Arial, you see the same error as with Word).

That is a different problem. Of course there might be common roots, but our research here should please be limited to the FILESAVE problem. For the FILEOPEN problem (which I can also confirm), IMHO a separate bug should be submitted.

> The RTF code simply is not OK, but LibreOffice and AbiWord accept it as
> correct.

As mentioned, that is not a FILESAVE problem and deserves a separate bug.
Created attachment 57896 [details] Test kit

Results in my test kit when opening the documents with MS WORD Viewer:

a) the reported problem remains visible when I copy / paste special as plain text into the reporter's sample document and save as .rtf
b) no problem visible when I copy / paste special as plain text into a new document and save as .rtf
c) the problem is also visible when I save the original document as WORD6
d) no problem visible when I save the original document as WORD97

Maybe someone with better knowledge than mine can find out which differences between "originaldocumentcontentspasteasplaintext.odt" and "newdocumentcontentspasteasplaintext.odt" cause the different RTF export?

I submitted "Bug 46864 - FILEOPEN particular .RTF does not show different character styles in a line".
The FILESAVE problem is not visible with OOo 3.3; my first observation of it is with the reporter's sample and "LibreOffice Portable 3.3.0 - WIN7 Home Premium (64bit) German UI [OOO330m19 (Build:6) tag libreoffice-3.3.0.4]". The different date of appearance (compared to Bug 46864) seems to support my suspicion that these 2 bugs are different. REGRESSION because it worked in the last OOo version before LibO started.
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=8836b45de536a3a2fd72533c3210e439bc2fbca1 fdo#40735 RTF export: CJK text is typically not single-byte
s-joyemusequna,

Repeating the contents of styles indeed makes the RTF output ugly, but that's needed. Dropping support for readers not supporting styles would be a regression.

Schneider,

To my understanding, RTF uses the \hich, \loch and \dbch control words to handle legacy non-Unicode and non-ASCII text. The original bugdoc had German accents, so the \hich part was applied, even if the German text was obviously not CJK text. The above one-liner fix changes the CJK text to use \dbch as well. Now I see the correct font name in Word as well.

Miklos
Created attachment 103187 [details] ODT-RFT-Konverter-Test.odt converted to RTF with LibreOffice 4.2.5
Created attachment 103188 [details] ODT-RFT-Konverter-Test-created-with-LibreOffice-4.2.5.odt and .rtf
I converted ODT-RFT-Konverter-Test.odt attached in "Example-text as odt and rtf" to RTF with LibreOffice 4.2.5, and opened the RTF document with Microsoft Word 2010. "Times New Roman" instead of "Arial" was used for the second occurrence of "äöüß" and "ÄÖÜ". (See attachment "ODT-RFT-Konverter-Test.odt converted to RTF with LibreOffice 4.2.5") Then I created a similar document with LibreOffice 4.2.5, converted it to RTF, and opened the RTF document with Microsoft Word 2010. "Liberation Serif" instead of "Times New Roman" was used for the first occurrence of "äöüß" and "ÄÖÜ", and "Liberation Serif" instead of "Arial" was used for the second occurrence of "äöüß" and "ÄÖÜ". (See attachment "ODT-RFT-Konverter-Test-created-with-LibreOffice-4.2.5.odt and .rtf")
Removing comma from whiteboard (please use a space to delimit values in this field) https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Whiteboard#Getting_Started
Hi Igor, This bug was fixed more than two years ago. If you found a similar issue, please open a new bug for your problem, don't reopen an ancient one. Thanks!
Migrating Whiteboard tags to Keywords: ( rtf_filter -> filter:rtf) [NinjaEdit]