Bug 40735 - FILESAVE to RTF causes wrong font-style
Summary: FILESAVE to RTF causes wrong font-style
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.3.0 release (earliest affected)
Hardware: x86 (IA32) Windows (All)
Importance: medium normal
Assignee: Miklos Vajna
URL:
Whiteboard: bibisected35 bibisected35older target...
Keywords: filter:rtf, regression
Depends on:
Blocks:
 
Reported: 2011-09-09 03:36 UTC by Schneider
Modified: 2015-12-15 11:53 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
Example-text as odt and rtf (9.98 KB, application/x-7z-compressed), 2011-09-09 03:36 UTC, Schneider
ODT-RTF-Konverter-Test2.7z (17.87 KB, application/x-7z-compressed), 2011-12-18 17:23 UTC, Schneider
LibO file opened with Word 2007 (3.25 KB, image/png), 2012-01-13 07:56 UTC, s-joyemusequna
manually corrected files which were attached here (1.31 KB, application/x-7z-compressed), 2012-01-14 01:46 UTC, s-joyemusequna
5 very simple rtf files which demonstrate the umlaut formatting problem and the solution (hand-coded by myself) (859 bytes, application/x-7z-compressed), 2012-01-17 08:16 UTC, s-joyemusequna
Test kit (21.92 KB, application/zip), 2012-03-02 00:13 UTC, Rainer Bielefeld Retired
ODT-RFT-Konverter-Test.odt converted to RTF with LibreOffice 4.2.5 (3.93 KB, application/octet-stream), 2014-07-21 12:18 UTC, I.I.
ODT-RFT-Konverter-Test-created-with-LibreOffice-4.2.5.odt and .rtf (16.07 KB, application/octet-stream), 2014-07-21 12:22 UTC, I.I.

Description Schneider 2011-09-09 03:36:41 UTC
Created attachment 50996
Example-text as odt and rtf

Using the German version of LibreOffice 3.4.3, the following error occurred with several texts when converting from ODT text to RTF text via "Save As":

If the ODT text includes German umlauts (äöüÄÖÜ) or the letter ß (= sz), these letters are formatted incorrectly in the resulting RTF text. The font changes, e.g., from Times New Roman or Arial to SimSun. (In other texts it changes to Univers; so far I have not been able to identify a rule.)
Comment 1 Jean-Baptiste Faure 2011-12-18 07:16:36 UTC
No problem for me under Ubuntu 10.04 x86_64 with LO 3.4.4 and LO 3.5.0 beta-1.

@Schneider: could you please try again with the current release (3.4.4) and, if possible, with LO 3.5.0 beta-1?

Best regards. JBF
Comment 2 Schneider 2011-12-18 17:23:14 UTC
Created attachment 54565
ODT-RTF-Konverter-Test2.7z

I repeated the test with LibreOffice 3.4.4 and 3.5.0 beta-1 (both German versions, on a PC with Windows XP Pro). Sorry, but each time the results are exactly the same as with LibreOffice 3.4.3.

The German umlauts äöüÄÖÜ, the letter ß (= sz) and French characters with diacritics (e.g. "çéî") changed font from "Arial" to "SimSun" when I imported the RTF files into Microsoft Word (version XP, 2002).

It seems to be an erroneous combination: LibreOffice produces complicated RTF code with an error in the language coding. The RTF import converters of Microsoft Word XP (2002) and Word 2003 can handle this complicated code, but they display the text in a Chinese font.

Extract from the RTF file generated by LibreOffice 3.4.4 (complicated and erroneous code):
===========================================
German Umlaute: \'e4\'f6\'fc \'c4\'d6\'dc }
\par \pard\plain \s0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af4\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031{\rtlch \ltrch\loch\loch\f2
sz = \'df }
\par \pard\plain \s0\nowidctlpar{\*\hyphen2\hyphlead2\hyphtrail2\hyphmax0}\aspalpha\ltrpar\cf0\kerning1\hich\af4\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031{\rtlch \ltrch\loch\loch\f2
French letters: }{\rtlch \ltrch\loch\loch\f3
\'ab}{\rtlch \ltrch\loch\loch\f2
Deux caf\'e9 s\'91il vour pla}{\rtlch \ltrch\loch\loch\f3
\'ee}{\rtlch \ltrch\loch\loch\f2
t!}{\rtlch \ltrch\loch\loch\f3
\'bb dit }{\rtlch \ltrch\loch\loch\f2
Fran}{\rtlch \ltrch\loch\loch\f3
\'e7}{\rtlch \ltrch\loch\loch\f2
oir.
===========================================
The RTF language code "\langfe2052" above means "Chinese (Simplified)". The test text does not intend this.

Most RTF import converters seem to ignore the Chinese language code and show the RTF text without any errors. (Importing the exported RTF files into any version of LibreOffice Writer, into Microsoft WordPad [version 5.1] and into SoftMaker TextMaker Viewer [version 2010] resulted in correct fonts.) The Microsoft RTF import converters seem to take the RTF code literally and fail to show the intended font.

Microsoft Word XP itself produces a very different RTF file (from a .doc source). In the relevant part it is far simpler, with no language-code switch at all. Extract of the very same text from the RTF file generated by Microsoft Word XP (simple code):
===========================================
German Umlaute: \'e4
\'f6\'fc \'c4\'d6\'dc 
\par sz = \'df 
\par French letters: \'abDeux caf\'e9 s\lquote il vour pla\'eet!\'bb 
dit Fran\'e7oir. 
\par 
===========================================

See the attached 7z-file. 

Thank you for investigating the bug.

Schneider


******************************************
Schneider, Uni-Dortmund, Fak 15, AV-Labor
E-Mail: HIAT@post.Uni-Dortmund.de
******************************************
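The \langfe2052 observation in comment 2 can be checked mechanically. Below is a minimal Python sketch, not part of the original report, that counts the \lang and \langfe control words in an RTF string and names the two LCIDs discussed here (1031 German, 2052 Chinese Simplified). The function names and the small lookup table are illustrative assumptions. The sketch uses a plain regular expression and does not parse RTF groups, so it cannot tell which run a code applies to.

```python
import re
from collections import Counter

# \langN tags the text language, \langfeN the East Asian language.
# Requiring the backslash directly before "lang" skips words such as \alangN.
LANG_RE = re.compile(r"\\(langfe|lang)(-?\d+)")

# Only the LCIDs discussed in this bug; anything else is reported as "unknown".
KNOWN_LCIDS = {1031: "German (Germany)", 2052: "Chinese (Simplified)"}


def language_codes(rtf: str) -> Counter:
    """Count (control word, LCID) pairs in an RTF string (no group parsing)."""
    return Counter((m.group(1), int(m.group(2))) for m in LANG_RE.finditer(rtf))


def report(rtf: str) -> None:
    for (word, lcid), n in sorted(language_codes(rtf).items()):
        print(f"\\{word}{lcid}: {KNOWN_LCIDS.get(lcid, 'unknown')} (x{n})")


if __name__ == "__main__":
    # Paragraph properties taken from the LibreOffice 3.4.4 extract in comment 2.
    report(r"\s0\nowidctlpar\aspalpha\ltrpar\cf0\kerning1\hich\af4"
           r"\langfe2052\dbch\af5\afs24\lang1081\loch\f0\fs24\lang1031")
```

Run against the LibreOffice paragraph properties, the script flags \langfe2052 next to the expected German \lang1031.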
Comment 3 Jean-Baptiste Faure 2011-12-18 21:55:08 UTC
Sorry, I did not understand that the problem occurred when you opened the LO-produced RTF file in MS Word. The RTF file is OK in LO; I will try with MS Word once I have found a copy.

Side note: your file opened in AbiWord 2.8.2 looks the same as in LO.

Cedric: please have a look at this bug; it may contain interesting information for the new RTF filter. Feel free to reassign if you want.

Best regards. JBF
Comment 4 s-joyemusequna 2012-01-13 07:56:45 UTC
Created attachment 55555
LibO file opened with Word 2007

I confirm the error. The RTF output is erroneous and complicated.
Comment 5 s-joyemusequna 2012-01-13 08:05:04 UTC
Sorry, I forgot to mention: tested with LibO 3.4.4.

LibO 3.4 Beta2 seems fine.
Comment 6 s-joyemusequna 2012-01-13 08:06:08 UTC
Correction: LibO 3.5 Beta2 seems fine.
Comment 7 s-joyemusequna 2012-01-13 08:19:30 UTC
I am very sorry, bad day! I loaded the ODT file by mistake, not the RTF file.

The error is in both LibO 3.4.4 and LibO 3.5 Beta2. I looked at the RTF code too, to be sure.
Comment 8 s-joyemusequna 2012-01-14 01:46:24 UTC
Created attachment 55565
manually corrected files which were attached here

Hello,

Perhaps I can give a hint or even a solution, but I am by no means an expert, so please look at it and compare my code with the original code.

I manually corrected the files which were attached here (in ODT-RTF-Konverter-Test*.7z) and tried them with Word 2007, LibO 3.4.4 and LibO 3.5 Beta2. They display correctly.

The problems seem to be:

1) The style is defined in the style sheet, e.g. \s0. Then it is used, and basically the whole definition is repeated for each paragraph. I don't know whether this is for some compatibility reason, but simply \s0 should be OK. If all paragraphs in the text are the same standard paragraphs, it can even be omitted. I don't know what causes the problem because, as I said, the style is apparently redefined with the same values.

2) Possibly erroneous use of \loch and \hich. In the second file, I simply lost track of things and used Unicode where necessary:

{\f2\fs24 French letters: \'abDeux caf\'e9 s\u8216 ?il vour pla\'eet!\'bb dit Fran\'e7oir. }

The original code which uses \loch and \hich is rather horrible.

I think even Word 97 can interpret Unicode encoding. Is it really necessary, for compatibility reasons, to avoid Unicode? The source would be much more readable.

I hope I am more helpful today than I was yesterday.

Thanks.
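The Unicode-based encoding proposed in comment 8 (ASCII kept literal, the Latin-1 range written as \'xx, everything else as \uN with a "?" fallback) can be sketched in Python as follows. `rtf_escape_unicode` is a hypothetical helper written for illustration. It assumes the document header declares \ansi\ansicpg1252 and keeps the default \uc1, meaning one fallback character follows each \uN.

```python
def rtf_escape_unicode(text: str) -> str:
    """Escape text for an RTF body assumed to declare \\ansi\\ansicpg1252.

    - ASCII stays literal; \\, { and } are backslash-escaped,
    - U+00A0..U+00FF become \\'xx (same byte in Latin-1 and cp1252),
    - everything else becomes \\uN followed by a '?' fallback (\\uc1).
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "\\{}":
            out.append("\\" + ch)
        elif cp < 0x80:
            out.append(ch)
        elif 0xA0 <= cp <= 0xFF:
            out.append(f"\\'{cp:02x}")
        else:
            # \uN takes a signed 16-bit value; characters outside the BMP
            # are written as a UTF-16 surrogate pair.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little", signed=True)
                out.append(f"\\u{unit} ?")
    return "".join(out)


if __name__ == "__main__":
    line = "French letters: \u00abDeux caf\u00e9 s\u2018il vour pla\u00eet!\u00bb dit Fran\u00e7oir. "
    print("{\\f2\\fs24 " + rtf_escape_unicode(line) + "}")
```

For the French test sentence, this produces the same body as the hand-corrected line in comment 8, including `s\u8216 ?il` for the typographic apostrophe.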
Comment 9 s-joyemusequna 2012-01-17 08:16:21 UTC
Created attachment 55680
5 very simple rtf files which demonstrate the umlaut formatting problem and the solution (hand-coded by myself)

1. exampJapWord2003RTFSpecError.rtf (file from MS Word 2003 RTF Specification, page 140, with added umlauts to demonstrate the problem)
2. exampJapWord2003RTFSpecOK.rtf (file corrected to work with umlauts)
3. exampleRtfError.rtf (problem: umlauts and Greek text not properly formatted)
4. exampleRtfOK.rtf (solution with \hich \loch)
5. exampleRtfOKCoded.rtf (solution with \hich only)


Solution 1:
{\loch\f0 umlauts and greek: }
{\hich\f0 \'e4 \'fc \'f6 \'df \u917 ?\u965 ?\u967 ?\u945 ?\u961 ?\u953 ?\u963 ?\u964 ?\u974 ?}
{\loch\f2 - Arial, no umlauts}

Characters in the range 0-127 have to be formatted with \loch, characters >127 with \hich (the range 128-255 can be encoded in hex or Unicode; characters >255 have to be encoded in Unicode).

Solution 2:
{\hich <the whole text, including the range 0-127 (e.g. even spaces), must be encoded in hex or Unicode>}

Drawback: the text (Basic Latin) isn't human-readable anymore.

Both solutions successfully tested with LibO 3.4.4, LibO 3.5 Beta3, MS Word 2003, and MS Word 2007 on Windows XP and Vista 64.

The MS Word 2003 RTF Specification (= RTF 1.8, section "Associated Character Properties") is rather vague, but it seems that \loch works only in the range 0-127 and \hich works in the range >127. (The documentation says that \hich works only for 128-255, but it seems to work with Unicode too.) RTF 1.5, RTF 1.7 and RTF 1.9 contain basically the same text.

Solution 3: don't use associated character properties unless necessary, which produces much simpler RTF code; otherwise use solution 1. This would probably be complicated to implement.
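Solutions 1 and 2 from comment 9 can be sketched as two small encoders. The Python below is illustrative only. The function names are hypothetical, and it assumes \ansicpg1252 and the default \uc1. Unlike the hand-coded example, it applies the 0-127 boundary strictly, so a space between two non-ASCII runs gets its own \loch group.

```python
from itertools import groupby


def escape_char(ch: str, force_hex: bool = False) -> str:
    """Escape one character (assumes \\ansicpg1252 and \\uc1).

    force_hex=True writes even ASCII as \\'xx, as solution 2 requires.
    """
    cp = ord(ch)
    if cp < 0x80 and not force_hex:
        return "\\" + ch if ch in "\\{}" else ch
    if cp < 0x80 or 0xA0 <= cp <= 0xFF:
        # ASCII and U+00A0..U+00FF map to the same byte value in cp1252.
        return f"\\'{cp:02x}"
    # Everything else: \uN plus a '?' fallback; N is signed 16-bit, so
    # characters outside the BMP become a UTF-16 surrogate pair.
    data = ch.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little", signed=True)
             for i in range(0, len(data), 2)]
    return "".join(f"\\u{u} ?" for u in units)


def encode_loch_hich(text: str, font: int = 0) -> str:
    """Solution 1: ASCII runs under \\loch, all other runs under \\hich."""
    groups = []
    for is_ascii, run in groupby(text, key=lambda c: ord(c) < 0x80):
        selector = "\\loch" if is_ascii else "\\hich"
        body = "".join(escape_char(c) for c in run)
        groups.append(f"{{{selector}\\f{font} {body}}}")
    return "".join(groups)


def encode_hich_only(text: str, font: int = 0) -> str:
    """Solution 2: a single \\hich group with every character escaped."""
    body = "".join(escape_char(c, force_hex=True) for c in text)
    return f"{{\\hich\\f{font} {body}}}"


if __name__ == "__main__":
    print(encode_loch_hich("umlauts and greek: \u00e4\u00fc \u0395\u03c5"))
    print(encode_hich_only("a \u00e4"))
```

The space after each \fN is the control-word delimiter and is consumed by the reader. That is why a run that begins with a real space is emitted with two spaces.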
Comment 10 sasha.libreoffice 2012-02-02 07:10:37 UTC
@ Rainer
Maybe this bug is locale-specific. Please try to reproduce it.
Steps to reproduce:
1. Open the ODT file from the initial attachment
2. Save as RTF
3. Open the RTF file in MS Office and check the umlauts

Thanks
Comment 11 Urmas 2012-03-01 07:17:49 UTC
\loch and \hich are needed for ideographic (CJK) text support.

Also the bug is present in 3.5.0.
Comment 12 Rainer Bielefeld Retired 2012-03-01 08:29:37 UTC
That's a lot of stuff.

[Reproducible] with reporter's first sample "ODT-RFT-Konverter-Test.odt" and "LibreOffice Daily based on 3.4.2 RC - WIN7 Home Premium (64bit) German UI [OOO340m1 (Build:201) from libreoffice-3-4~2011-07-22_15.35.00_LibO_3.4.2rc1_Win_x86_install_multi.exe]" from 2011-07-23
[Reproducible] with "LibreOffice 3.5.1.1 German UI/Locale [Build-ID: 45a2874-aa8c38d-dff3b9c-def3dbd-62463c8] on German WIN7 Home Premium (64bit)"

But I CANNOT reproduce it with my own texts: even when I copy / paste special as plain text the contents of "ODT-RFT-Konverter-Test.odt" into a new WRITER document in 3.4.2 or 3.5.1RC, everything looks fine in export.rtf.

@s-joyemusequna@vf.uni-konstanz.de, @Urmas:
I agree with sasha's thoughts: I prefer to find out why this happens only under particular circumstances before we discuss a solution. Do you have any idea why the problem is reproducible for me with the reporter's sample, but not with self-typed texts?

@Miklós
I believe you might be the more appropriate expert for this problem. Do you already have an idea what the reasons might be, or do you need additional research? Maybe start from the roots, e.g. try a parallel server installation of 3.4.5 with its own user profile (<https://wiki.documentfoundation.org/Installing_in_parallel>), ...
Comment 13 s-joyemusequna 2012-03-01 11:25:21 UTC
@Rainer Bielefeld

It does not happen only under particular circumstances. It happens consistently in all cases that I tried, with all self-typed texts (tested with Vista 64, LibO 3.4.5).

But the problem is only visible if you open the RTF file with MS Word (tested with Word 2003 and Word 2007).

It opens fine with all versions of LibreOffice and with AbiWord 2.9.2. With WordPad it looks OK, but it is not: if you select all the text that is supposed to be Arial and format it as Arial, you see the same error as with Word.

The RTF code simply is not OK, but LibreOffice and AbiWord accept it as correct.
Comment 14 Rainer Bielefeld Retired 2012-03-01 13:24:36 UTC
(In reply to comment #13)
> It does not happen only under particular circumstances. 

Your settings, profile, etc. are also "particular circumstances". If I find the time, I will try with WIN XP tomorrow.

> But: the problem is only visible, if you open the rtf file with MS Word (tested
> with Word 2003 and Word 2007).

I should have mentioned: because of the comments, I of course checked all exported .rtf files with MS WORD Viewer.

> It opens fine with all versions of LibreOffice, and with AbiWord 2.9.2. With
> WordPad it looks OK (but it is not - if you mark all the text supposed to be
> Arial and format it with Arial, you see the same error as with Word).

That is a different problem. Of course there might be common roots, but please limit our research here to the FILESAVE problem.

For the FILEOPEN problem (which I can also confirm), IMHO a separate bug should be submitted.

> The RTF code simply is not OK, but LibreOffice and AbiWord accept it as
> correct.

As mentioned, that is not a FILESAVE problem and deserves a separate Bug.
Comment 15 Rainer Bielefeld Retired 2012-03-02 00:13:45 UTC
Created attachment 57896
Test kit

Results from my test kit when opening the documents with MS WORD Viewer:

a) reported problem remains visible when I copy / paste special as plain text
   into the reporter's sample document and save as .rtf

b) no problem visible when I copy / paste special as plain text
   into a new document and save as .rtf

c) problem also visible when saving the original document as WORD6

d) no problem visible when saving the original document as WORD97

Maybe someone with better knowledge than mine can find out which differences between "originaldocumentcontentspasteasplaintext.odt" and "newdocumentcontentspasteasplaintext.odt" cause the different RTF export?

I submitted "Bug 46864 - FILEOPEN particular .RTF does not show different character styles in a line".
Comment 16 Rainer Bielefeld Retired 2012-03-02 00:18:14 UTC
The FILESAVE problem is not visible with OOo 3.3; I first observed it with the reporter's sample and "LibreOffice Portable 3.3.0 - WIN7 Home Premium (64bit) German UI [OOO330m19 (Build:6) tag libreoffice-3.3.0.4]".

The different date of first appearance (compared to Bug 46864) seems to support my suspicion that these 2 bugs are different.

REGRESSION, because it worked in the last OOo version before LibO started.
Comment 17 Not Assigned 2012-03-30 05:36:07 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8836b45de536a3a2fd72533c3210e439bc2fbca1

fdo#40735 RTF export: CJK text is typically not single-byte
Comment 18 Miklos Vajna 2012-03-30 05:41:52 UTC
s-joyemusequna,

Repeating the contents of styles indeed makes the RTF output ugly, but that's needed. Dropping support for readers not supporting styles would be a regression.

Schneider,

To my understanding, RTF uses the \hich, \loch and \dbch control words to handle legacy non-Unicode and non-ASCII text. The original bugdoc had German accents, so the \hich part was applied, even though the German text was obviously not CJK text. The one-liner fix above changes CJK text to use \dbch. Now I see the correct font name in Word as well.

Miklos
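Comment 18 describes three associated-font groups: \loch for ASCII, \hich for other non-ASCII single-byte text such as German accents, and, after the fix, \dbch for CJK text. Below is a minimal Python model of that per-character decision. It is not the code from the linked commit. The function names and the short list of CJK Unicode blocks are illustrative assumptions, and the model ignores code pages and language attributes.

```python
from itertools import groupby

# Illustrative subset of CJK blocks (an assumption, not LibreOffice's table).
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK Symbols and Punctuation
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def font_selector(ch: str) -> str:
    """Pick the associated-font control word for a single character."""
    if ord(ch) < 0x80:
        return "\\loch"  # low-ANSI (ASCII) text
    if is_cjk(ch):
        return "\\dbch"  # double-byte (CJK) text
    return "\\hich"      # other high-ANSI text, e.g. German umlauts


def selector_runs(text: str) -> list[tuple[str, str]]:
    """Split text into (control word, run) pairs."""
    return [(sel, "".join(run)) for sel, run in groupby(text, key=font_selector)]


if __name__ == "__main__":
    for sel, run in selector_runs("Gr\u00fc\u00dfe \u4e2d\u6587"):
        print(sel, repr(run))
```

In this model the umlauts in "Grüße" stay in a \hich run, while only the Chinese characters are routed to \dbch. This is the separation the fix aimed for.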
Comment 19 I.I. 2014-07-21 12:18:48 UTC
Created attachment 103187
ODT-RFT-Konverter-Test.odt converted to RTF with LibreOffice 4.2.5
Comment 20 I.I. 2014-07-21 12:22:27 UTC
Created attachment 103188
ODT-RFT-Konverter-Test-created-with-LibreOffice-4.2.5.odt and .rtf
Comment 21 I.I. 2014-07-21 12:34:27 UTC
I converted ODT-RFT-Konverter-Test.odt attached in "Example-text as odt and rtf" to RTF with LibreOffice 4.2.5, and opened the RTF document with Microsoft Word 2010. "Times New Roman" instead of "Arial" was used for the second occurrence of "äöüß" and "ÄÖÜ". (See attachment "ODT-RFT-Konverter-Test.odt converted to RTF with LibreOffice 4.2.5")

Then I created a similar document with LibreOffice 4.2.5, converted it to RTF, and opened the RTF document with Microsoft Word 2010. "Liberation Serif" instead of "Times New Roman" was used for the first occurrence of "äöüß" and "ÄÖÜ", and "Liberation Serif" instead of "Arial" was used for the second occurrence of "äöüß" and "ÄÖÜ". (See attachment "ODT-RFT-Konverter-Test-created-with-LibreOffice-4.2.5.odt and .rtf")
Comment 22 Robinson Tryon (qubit) 2014-09-05 07:51:26 UTC
Removing comma from whiteboard (please use a space to delimit values in this field)
https://wiki.documentfoundation.org/QA/Bugzilla/Fields/Whiteboard#Getting_Started
Comment 23 Miklos Vajna 2014-10-25 10:32:09 UTC
Hi Igor,

This bug was fixed more than two years ago. If you found a similar issue, please open a new bug for your problem, don't reopen an ancient one.

Thanks!
Comment 24 Robinson Tryon (qubit) 2015-12-15 11:53:51 UTC
Migrating Whiteboard tags to Keywords: ( rtf_filter -> filter:rtf)
[NinjaEdit]