Description: When a simple file is saved as RTF, the (Roman-style) font in use is written to the font table (\fonttbl) with \fcharset128 (= Shift JIS). If a character in the RTF is then changed to an accented form, it displays as a Far-Eastern character instead of the expected (European) Unicode character. The normal setting should be \fcharset1 (= Default).

Steps to Reproduce:
1. Create a new document in Writer
2. Save as RTF

Actual Results: \fcharset128 is set for the font in use (Book Antiqua)
Expected Results: \fcharset1 should have been set for the font in use
Reproducible: Always
User Profile Reset: Yes
OpenGL enabled: Yes

Additional Info:
Version: 7.3.5.2 / LibreOffice Community
Build ID: 30(Build:2)
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Ubuntu package version: 1:7.3.5~rc2-0ubuntu0.20.04.1~lo1
Calc: threaded
Please share a simple file (in ODT format) which, when exported to RTF, shows the problematic behavior. Also provide specific steps needed to repro the "If a character is then changed in the RTF ..." part, like what to edit using which tool (in Writer? in a plain text editor? which character to change to what), and the comparison of expected vs. actual results (preferably a screenshot with a marked difference). Thanks!
Created attachment 181530
A simple test case, with embedded 'Book Antiqua' font.

1. Open it and Save As Rich Text (RTF).
2. Using a text editor (I use vi), change the "o" to ö (o-umlaut) and save.
3. Open the changed RTF with LibreOffice.

NB: this may give different results on your system if Book Antiqua is not installed, but the result on my system is shown in the next attachment (notnormaltext.png).

The RTF export also gets \fcharset128 if I use another Palatino font, TeX-Gyre-Pagella (the situation is more complicated then, however, because Pagella is not recognized as \froman but is assigned \fnil - but that's another issue ...)
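When checking an export like this, the font table can be scanned for each entry's family and \fcharset value. The following is a minimal Python sketch, assuming flat entries of the form {\f1\froman\fprq2\fcharset0 Name;} (entries containing nested groups such as {\*\falt ...} are not matched). The function name font_table and the sample \fonttbl string are hypothetical, not taken from the attachment.

```python
import re

# Control words that RTF defines as font families.
FAMILIES = {"fnil", "froman", "fswiss", "fmodern", "fscript", "fdecor", "ftech", "fbidi"}

# One flat font entry, e.g. {\f1\froman\fprq2\fcharset128 Book Antiqua;}
# Assumption: no nested groups (such as {\*\panose ...}) inside the entry.
FONT_ENTRY = re.compile(r"\{\\f(\d+)((?:\\[a-z]+-?\d*)+)\s*([^;{}]*);\}")
CONTROL_WORD = re.compile(r"\\([a-z]+)(-?\d*)")


def font_table(rtf: str) -> list[tuple[int, str, str, str]]:
    """Return (font number, family, fcharset value, font name) per matched entry."""
    fonts = []
    for num, controls, name in FONT_ENTRY.findall(rtf):
        words = dict(CONTROL_WORD.findall(controls))
        family = next((w for w in words if w in FAMILIES), "?")
        fonts.append((int(num), family, words.get("fcharset", "?"), name.strip()))
    return fonts


# Hypothetical font table, shaped like the entries discussed in this report.
SAMPLE = (
    r"{\fonttbl{\f0\froman\fprq2\fcharset0 Times New Roman;}"
    r"{\f1\froman\fprq2\fcharset128 Book Antiqua;}"
    r"{\f2\fnil\fprq2\fcharset128 TeX Gyre Pagella;}}"
)

for entry in font_table(SAMPLE):
    print(entry)
```

On the sample, f1 is reported as froman with charset 128 and f2 as fnil with charset 128, i.e. the two symptoms described above.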
Created attachment 181531
Screenshot showing the result of changing 'o' to o-umlaut.

Of course, if \fcharset is also changed to \fcharset1 (the proper default) while editing the RTF, then all is OK again. But the issue remains that \fcharset128 results in an unstable RTF file.
No repro using
Version: 7.4.0.2 (x64) / LibreOffice Community
Build ID: 1512ce97d7ed39dce3121f7e15651fd8895f950e
CPU threads: 12; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: en-US (ru_RU); UI: en-US
Calc: CL

This could be a system- or locale-specific issue, or perhaps it depends on the system encoding...

I also wanted to mention that there is a long thread on a Russian forum dedicated to RTF corruption like that described here [1]. The fix is described there in answer #17, but it seems to be specific to AOO (the one related to LO was about version 3.4.4).

[1] https://forumooo.ru/index.php/topic,6952.15.html
Created attachment 181532
My first attempt didn't seem to have embedded the font - this one does.
(In reply to Bernard Moreton from comment #5) No repro on my system, either - FTR, I have Book Antiqua installed locally.
Re Comment #4: the reference to the Russian thread seems more related to RTF corruption in general than to LO/AOO in particular. I'm just concerned that LO's Save As RTF generates an unstable file by misapplying \fcharset128. My own use is rather the other way around: I write database reports to RTF, have LO display them, and 'Send As PDF' where wanted. I was only reminded of this bad \fcharset when analysing LO's output in order to get a (for me) unusual section structure right in my own code, and found this peculiar character change, which took me some time to understand. And I'm glad that someone else likes Palatino/Book Antiqua/T-G-Pagella!
[Automated Action] NeedInfo-To-Unconfirmed
Re comments 1, 2, 4: editing the RTF to change an 'o' to an 'ö' (o-umlaut) should be done by copying and pasting the o-umlaut over the 'o', or by keying it directly (on my Ubuntu, Magic+o+" - my Magic key is AltGr), not by entering \u246\'3f. Just thought I'd better make sure!

For the font Book Antiqua, to accord with the RTF Specification, the correct font entry should include \fcharset1, NOT \fcharset128. The pitch entry should probably be \fprq2, like the other Roman entries - but at least it hasn't been set to \fprq1, which would be positively wrong.

It looks (from LO Git) as though LO uses internal XML font tables, probably inherited unchanged from OOo, rather than interrogating the font on the system. I have looked for such a table on my system but haven't found one, so I assume it's compiled in? If I'm right, then to make LO consistent with the RTF Specification, the entry for Book Antiqua should be updated. The current font entry in the RTF export does not just produce an unstable file; it does so because it does not conform to the specification. (The use of an internal table would also explain why TeX-Gyre-Pagella is given such a wrong entry in the fonttbl - I can't find any mention of Pagella in LO Git.)

If that (hypothesised) internal LO font table is on the compiled system and I simply haven't found it, then please point me to the right location, and I'll happily try editing it on my system.
Bernard, a new version is available. Could you please retest with LO 7.6? Is the bug still present? => NEEDINFO
There has been a change: the test-case file (normaltext.odt) now saves with \fcharset0 instead of \fcharset128 as before. After changing the 'o' in the saved RTF to its umlauted form, the file displays in Writer with two high-ASCII(?) characters, not as 'ö'. Changing \fcharset0 to \fcharset1 resolves the problem, and the umlaut then shows correctly. So the change has not resolved the problem: Western UTF text should be saved with \fcharset1.
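The "two high-ASCII characters" are what one would expect if the editor saved "ö" as UTF-8 (the usual default on Ubuntu, but an assumption here) and Writer read those bytes through the ANSI code page implied by \fcharset0, i.e. Windows-1252. A minimal Python check of that assumption:

```python
# "ö" as a UTF-8 text editor (assumed) writes it into the RTF file.
raw = "ö".encode("utf-8")

# \fcharset0 (ANSI) read as Windows code page 1252: each byte becomes one character.
as_ansi = raw.decode("cp1252")

# Read as UTF-8, the same bytes give back the intended character.
as_utf8 = raw.decode("utf-8")

print(raw, as_ansi, as_utf8)
```

Under that assumption, the two characters shown would be 'Ã' and '¶'.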
I confirm the problem:

Steps:
1. Open a new document and type "Hallo"
2. Save as RTF
3. Open the RTF file in an editor (I've used Windows Editor, i.e. Notepad)
4. Change "Hallo" to "Hällo"
5. Save and reopen in Writer

Actual result: "Hテ、llo"
Expected result: "Hällo"

I can't follow your proposed solution, but I also don't have deeper knowledge of editing RTF files.
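The actual result in comment 13 is consistent with the UTF-8 bytes of "ä" (0xC3 0xA4) being interpreted as Shift JIS (\fcharset128), where each byte in the range 0xA1-0xDF is a single half-width katakana character. A minimal Python sketch, assuming the editor saved the file as UTF-8:

```python
import unicodedata

# "Hällo" as saved by an editor using UTF-8 (assumed): "ä" becomes the two bytes C3 A4.
raw = "Hällo".encode("utf-8")

# Interpreting those bytes as Shift JIS (\fcharset128) yields half-width katakana.
as_sjis = raw.decode("shift_jis")

# NFKC folds the half-width forms to their full-width equivalents.
folded = unicodedata.normalize("NFKC", as_sjis)

print(as_sjis, folded)
```

The raw decode gives "Hﾃ､llo"; its NFKC-folded form is exactly the "Hテ、llo" quoted above.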
(In reply to Dieter from comment #13)
> 3. Open rtf file in an editor (I've used Windows Editor)
> 4. change "Hallo" to "Hällo"

Why do you think it's OK to edit the RTF markup like this? The resulting RTF would be invalid, because RTF is ASCII (7-bit), and all Unicode characters are encoded in it in a special way, not by using the Unicode characters directly.
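For illustration, this is roughly how the 7-bit encoding works: ASCII passes through, the RTF special characters \, { and } are escaped with a backslash, and every other character becomes \uN (N being a signed 16-bit UTF-16 code unit) followed by one fallback character, here \'3f ("?"), matching the default \uc1. A minimal Python sketch; rtf_escape is a hypothetical helper name, and line breaks are not converted to \par:

```python
def rtf_escape(text: str, fallback: str = "?") -> str:
    """Encode text as 7-bit RTF text, using \\uN plus one fallback character (\\uc1)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Characters with special meaning in RTF are escaped with a backslash.
            out.append("\\" + ch)
        elif ord(ch) < 0x80:
            # Plain 7-bit ASCII can be written as-is.
            out.append(ch)
        else:
            # \uN takes a signed 16-bit number, so encode as UTF-16 code units
            # (characters outside the BMP become a surrogate pair).
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 0x7FFF:
                    unit -= 0x10000
                out.append(f"\\u{unit}\\'{ord(fallback):02x}")
    return "".join(out)


print(rtf_escape("Hällo"))
```

For "ö" this produces \u246\'3f, the escaped form mentioned earlier in this thread.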
Strictly speaking, Mike (#14) is right: RTF is designed to be ANSI-compliant. But the RTF manual lists \fcharset1 as "default", while \fcharset0 is listed as "ansi". As to "why": some of us use RTF as an intermediate format, even if that is non-compliant, and I don't think any RTF reader actually cares. It can be convenient, for example, to use something like 'sed' to make a quick global change, and the "official" \ucN ... \uN mechanism is just too cumbersome. So \fcharset0 is unnecessarily restrictive. The "default" \fcharset1 should be used for all Western text, whether UTF or not. It allows liberty, where the current setting enforces antiquated restrictions.
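For comparison, reading the "official" form back is mechanical: \ucN sets how many fallback bytes follow each \uN (one \'hh escape counts as one), a negative N denotes a UTF-16 code unit above 0x7FFF, and the fallback is skipped. A minimal Python sketch that decodes a plain text run only (other control words, groups and \par are not interpreted); decode_text_run is a hypothetical name, and \'hh bytes are assumed to be Windows-1252:

```python
import re

# Tokens in a plain RTF text run: \ucN, \uN, \'hh, an escaped \ { }, or any other character.
# One space after a control word is its delimiter and is consumed with it.
TOKEN = re.compile(r"\\uc(\d+) ?|\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|\\([\\{}])|(.)", re.S)


def decode_text_run(rtf: str, codepage: str = "cp1252") -> str:
    out = []
    uc = 1    # default number of fallback bytes after each \uN
    skip = 0  # fallback bytes still to be skipped
    for m in TOKEN.finditer(rtf):
        uc_n, u_n, hex_byte, escaped, literal = m.groups()
        if uc_n is not None:
            uc = int(uc_n)
        elif u_n is not None:
            # Negative values stand for code units 0x8000-0xFFFF.
            out.append(chr(int(u_n) % 0x10000))
            skip = uc
        elif skip:
            # Fallback for the preceding \uN: one \'hh or one character each.
            skip -= 1
        elif hex_byte is not None:
            out.append(bytes([int(hex_byte, 16)]).decode(codepage, errors="replace"))
        else:
            out.append(escaped if escaped is not None else literal)
    # Join UTF-16 surrogate pairs produced by consecutive \uN values.
    return "".join(out).encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(decode_text_run(r"H\u228\'3fllo"))
```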
(In reply to Bernard Moreton from comment #15) The charset of such an RTF would be unknown. Do you encode your "ö" using UTF-8, or using Win-1252? This "extension" is not a proper thing. If the *original* problem is not reproducible anymore (I would love to have reliable steps to repro, because I believe that was a really important problem; unfortunately, I couldn't repro myself), then this is WORKSFORME. The problem mentioned in comment 11 (if that is the same as described in comment 13 - I wasn't so sure; the "2 high-ascii(?) characters" imply so, meaning that the encoding was likely UTF-8, which is e.g. still completely uncommon as system encoding on Windows) would be WONTFIX.
Concluding: comment 11 means WORKSFORME. (Please do reopen if the original problem is reproducible, and please also provide a reproducible scenario in that case.)

Using \fcharset1 means (in the absence of \cpgN) that the respective text runs use the "system encoding". This means that *any* octet with a value >127 has a *system-specific* meaning. Thus, such characters would be imported differently on different systems; and, given that Bernard Moreton obviously uses Linux with the usual (but not 100% universal) UTF-8 system encoding, their "ö" would be UTF-8-encoded in the RTF. Such an RTF will open incorrectly on any Windows system (unless that system uses the *still experimental* UTF-8 system encoding support, which ~0% of Windows systems do). The idea that '"default" \fcharset1 should be used for all western text' is not only wrong (meaning that all non-ASCII "western" characters would break randomly on most systems), it is also a Western-centric way of thinking.
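The point about \fcharset1 can be made concrete. With a fixed \fcharsetN a reader knows which code page to use, but with \fcharset1 (and no \cpgN) the same bytes are decoded with whatever the reader's system encoding is. A minimal Python sketch, as a simplified model only: the table is a subset of the usual Windows code page for each \fcharset value, and decode_run and its system_encoding parameter are hypothetical names standing in for the reader's environment:

```python
# Usual Windows code page for a subset of RTF \fcharsetN values.
FCHARSET_CODEPAGE = {
    0: "cp1252",    # ANSI (Western)
    128: "cp932",   # Shift JIS
    129: "cp949",   # Hangul
    134: "cp936",   # GB2312
    136: "cp950",   # Big5
    161: "cp1253",  # Greek
    162: "cp1254",  # Turkish
    204: "cp1251",  # Russian
    238: "cp1250",  # Eastern European
}


def decode_run(raw: bytes, fcharset: int, system_encoding: str) -> str:
    """Simplified model of how a reader picks the code page for a text run."""
    if fcharset == 1:
        # "Default": the file fixes no code page, so the reader's system encoding applies.
        codepage = system_encoding
    else:
        codepage = FCHARSET_CODEPAGE[fcharset]
    return raw.decode(codepage, errors="replace")


raw = "ö".encode("utf-8")  # as written by an editor on a UTF-8 Linux system
for system in ("utf-8", "cp1252", "cp1251"):
    print(system, decode_run(raw, 1, system))
```

The same two bytes that a UTF-8 system reads as "ö" come out as "Ã¶" under Windows-1252 and "Г¶" under Windows-1251.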