Created attachment 59541 [details]
The Slovenian odt file for testing purposes

Problem description:

Steps to reproduce:
1. Install LibreOffice, preferably with the Slovenian spell-checker
2. Open the attached document
3. "Save as ..." the document as an rtf
4. Close the saved rtf document
5. Open the saved rtf document
6. Make a minor change (e.g. add a space or a newline at the end), so saving becomes possible, and force a save of the document
7. Close the rtf document
8. Open the rtf document

Observe how characters following the č character have disappeared (attached are screenshots of the original odt text and the resulting rtf in step 8). If this is confirmed on other systems, it should be a stopper for all Slavic languages.

Current behavior: the character after č gets lost; in 3.5.2 even the formatting becomes erratic
Expected behavior: the document text should remain the same as in the odt

Platform (if different from the browser):
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0
Created attachment 59542 [details] Screenshot - original odt text displayed
Created attachment 59543 [details] Screenshot - buggy rtf text displayed using 3.5.1
Created attachment 59544 [details] Screenshot - even buggier rtf text displayed using 3.5.2
Tried the same on Windows XP Professional 32-bit SP3 with LibreOffice 3.5.0 and something else happened. The characters did not disappear in steps 7-8, but already in step 5 the "č" character was replaced by or displayed as "è", which again shows that something is wrong with the interpretation of the special characters "č" and "Č". Will attach a screenshot. Since this erratic behavior is now confirmed on different operating systems and with different versions of LO 3.5.x, I will change the status to NEW. The title might be a bit misleading; maybe it should be changed to "RTF export filter misinterprets characters č and Č" or something like that? This is a serious bug for all Slovenian users, and as the lead of the Slovenian localization of LibreOffice I will have to issue a warning to all Slovenian users using 3.5.x.
Created attachment 59580 [details] Screenshot - buggy rtf text displayed on Win using 3.5.0
Since this is a critical error for Slovenian language users (and, I guess, for users of other Slavic languages written in Latin script), I am raising the importance to "critical".
Confirmed, and unfortunately it's not only č. Actually, saving any accented character is pretty much completely broken. Note that the first save is correct; the problem appears only after opening the rtf a second time and resaving (no need to edit it, a 'Save as' is enough). I'm attaching a test document created with Wordpad that exhibits this problem. A document created with Writer behaves identically, but the rtf it creates is extremely messy and unreadable; the size difference alone is telling (Wordpad 400B / Writer 4.6kB). The test letters used are ž š č ř ď ť ň ě á é í ó ú ů and all of them disappear after a resave. I guess this should be a blocker, because it causes severe data loss.
Created attachment 59585 [details] Test file, original (Czech language) Test file with Czech accented letters, use Save as to reproduce the problem.
Created attachment 59586 [details] Test file, after a resave (Czech language) Above test file after a Save as. All accented letters disappeared.
Probably not the problem, as changing them all manually by directly editing the rtf source didn't fix it, but the combination of \langfe2052 (Chinese), \alang1081 (Hindi) and \lang1029 (Czech) in the styles, with \adeflang1025 (Arabic) as the default document language, is rather weird for a document that uses Czech only.
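For anyone who wants to check their own files for the same mix of language tags, here is a minimal sketch that lists the language control words in an rtf source. It is not part of LibreOffice; the function name `list_language_tags` and the sample string are made up for illustration, and the name table only contains the four LCIDs mentioned above.

```python
import re

# LCID values and names exactly as given in the comment above; anything else
# is reported as "unknown" rather than guessed.
LCID_NAMES = {1025: "Arabic", 1029: "Czech", 1081: "Hindi", 2052: "Chinese"}

# Longer control words come first so that "langfe" is not cut short as "lang".
LANG_WORD = re.compile(r"\\(adeflang|deflangfe|deflang|langfe|alang|lang)(\d+)")

def list_language_tags(rtf_source):
    """Return (control word, LCID, name) for each language tag found."""
    tags = []
    for match in LANG_WORD.finditer(rtf_source):
        lcid = int(match.group(2))
        tags.append((match.group(1), lcid, LCID_NAMES.get(lcid, "unknown")))
    return tags

# Hypothetical header fragment, only meant to resemble the file discussed here.
sample = r"{\rtf1\adeflang1025\ansi{\stylesheet{\langfe2052\alang1081\lang1029 Default;}}}"
for word, lcid, name in list_language_tags(sample):
    print(word, lcid, name)
```

This only reports what is in the file; it makes no claim about which of these tags a reader actually honours.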
After some more testing it looks like the problem with the test file I posted might be a bit different than the problem from the original reporter, but I will leave it in this bug for now, feel free to split it in new bug if it really turns out to be a different matter.
So the problem is this: on the first save, the old pre-Word 97 RTF format is used, where č is stored as \'e8 (hexadecimal). On the second save, Writer tries to use the newer way of representing characters outside ANSI, the Unicode notation \uN, and fails miserably: č is stored as \u269 (correct) followed by \'0d (I guess that was meant to be \'10d?), which is a carriage return.
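To make the two notations concrete, here is a small sketch of how č ends up as \'e8 in the legacy form and as \u269 in the Unicode form. This is not Writer's export code; the helper names and the choice of cp1250 as the code page and "?" as the fallback character are assumptions for illustration.

```python
def rtf_hex_escape(ch, codepage="cp1250"):
    """Legacy form: each byte of the code-page encoding written as \\'hh."""
    return "".join("\\'%02x" % b for b in ch.encode(codepage))

def rtf_unicode_escape(ch, fallback="?"):
    """Unicode form: \\uN with N as a signed 16-bit decimal, then fallback text."""
    n = ord(ch)
    if n > 0xFFFF:
        raise ValueError("this sketch handles BMP characters only")
    if n > 32767:
        n -= 65536  # RTF stores values above 32767 as negative numbers
    return "\\u%d%s" % (n, fallback)

print(rtf_hex_escape("č"))             # \'e8
print(rtf_unicode_escape("č"))         # \u269?
print(bytes([0xE8]).decode("cp1252"))  # è
print("%02x" % (ord("č") & 0xFF))      # 0d
```

The last two lines are only observations: byte e8 read as Windows-1252 instead of Windows-1250 gives è, which may be related to the è seen on Windows XP, and 0d happens to be the low byte of the code point U+010D, which may be where the stray \'0d comes from. Neither is confirmed by the source code.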
Seems to be fixed in LO 3.5.3. Please have a look at the files saved with current LO 3.5.3 rc0+ (LibreOffice 3.5.3rc0+ Version ID : 51c8c95-a73d29c-6845e52-f269e46-31eca31). Best regards. JBF
Created attachment 60002 [details] bugdoc saved as RTF by LO 3.5.3rc0+
Created attachment 60003 [details] second bugdoc resaved in RTF by LO 3.5.3 rc0+
My tests have been done on Ubuntu 11.10 but reporter uses MacOS, so, please, try LO 3.5.3 rc0+ on MacOS. You can find a recent daily build of LO 3.5.3 rc0+ for MacOS here : http://dev-builds.libreoffice.org/daily/MacOSX-Intel@3-OSX_10.6.0-gcc_4.0.1/libreoffice-3-5/current/ Best regards. JBF
No, still not fixed, the only difference is that compared to 3.5.2, you need to do one additional save to make this manifest in 3.5.3. When looking at the RTF source the corruption is still the same.
Forgot to add I tested this on Windows 7, using the 3.5.3 daily from 14.4.
(In reply to comment #17)
> No, still not fixed, the only difference is that compared to 3.5.2, you need to
> do one additional save to make this manifest in 3.5.3. When looking at the RTF
> source the corruption is still the same.

Hmmm, you are right :-(

Miklos: please have a look. Feel free to reassign if you can't handle this bug.

Best regards. JBF
[Reproducible] with "LibreOffice 3.5.2.2 German UI/Locale [Build-ID: 281b639-6baa1d3-ef66a77-d866f25-f36d45f]" on German WIN7 Home Premium (64bit). Still in 3.6 Master. Works fine with "LibreOffice 3.4.5 German UI [Build ID: OOO340m1 (Build:502)]" parallel Server installation on German WIN7 Home Premium (64bit), so indeed a REGRESSION. I am pretty sure that I already saw a similar bug here in Bugzilla (not necessarily related to rtf?), but I can't find it.
I can reproduce this one on master. If you cut down the original test doc to 'Maček', then after one rtf-export and rtf-import the result is OK, but a second rtf-export and rtf-import renders it as "Ma\nčk". I'll look into this one.
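As a reference point for the round trip, here is a minimal sketch of how a reader is expected to treat \uN per the RTF specification: after each \uN it skips the fallback that follows (one character by default, and a \'hh escape counts as one character). This is not the LibreOffice importer; the function name, the cp1250 default and the test strings are assumptions, and the sketch only understands \uN, \'hh and plain text.

```python
import re

# Three token kinds only: \uN (optionally followed by one space), \'hh, plain char.
TOKEN = re.compile(r"\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|([^\\])")

def decode_rtf_text(fragment, codepage="cp1250", uc=1):
    """Decode a text-only RTF fragment, skipping `uc` fallback chars after \\uN."""
    out = []
    skip = 0
    for match in TOKEN.finditer(fragment):
        uni, hexbyte, plain = match.groups()
        if uni is not None:
            out.append(chr(int(uni) % 65536))  # negative N wraps to 16 bits
            skip = uc
        elif skip > 0:
            skip -= 1  # fallback for the preceding \uN, ignored
        elif hexbyte is not None:
            out.append(bytes([int(hexbyte, 16)]).decode(codepage))
        else:
            out.append(plain)
    return "".join(out)

print(decode_rtf_text(r"Ma\'e8ek"))        # Maček
print(decode_rtf_text(r"Ma\u269\'0dek"))   # Maček
```

With this handling, both the legacy form and the \u269\'0d form described earlier in this report decode to the same text, so neither the newline nor the lost "e" would be expected from a reader that skips the fallback this way.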
Miklos Vajna committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=69259c6509809c1064eb05690dcd9c19c840bae1 fdo#48356 fix RTF import of special unicode characters
Why is the target set to 3.6.0? For Eastern European users this means that the whole 3.5.x branch cannot be recommended and that they have to wait for 3.6.0 in July, or even longer until 3.6.x has stabilized (for usage in government and public service). Could this be made into 3.5.4 or something? Thanks for understanding - and fixing :) - the issue at hand.
Sure, I'll request a cherry-pick to -3-5 in a bit. But the process is to fix stuff in master (when the 3.6 target is added), then earlier targets are added optionally as well. ;-) Marking as resolved in the meantime.
(In reply to comment #23)
> Could this be made into 3.5.4 or something? Thanks for understanding - and
> fixing :) - the issue at hand.

Fear not, it will land in 3.5.3 for sure. We just need to wait for 3 reviews in this phase.
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-3-5": http://cgit.freedesktop.org/libreoffice/core/commit/?id=299387dab1b365427cc44d810026facd30e11a31&g=libreoffice-3-5 fdo#48356 fix RTF import of special unicode characters It will be available in LibreOffice 3.5.4.
Miklos Vajna committed a patch related to this issue. It has been pushed to "libreoffice-3-5-3": http://cgit.freedesktop.org/libreoffice/core/commit/?id=8b8d2680ca96254c606c4be023b3f0e8caacae9b&g=libreoffice-3-5-3 fdo#48356 fix RTF import of special unicode characters It will be available already in LibreOffice 3.5.3.
*** Bug 49269 has been marked as a duplicate of this bug. ***
Verified with LOdev 3.6 (master - 18-May-2012 02h44 x86@6-fast; Build ID: 8b1d29b) under Windows Vista 64.
Migrating Whiteboard tags to Keywords: (filter:rtf) Replace rtf_filter -> filter:rtf. [NinjaEdit]