Bug 48356 - FILESAVE: Save as RTF loses characters before/after special Eastern European characters
Summary: FILESAVE: Save as RTF loses characters before/after special Eastern European ...
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.1 release (earliest affected)
Hardware: All All
Importance: medium critical
Assignee: Miklos Vajna
URL:
Whiteboard: target:3.6.0 target:3.5.3
Keywords: filter:rtf, regression
Duplicates: 49269
Depends on:
Blocks: mab3.5
 
Reported: 2012-04-05 14:09 UTC by Martin Srebotnjak
Modified: 2015-12-17 12:06 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:


Attachments
The Slovenian odt file for testing purposes (9.50 KB, application/vnd.oasis.opendocument.text)
2012-04-05 14:09 UTC, Martin Srebotnjak
Screenshot - original odt text displayed (118.04 KB, image/png)
2012-04-05 14:10 UTC, Martin Srebotnjak
Screenshot - buggy rtf text displayed using 3.5.1 (195.81 KB, image/png)
2012-04-05 14:11 UTC, Martin Srebotnjak
Screenshot - even buggier rtf text displayed using 3.5.2 (127.59 KB, image/png)
2012-04-05 14:12 UTC, Martin Srebotnjak
Screenshot - buggy rtf text displayed on Win using 3.5.0 (31.18 KB, image/png)
2012-04-06 02:32 UTC, Martin Srebotnjak
Test file, original (Czech language) (388 bytes, application/rtf)
2012-04-06 07:22 UTC, khagaroth
Test file, after a resave (Czech language) (4.63 KB, application/rtf)
2012-04-06 07:24 UTC, khagaroth
bugdoc saved as RTF by LO 3.5.3rc0+ (2.58 KB, application/rtf)
2012-04-14 23:06 UTC, Jean-Baptiste Faure
second bugdoc resaved in RTF by LO 3.5.3 rc0+ (4.60 KB, application/rtf)
2012-04-14 23:07 UTC, Jean-Baptiste Faure

Description Martin Srebotnjak 2012-04-05 14:09:35 UTC
Created attachment 59541 [details]
The Slovenian odt file for testing purposes

Problem description: 

Steps to reproduce:
1. Install LibreOffice, preferably with Slovenian spell-checker
2. Open attached document
3. "Save as ..." the document as RTF
4. Close saved rtf document
5. Open saved rtf document
6. Make a minor change (e.g. add a space or a newline at the end), so saving becomes possible, and force a save of the document.
7. Close the rtf document
8. Open rtf document

Observe how characters following the č character have disappeared (attached are screenshots of the original odt text and the resulting rtf in step 8).

If this is confirmed on other systems, it should be a stopper for all Slavic languages.

Current behavior: character after č gets lost, in 3.5.2 even formatting gets erratic

Expected behavior: document text should remain the same as in odt

Platform (if different from the browser): 
              
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:11.0) Gecko/20100101 Firefox/11.0
Comment 1 Martin Srebotnjak 2012-04-05 14:10:53 UTC
Created attachment 59542 [details]
Screenshot - original odt text displayed
Comment 2 Martin Srebotnjak 2012-04-05 14:11:57 UTC
Created attachment 59543 [details]
Screenshot - buggy rtf text displayed using 3.5.1
Comment 3 Martin Srebotnjak 2012-04-05 14:12:27 UTC
Created attachment 59544 [details]
Screenshot - even buggier rtf text displayed using 3.5.2
Comment 4 Martin Srebotnjak 2012-04-06 02:31:46 UTC
Tried the same on Windows XP Professional 32-bit SP3 with LibreOffice 3.5.0 and something else happened. The characters did not disappear in steps 7-8, but already in step 5 the "č" character was replaced by or displayed as "è", which again shows that something is wrong with how the special characters "č" and "Č" are interpreted.
Will attach a screenshot.
Since this erratic behavior is now confirmed on different OS and with different versions of LO 3.5.x, I will change status to NEW. The title might be a bit misleading, maybe it should be changed to "RTF export filter misinterprets characters č and Č" or something like that?
This is a serious bug for all Slovenian users and I as the lead of Slovenian localization of LibreOffice will have to issue a warning to all Slovenian users using 3.5.x.
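
The č to è substitution reported above is what a code-page mismatch would produce. A minimal Python sketch, assuming the file held č as the single byte 0xE8 in Windows-1250 (the value comment 12 reports for the first save) and that the reader decoded it as Windows-1252; the thread does not confirm that this is what LibreOffice 3.5.0 did on Windows:

```python
# Assumption: "č" was written as one Windows-1250 byte.
raw_lower = "č".encode("cp1250")
raw_upper = "Č".encode("cp1250")

# The same bytes read with the Western European code page give different letters.
as_1250 = raw_lower.decode("cp1250")
as_1252 = raw_lower.decode("cp1252")
upper_as_1252 = raw_upper.decode("cp1252")

print(raw_lower.hex(), as_1250, as_1252)
print(raw_upper.hex(), upper_as_1252)
```

The bytes never change here; only the code page used to interpret them does.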
Comment 5 Martin Srebotnjak 2012-04-06 02:32:49 UTC
Created attachment 59580 [details]
Screenshot - buggy rtf text displayed on Win using 3.5.0
Comment 6 Martin Srebotnjak 2012-04-06 04:33:04 UTC
Since this is a critical error for Slovenian language users (and, I guess, for users of other Slavic languages written in Latin script), I am raising the importance to "critical".
Comment 7 khagaroth 2012-04-06 07:19:57 UTC
Confirmed and unfortunately, it's not only č. Actually saving any accented character is pretty much screwed up completely.
Note that the first save is correct; the problem only shows up after opening the rtf a second time and resaving (no need to edit it, a 'Save as' is enough).
I'm attaching a test document created with Wordpad (a document created with Writer behaves identically, but the created rtf is extremely messy and unreadable - just the size difference is telling - Wordpad 400B / Writer 4.6kB) that exhibits this problem. The used test letters are ž š č ř ď ť ň ě á é í ó ú ů and all do disappear after a resave.
I guess this should be a blocker, because it causes a severe data loss.
Comment 8 khagaroth 2012-04-06 07:22:36 UTC
Created attachment 59585 [details]
Test file, original (Czech language)

Test file with Czech accented letters, use Save as to reproduce the problem.
Comment 9 khagaroth 2012-04-06 07:24:16 UTC
Created attachment 59586 [details]
Test file, after a resave (Czech language)

Above test file after a Save as. All accented letters disappeared.
Comment 10 khagaroth 2012-04-06 09:27:30 UTC
Probably not the problem as changing all manually by directly editing the rtf source didn't fix it, but the combination of \langfe2052 (Chinese), \alang1081 (Hindi) and \lang1029 (Czech) in the styles and \adeflang1025 (Arabic) as default document language for a document using Czech only is rather weird.
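
To inspect which language control words an RTF file carries, a small scan of the source is enough. This is a rough sketch, not part of any LibreOffice tooling: `list_languages` and the sample string are hypothetical, and the name table contains only the four IDs mentioned in this comment.

```python
import re

# Only the language IDs named in this comment; anything else is reported as "unknown".
LCID_NAMES = {1025: "Arabic", 1029: "Czech", 1081: "Hindi", 2052: "Chinese"}

# Longer control words come first so "\langfe" is not read as "\lang".
LANG_WORD = re.compile(r"\\(adeflang|deflang|alang|langfe|lang)(\d+)")

def list_languages(rtf_source):
    """Return (control word, language ID, name) for each language control word found."""
    found = []
    for word, number in LANG_WORD.findall(rtf_source):
        lcid = int(number)
        found.append((word, lcid, LCID_NAMES.get(lcid, "unknown")))
    return found

# Hypothetical header fragment combining the control words quoted above.
sample = r"{\rtf1\ansi\adeflang1025\langfe2052\alang1081\lang1029 text}"
for entry in list_languages(sample):
    print(entry)
```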
Comment 11 khagaroth 2012-04-07 05:36:58 UTC
After some more testing it looks like the problem with the test file I posted might be a bit different from the original reporter's problem, but I will leave it in this bug for now. Feel free to split it into a new bug if it really turns out to be a different matter.
Comment 12 khagaroth 2012-04-07 10:12:37 UTC
So the problem is this: on the first save, the old pre-Word 97 RTF format is used, where č is stored as \'e8 (hexadecimal). On the second save, Writer tries to use the newer way of representing characters outside ANSI, the Unicode notation \uN, and fails miserably: č is stored as \u269 (correct) followed by \'0d (I guess that was meant to be \'10d?), which is a carriage return.
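
The numbers in this comment can be checked with a few lines of Python. The sketch below is not LibreOffice code; `rtf_escape` is a hypothetical helper showing what a well-formed escape with a Windows-1250 fallback could look like. Note that \'hh takes exactly two hex digits, so \'10d is not a valid form; 0x0D happens to be the low byte of U+010D, which may explain the stray \'0d, though nothing in this thread confirms the cause.

```python
ch = "č"
code_point = ord(ch)              # decimal value used by \uN
legacy_byte = ch.encode("cp1250") # single byte used by the older \'hh form
low_byte = code_point & 0xFF      # what remains if the code point is cut to 8 bits

def rtf_escape(char, codepage="cp1250"):
    """Hypothetical helper: \\uN followed by a one-character fallback."""
    n = ord(char)
    if n > 0xFFFF:
        raise ValueError("characters outside the BMP need surrogate pairs")
    if n > 32767:
        n -= 65536                # \uN takes a signed 16-bit number
    try:
        encoded = char.encode(codepage)
        fallback = "\\'%02x" % encoded[0] if len(encoded) == 1 else "?"
    except UnicodeEncodeError:
        fallback = "?"            # no representation in the code page
    return "\\u%d%s" % (n, fallback)

print(code_point, hex(code_point), legacy_byte.hex(), hex(low_byte))
print(rtf_escape(ch))
```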
Comment 13 Jean-Baptiste Faure 2012-04-14 23:05:33 UTC
Seems to be fixed in LO 3.5.3. Please have a look at the files saved with current LO 3.5.3 rc0+ (LibreOffice 3.5.3rc0+ Version ID : 51c8c95-a73d29c-6845e52-f269e46-31eca31).

Best regards. JBF
Comment 14 Jean-Baptiste Faure 2012-04-14 23:06:49 UTC
Created attachment 60002 [details]
bugdoc saved as RTF by LO 3.5.3rc0+
Comment 15 Jean-Baptiste Faure 2012-04-14 23:07:32 UTC
Created attachment 60003 [details]
second bugdoc resaved in RTF by LO 3.5.3 rc0+
Comment 16 Jean-Baptiste Faure 2012-04-14 23:14:06 UTC
My tests have been done on Ubuntu 11.10 but reporter uses MacOS, so, please, try LO 3.5.3 rc0+ on MacOS.
You can find a recent daily build of LO 3.5.3 rc0+ for MacOS here : http://dev-builds.libreoffice.org/daily/MacOSX-Intel@3-OSX_10.6.0-gcc_4.0.1/libreoffice-3-5/current/

Best regards. JBF
Comment 17 khagaroth 2012-04-15 00:33:46 UTC
No, still not fixed, the only difference is that compared to 3.5.2, you need to do one additional save to make this manifest in 3.5.3. When looking at the RTF source the corruption is still the same.
Comment 18 khagaroth 2012-04-15 00:36:40 UTC
Forgot to add I tested this on Windows 7, using the 3.5.3 daily from 14.4.
Comment 19 Jean-Baptiste Faure 2012-04-15 01:26:56 UTC
(In reply to comment #17)
> No, still not fixed, the only difference is that compared to 3.5.2, you need to
> do one additional save to make this manifest in 3.5.3. When looking at the RTF
> source the corruption is still the same.

Hmmm, you are right :-(

Miklos: please have a look. Feel free to reassign if you can't handle this bug.

Best regards. JBF
Comment 20 Rainer Bielefeld Retired 2012-04-19 12:20:50 UTC
[Reproducible] with "LibreOffice 3.5.2.2 German UI/Locale [Build-ID: 281b639-6baa1d3-ef66a77-d866f25-f36d45f]" on German WIN7 Home Premium (64bit) 

Still in 3.6 Master. 

Works fine with "LibreOffice 3.4.5 German UI [Build ID: OOO340m1 (Build:502)]" parallel Server installation on German WIN7 Home Premium (64bit), so indeed REGRESSION

I am pretty sure that I already saw a similar bug here in Bugzilla (not necessarily related to rtf?), but I can't find it.
Comment 21 Miklos Vajna 2012-04-20 03:21:38 UTC
I can reproduce this one on master. If you cut down the original test doc to 'Maček', then after rtf-export, rtf-import, the result is ok, but the second rtf-export, rtf-import renders it as "Ma\nčk". I'll look into this one.
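
For reference, the RTF convention is that a reader which understands \uN skips the fallback that follows it (one character by default, adjustable with \ucN), and a \'hh escape counts as a single fallback character. The Python sketch below illustrates that rule on the 'Maček' example; it is a simplified illustration, not the LibreOffice importer or the fix that was committed, and `decode_rtf_text` is a hypothetical name. It handles only \uN, \'hh and plain characters, not groups or other control words.

```python
import re

# \uN (with optional delimiter space), \'hh, or any single plain character.
TOKEN = re.compile(r"\\u(-?\d+) ?|\\'([0-9a-fA-F]{2})|(.)", re.DOTALL)

def decode_rtf_text(src, codepage="cp1250", uc=1):
    """Decode a fragment of RTF text, skipping `uc` fallback characters after each \\uN."""
    out = []
    skip = 0
    for match in TOKEN.finditer(src):
        uni, hex_byte, plain = match.groups()
        if uni is not None:
            n = int(uni)
            if n < 0:
                n += 65536        # \uN is a signed 16-bit value
            out.append(chr(n))
            skip = uc             # the next `uc` characters are fallback
        elif skip > 0:
            skip -= 1             # a \'hh escape or a plain character counts as one
        elif hex_byte is not None:
            out.append(bytes([int(hex_byte, 16)]).decode(codepage))
        else:
            out.append(plain)
    return "".join(out)

print(decode_rtf_text(r"Ma\'e8ek"))        # first-save form from comment 12
print(decode_rtf_text(r"Ma\u269\'e8ek"))   # \uN with a sensible fallback
print(decode_rtf_text(r"Ma\u269\'0dek"))   # \uN with the bogus fallback from comment 12
```

With the fallback skipped, both a sensible fallback (\'e8) and the bogus one from comment 12 (\'0d) decode to the same text.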
Comment 22 Not Assigned 2012-04-20 04:02:01 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=69259c6509809c1064eb05690dcd9c19c840bae1

fdo#48356 fix RTF import of special unicode characters
Comment 23 Martin Srebotnjak 2012-04-20 04:16:41 UTC
Why is the target set to 3.6.0? For Eastern European users this means that the whole 3.5.x branch cannot be recommended and that they have to wait for 3.6.0 in July, or even longer until 3.6.x has stabilized (for usage in government and public service).
Could this be made into 3.5.4 or something? Thanks for understanding - and fixing :) - the issue at hand.
Comment 24 Miklos Vajna 2012-04-20 04:22:43 UTC
Sure, I'll request a cherry-pick to -3-5 in a bit. But the process is to fix stuff in master (when the 3.6 target is added), then earlier targets are added optionally as well. ;-)

Marking as resolved in the meantime.
Comment 25 Andras Timar 2012-04-20 06:38:09 UTC
(In reply to comment #23)
> Could this be made into 3.5.4 or something? Thanks for understanding - and
> fixing :) - the issue at hand.

Fear not, it will land in 3.5.3 for sure. We just need to wait for 3 reviews in this phase.
Comment 26 Not Assigned 2012-04-20 06:50:56 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-3-5":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=299387dab1b365427cc44d810026facd30e11a31&g=libreoffice-3-5

fdo#48356 fix RTF import of special unicode characters


It will be available in LibreOffice 3.5.4.
Comment 27 Not Assigned 2012-04-22 22:35:16 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-3-5-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8b8d2680ca96254c606c4be023b3f0e8caacae9b&g=libreoffice-3-5-3

fdo#48356 fix RTF import of special unicode characters


It will be available already in LibreOffice 3.5.3.
Comment 28 opensuse.lietuviu.kalba 2012-04-29 04:29:56 UTC
*** Bug 49269 has been marked as a duplicate of this bug. ***
Comment 29 s-joyemusequna 2012-05-19 00:08:10 UTC
Verified with LOdev 3.6 (master - 18-May-2012 02h44 x86@6-fast; Build ID: 8b1d29b) under Windows Vista 64.
Comment 30 Robinson Tryon (qubit) 2015-12-17 12:06:00 UTC Comment hidden (obsolete)