Created attachment 118397
ODT document with a comment

In a specific situation, non-ASCII characters in comments are corrupt when an RTF document is reopened. This only seems to occur on Linux after an RTF document has been modified and then saved. If there are non-ASCII characters in the actual content of the document, these are not affected.

Found with LibO 4.3.3.2 (from Debian Jessie repository), 4.4.5.2 (from Debian backports) and 5.0.1.2 (downloaded directly from the LibO site). The issue does not emerge with any version of LibO on Windows 7. Perhaps there is something that goes wrong with UTF-8 (UTF-16 is used internally in Windows).

(I actually found the bug a few months ago, but I didn't report it back then, because it already seemed fixed in LibO 5.0.0.0.beta1. However, it now affects 5.0.0.5 as well on Linux.)

Steps to reproduce:
1. Open the attached ODT document. There is some text (in Finnish) that is repeated in a comment.
2. Save the document as RTF. (You can at this point close the document and reopen it, and everything seems fine.)
3. Make a change in the document, save it again as RTF and close it.
4. When the RTF is now reopened, the non-ASCII characters in the comment are corrupt. They have mostly been turned into question marks, but typographical quotation marks have been turned into a combination of a square and a letter.
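For anyone inspecting the saved files by hand: RTF is a 7-bit format, so non-ASCII characters are normally written as escapes rather than raw bytes. The sketch below shows one common form, the `\uN` escape followed by a `?` fallback character for readers that do not understand Unicode escapes. The helper name `rtf_escape` is made up for illustration, and it is only an assumption that Writer emits this exact form for comment text; the attached files have not been checked against it.

```python
def rtf_escape(text: str) -> str:
    """Escape text for an RTF body (hypothetical helper, illustration only)."""
    out = []
    for ch in text:
        if ch in "\\{}":
            # Backslash and braces are RTF syntax and must be escaped.
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit decimal, one escape per UTF-16 code unit.
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                # The trailing '?' is the fallback shown by non-Unicode readers.
                out.append("\\u%d?" % unit)
    return "".join(out)


print(rtf_escape("Kaupinmäki"))
print(rtf_escape("\u201dlainaus\u201d"))
```

If a reader ends up displaying only the fallback character, the result would be a question mark in place of each non-ASCII character, which may or may not be related to the symptom described above.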
Hi Simo,

I tried it in the 5.0 daily build and in master, and it worked fine. Please try 5.0.1 or 5.0.2 and see if it still shows up for you.

Version: 5.0.3.0.0+
Build ID: 4ae70fd6c93087ce66c76d3102ad678bcf01dbf5
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:libreoffice-5-0, Time: 2015-09-18_11:42:55
Locale: en-US (en_US.UTF-8)

Version: 5.1.0.0.alpha1+
Build ID: cbf3fac0a5a1be34b2e1a58da959debd24ebc017
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-09-17_07:03:22
Locale: en-US (en_US.UTF-8)
Hi Jay,

LibO 5.0.2.2 is also affected, but the bug does seem fixed in the development version. However, as I said above, it already seemed fixed in 5.0.0.0.beta1, but was back in 5.0.0.5. Therefore, I think I'll keep a close eye on this when 5.0.3.1 is made available.

AFFECTED:
Version: 5.0.2.2
Build ID: 37b43f919e4de5eeaca9b9755ed688758a8251fe
Locale: fi-FI (fi_FI.utf8)

NOT AFFECTED:
Version: 5.0.3.0.0+
Build ID: a9670e0735b77ecc40aa8af4106af7d32ec548a0
TinderBox: Linux-rpm_deb-x86@45-TDF, Branch:libreoffice-5-0, Time: 2015-09-24_23:24:38
Locale: fi-FI (fi_FI.utf8)
Reopening.

So, the issue seemed fixed in 5.0.3.0, but now it has re-emerged in 5.0.3.1. As I already saw the same regression occur between 5.0.0.0.beta1 and 5.0.0.5, it seems something goes wrong when the beta version is turned into a release candidate.

Version: 5.0.3.1
Build ID: fd8cfc22f7f58033351fcb8a83b92acbadb0749e
Locale: fi-FI (fi_FI.utf8)
*** Bug 90128 has been marked as a duplicate of this bug. ***
Has anyone tried to bibisect these latest 5.0.x regressions?
Works fine for me.

Version: 5.1.0.0.alpha1+
Build ID: b684090d4f573eb339e93872d0cef07e69adc913
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-10-16_01:50:06
Locale: en-US (en_US.UTF-8)

Version: 5.0.4.0.0+
Build ID: 9a75c72495ed6014d6c84fdead14bef68ea32858
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:libreoffice-5-0, Time: 2015-10-16_08:36:43
Locale: en-US (en_US.UTF-8)

(In reply to tommy27 from comment #5)
> Has anyone tried to bibisect these latest 5.0.x regressions?

I doubt there is any regression there.

(In reply to Simo Kaupinmäki from comment #3)
> So, the issue seemed fixed in 5.0.3.0, but now it has re-emerged in 5.0.3.1.
> As I already saw the same regression occur between 5.0.0.0.beta1 and
> 5.0.0.5, it seems something goes wrong when the beta version is turned into
> a release candidate.

Works fine for me. Could it be specific to Finnish, as I've been typing English words into the document before saving it to RTF? Can you please provide a sample corrupt RTF, so we can test if it is corrupt when we open it?

Version: 5.0.3.1
Build ID: fd8cfc22f7f58033351fcb8a83b92acbadb0749e
Locale: en-US (en_US.UTF-8)
Created attachment 119752
RTF with corrupt characters in the comment

There is another corrupt RTF in the duplicate bug 90128. As that file was probably not created in a Finnish environment, I don't think the issue is specific to Finnish.

I've just noticed that if I try to open either of the corrupt files with XFCE's Mousepad, it complains about incorrect encoding:
"The document was not UTF-8 valid"
"Invalid byte sequence in conversion input."

LibO opens the file without complaints, but the non-ASCII characters in the comment are corrupt.
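To compare what a good and a corrupt RTF actually store for the comment text, it can help to list the character escapes in the file. Below is a minimal sketch that finds `\uN` escapes and `\'xx` hex escapes in an RTF string and shows the character each one stands for. The function name `list_escapes` is hypothetical, and decoding `\'xx` as cp1252 is an assumption (the real code page is whatever the file declares in its header). It is a plain regex scan, not an RTF parser, so an escaped backslash followed by `u123` would be misread.

```python
import re

# \uN (signed decimal) or \'xx (two hex digits)
ESCAPE = re.compile(r"\\u(-?\d+)|\\'([0-9a-fA-F]{2})")


def list_escapes(rtf: str, codepage: str = "cp1252"):
    """Return (escape, character) pairs found in an RTF string."""
    found = []
    for m in ESCAPE.finditer(rtf):
        if m.group(1) is not None:
            n = int(m.group(1))
            if n < 0:
                # Negative values stand for code units above 32767.
                n += 65536
            found.append((m.group(0), chr(n)))
        else:
            byte = bytes([int(m.group(2), 16)])
            # Assumed code page; undecodable bytes become U+FFFD.
            found.append((m.group(0), byte.decode(codepage, errors="replace")))
    return found


sample = r"Kaupinm\u228?ki \'e4 \u8221?"
for escape, char in list_escapes(sample):
    print(escape, repr(char))
```

Running this over the text of both attachments and diffing the output would show whether the escapes themselves differ or only the way they are read back.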
(In reply to Yousuf (Jay) Philips from comment #6)
> Works fine for me. Could it be specific to Finnish, as I've been typing
> English words into the document before saving it to RTF?

Please take into account that the issue does not emerge if you just save a document as RTF. It only emerges after you have re-saved an existing RTF and then closed and re-opened it.
Created attachment 119769
modified rtf
(In reply to Simo Kaupinmäki from comment #8)
> Please take into account that the issue does not emerge if you just save a
> document as RTF. It only emerges after you have re-saved an existing RTF and
> then closed and re-opened it.

In my last comment, I attached an RTF that I created by following the steps in the bug description, so yes, I am aware of the issue you have described in the bug report, but I still can't reproduce it.

Version: 5.1.0.0.alpha1+
Build ID: b684090d4f573eb339e93872d0cef07e69adc913
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-10-16_01:50:06
Locale: en-US (en_US.UTF-8)

@Miklos: Any ideas on what may be causing this issue?
(In reply to Yousuf (Jay) Philips from comment #10)
> Version: 5.1.0.0.alpha1+
> Build ID: b684090d4f573eb339e93872d0cef07e69adc913
> TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time:
> 2015-10-16_01:50:06
> Locale: en-US (en_US.UTF-8)

I don't know if it matters, but it seems you have tested this on x86_64, whereas I have only tested it on IA32. Bug 90128 has also been filed against IA32 architecture.
(In reply to Simo Kaupinmäki from comment #7)
> I've just noticed that if I try to open either of the corrupt files with
> XFCE's Mousepad, it complains about incorrect encoding:
> "The document was not UTF-8 valid"
> "Invalid byte sequence in conversion input."

Actually, this seems to happen with any RTF file, so it doesn't prove anything.
Created attachment 119794
RTF created with MS Word

For comparison, I have opened the original ODT file in MS Word 2013 and saved it as RTF. Notice that the file size is considerably larger than with an RTF created by Writer. Furthermore, if this file is opened with Mousepad, it does not complain about encoding. So there seems to be something not-quite-right even with the file posted by Jay, although it's not directly visible.
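Mousepad's complaint can be checked independently of any editor. The sketch below reports where a file's bytes fall outside 7-bit ASCII and whether the whole byte string is valid UTF-8. The function name `inspect_bytes` is made up for illustration, and the sample byte strings are invented, not taken from the attachments.

```python
def inspect_bytes(data: bytes) -> dict:
    """Report raw non-ASCII bytes and UTF-8 validity (illustrative helper)."""
    # Offsets of every byte above 0x7F; a pure 7-bit RTF would have none.
    high = [i for i, b in enumerate(data) if b > 0x7F]
    try:
        data.decode("utf-8")
        valid = True
    except UnicodeDecodeError:
        valid = False
    return {"non_ascii_offsets": high, "valid_utf8": valid}


# Invented samples: an escaped character, a raw cp1252 byte, and raw UTF-8.
print(inspect_bytes(b"Kaupinm\\u228?ki"))
print(inspect_bytes(b"Kaupinm\xe4ki"))
print(inspect_bytes("Kaupinmäki".encode("utf-8")))
```

To apply it to a real file, read it in binary mode, for example `inspect_bytes(open(path, "rb").read())`, and compare the offsets against the position of the comment in the file.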
Tried the 32-bit version and still couldn't confirm.

Version: 5.1.0.0.alpha1+
Build ID: 2b5a48da5969b1ed37f4480d843714d434feb5d9
TinderBox: Linux-rpm_deb-x86@71-TDF, Branch:master, Time: 2015-10-19_05:39:28
Locale: en-US (en_US.UTF-8)
(In reply to Yousuf (Jay) Philips from comment #14)
> Tried the 32-bit version and still couldn't confirm.
>
> Version: 5.1.0.0.alpha1+

Well, that's an alpha version. As I have said above, twice already I have been unable to reproduce this bug in a beta version, but then it has re-emerged in a release candidate.
(In reply to Simo Kaupinmäki from comment #15)
> Well, that's an alpha version. As I have said above, twice already I have
> been unable to reproduce this bug in a beta version, but then it has
> re-emerged in a release candidate.

Just tested with this 32-bit release candidate and still no luck.

Version: 5.0.3.1
Build ID: fd8cfc22f7f58033351fcb8a83b92acbadb0749e
Locale: en-US (en_US.UTF-8)

http://downloadarchive.documentfoundation.org/libreoffice/old/5.0.3.1/deb/x86/
Created attachment 119796
Screenshots from Word

Well, isn't this getting frustrating or what? I've done some further comparison with MS Word and found that although the RTF posted by Jay looks fine in Writer, it does not look quite right in Word. It's not as bad as the first RTF by me, but the umlauted letter "ä" in my name has been replaced with what looks like a Chinese character. The actual comment looks OK, except that the font has been changed from the original DejaVu Serif to Liberation Serif (I assume this isn't deliberate). Furthermore, the font of the non-ASCII characters in the actual document text has also been changed to Liberation Serif, whereas the font of the ASCII characters is still DejaVu Serif.

This attachment is a combination of three screenshots taken of various RTFs as they appear in Word. The first one is attachment 119794 (created by me in Word) and looks fine, the second one is attachment 119752 (created by me in Writer) and looks all wrong, and the third one is attachment 119769 (created by Jay in Writer) and looks, well, not quite right.
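As background on how a single "ä" can turn into a question mark, a two-character sequence or a CJK character: these are typical results of bytes written in one encoding being read in another. The sketch below only demonstrates that general effect with Python's codecs. It is not a diagnosis of this bug, and the choice of cp1252 and GBK as example code pages is an assumption made purely for illustration.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one encoding and decode the bytes in another."""
    return text.encode(written_as).decode(read_as, errors="replace")


name = "Kaupinmäki"

# UTF-8 bytes read as cp1252: "ä" becomes two characters.
print(misread(name, "utf-8", "cp1252"))

# cp1252 byte read as UTF-8: "ä" becomes the replacement character U+FFFD.
print(misread(name, "cp1252", "utf-8"))

# cp1252 bytes read as a double-byte code page: "ä" and the following "k"
# are consumed together as one CJK character.
print(misread(name, "cp1252", "gbk"))
```

The third case would match a symptom where the "ä" shows up as a Chinese-looking character and the letter after it disappears, which could be checked against the screenshot.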
I've only now realized that in order to reproduce this bug, you may actually need to close and re-open the RTF before re-saving it. Alternatively, you can select "File > Save As... RTF" twice. The issue does not emerge if you first select "File > Save As... RTF" and then just save changes by pressing Ctrl+S or clicking the save icon (at least if you don't have the "Ask when not saving in ODF or default format" option selected).
Migrating Whiteboard tags to Keywords: (filter:rtf ) [NinjaEdit]
In 5.1.1.3, the main issue seems to have been fixed both in the official LibO version and the Debian backports version. As I don't know what it is exactly that has made the problem go away, I'm closing this bug as WORKSFORME. Thank you for your help in trying to track it down.

All is not fine, though. The "ä" in my name is still being replaced by a question mark. It seems that there is a similar bug that specifically affects the non-ASCII characters included in the metadata of a comment. However, this appears to be a separate issue, and it's something I can live with.

Version: 5.1.1.3
Build ID: 89f508ef3ecebd2cfb8e1def0f0ba9a803b88a6d
CPU Threads: 1; OS Version: Linux 3.16; UI Render: default;
Locale: fi-FI (fi_FI.utf8)

Version: 5.1.1.3
Build ID: 1:5.1.1-1~bpo8+1
CPU Threads: 1; OS Version: Linux 3.16; UI Render: default;
Locale: fi-FI (fi_FI.utf8)
For the record, 5.0.5.2 is still affected by the main issue (and also by the secondary issue regarding metadata):

Version: 5.0.5.2
Build ID: 55b006a02d247b5f7215fc6ea0fde844b30035b3
Locale: fi-FI (fi_FI.utf8)

I haven't tested with 5.1.0, so it's possible that the main issue has already been fixed there.