Created attachment 67487
Sample docx file with a line with "č" character

Problem description:

Steps to reproduce:
1. Open the attached docx file.

Current behavior: The single line is split in two at the character "č"; the character itself is not displayed and is not even present in the imported text. It is treated as a newline symbol, although it is a regular letter in many Slavic alphabets!

Expected behavior: The imported file should contain a single line, with the "č" character displayed in it.

Platform (if different from the browser):

Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0.1
Let me add that this bug is critical for Slovenian and probably for all other Eastern European / Slavic languages: any text imported from docx files might get garbled like this.

I reported a similar bug against 3.5.1, for rtf files:
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=48356
At that time it seemed to have been fixed.
Confirmed, will fix in a bit.
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b3603e0e0e5dbfbeaa2426c499e8f64be2d15765

fdo#55187 fix DOCX import of unicode 0xNN0d when it's a separate run

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
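
For readers wondering why "č" in particular is affected: it is U+010D, so its low byte is 0x0D, the carriage return code. The sketch below is only an illustration of how a check of the "0xNN0d" kind could go wrong. It assumes (this thread does not confirm it) that the importer looked at the first byte of the UTF-16LE data of a one-character run. Both function names are hypothetical and are not taken from the LibreOffice sources.

```python
# Illustrative sketch only; the function names are hypothetical and the
# mechanism (testing the first UTF-16LE byte) is an assumption.

CARRIAGE_RETURN = 0x0D


def is_paragraph_break_bytewise(run_text: str) -> bool:
    """Assumed buggy check: inspects only the first byte of the UTF-16LE data."""
    data = run_text.encode("utf-16-le")
    return len(run_text) == 1 and data[0] == CARRIAGE_RETURN


def is_paragraph_break_codepoint(run_text: str) -> bool:
    """Check on the whole code point, so U+010D is not mistaken for U+000D."""
    return len(run_text) == 1 and ord(run_text) == CARRIAGE_RETURN


if __name__ == "__main__":
    for text in ["\r", "č", "Ѝ", "a", "čaj"]:
        print(
            repr(text),
            [hex(ord(c)) for c in text],
            is_paragraph_break_bytewise(text),
            is_paragraph_break_codepoint(text),
        )
```

Under this assumption, any character whose code point ends in 0x0D (for example U+010D "č" or U+040D "Ѝ") is misread as a paragraph break, but only when it forms a run on its own, which matches the "separate run" wording of the commit subject.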
Resolved in master, -3-6 review: https://gerrit.libreoffice.org/665
(In reply to comment #3)
> Miklos Vajna committed a patch related to this issue.
> It has been pushed to "master":

Miklós, you are _too_ fast! ;-)

I wanted to confirm this bug right now, but while I was still typing my pedantic description and some bad jokes about Microsoft’s complicated way of storing a simple line of text, you had already taken and fixed the issue. Congratulations and thank you very much!

Changes for the record/statistics:
-- Already reproducible with LibO 3.5.0, therefore adapted the Version field (the Version field should always contain the first version in which a bug is known to exist, not the last one).
-- Platform should very probably be All/All.
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9d3af8d699c95b7433591701666a70554d543b96&g=libreoffice-3-6

fdo#55187 fix DOCX import of unicode 0xNN0d when it's a separate run

It will be available in LibreOffice 3.6.3.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.