Bug 55187 - FILEOPEN: Character "č" misinterpreted in DOCX importer
Summary: FILEOPEN: Character "č" misinterpreted in DOCX importer
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.5.0 release
Hardware: All (OS: All)
Importance: medium major
Assignee: Miklos Vajna
URL:
Whiteboard: BSA target:3.7.0 target:3.6.3
Keywords:
Depends on:
Blocks:
 
Reported: 2012-09-21 09:18 UTC by Martin Srebotnjak
Modified: 2012-09-21 15:45 UTC
CC list: 2 users

See Also:
Crash report or crash signature:


Attachments
Sample docx file with a line with "č" character (12.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-09-21 09:18 UTC, Martin Srebotnjak

Description Martin Srebotnjak 2012-09-21 09:18:19 UTC
Created attachment 67487
Sample docx file with a line with "č" character

Problem description: 

Steps to reproduce:
1. Open the attached docx file.

Current behavior:
The simple line is split in two at the character "č", and that character is neither displayed nor even present in the document: the importer treats "č" as a line break, although it is a regular letter in most Slavic alphabets!

Expected behavior:
The imported line should stay a single line, with the "č" character displayed within it.

Platform (if different from the browser): 
              
Browser: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20100101 Firefox/15.0.1
Comment 1 Martin Srebotnjak 2012-09-21 09:21:36 UTC
Let me add that this bug is critical for Slovenian and probably for all other Eastern European/Slavic languages: text in any DOCX file could get garbled like this.

I reported a similar bug for RTF files against 3.5.1:
https://www.libreoffice.org/bugzilla/show_bug.cgi?id=48356
At the time, it appeared to have been fixed.
Comment 2 Miklos Vajna 2012-09-21 13:17:14 UTC
Confirmed, will fix in a bit.
Comment 3 Not Assigned 2012-09-21 13:18:05 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b3603e0e0e5dbfbeaa2426c499e8f64be2d15765

fdo#55187 fix DOCX import of unicode 0xNN0d when it's a separate run
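The commit title points at the cause: "č" is U+010D, and the low byte of that code point is 0x0D, the carriage return that the importer treats as a paragraph break. The sketch below is an illustration under an assumption, not the actual LibreOffice writerfilter code: it assumes the importer tested a single-character run for the paragraph-end marker by looking only at the low byte of its UTF-16 data. The function names are hypothetical.

```python
# Hypothetical illustration only; not LibreOffice's writerfilter code.
# Assumption: a one-character run is tested for the paragraph-end
# marker (carriage return, 0x0D) by inspecting its UTF-16 data.

CR = 0x0D  # carriage return, treated here as the paragraph-end marker


def is_paragraph_end_buggy(run_text: str) -> bool:
    """Flawed check: compares only the low byte of the UTF-16LE code unit."""
    data = run_text.encode("utf-16-le")
    return len(run_text) == 1 and data[0] == CR


def is_paragraph_end_fixed(run_text: str) -> bool:
    """Corrected check: compares the whole code point."""
    return len(run_text) == 1 and ord(run_text) == CR


if __name__ == "__main__":
    # A real CR, the letter "č" (U+010D) and a plain ASCII letter.
    for run in ["\r", "\u010d", "a"]:
        print(f"U+{ord(run):04X}", is_paragraph_end_buggy(run), is_paragraph_end_fixed(run))
```

Running it prints `U+010D True False` for "č": the low-byte test flags it as a paragraph end, while the full comparison does not. Under the same assumption, any code point of the form U+NN0D would be misread, for example U+040D (Cyrillic "Ѝ").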



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 4 Miklos Vajna 2012-09-21 13:22:17 UTC
Resolved in master; the backport to libreoffice-3-6 is under review: https://gerrit.libreoffice.org/665
Comment 5 Roman Eisele 2012-09-21 13:25:58 UTC
(In reply to comment #3)
> Miklos Vajna committed a patch related to this issue.
> It has been pushed to "master":

Miklós, you are _too_ fast! ;-)

I wanted to confirm this bug just now, but while I was still typing my pedantic description and some bad jokes about Microsoft’s complicated way of storing a simple line of text, you had already taken and fixed the issue.

Congratulations and thank you very much!


Changes for the record/statistics:

-- Already reproducible with LibO 3.5.0, so I adjusted the Version field (the Version field should always contain the earliest version in which a bug is known to exist, not the latest one).

-- Platform is very probably All/All.
Comment 6 Not Assigned 2012-09-21 15:45:13 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-3-6":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9d3af8d699c95b7433591701666a70554d543b96&g=libreoffice-3-6

fdo#55187 fix DOCX import of unicode 0xNN0d when it's a separate run


It will be available in LibreOffice 3.6.3.
