Bug Hunting Session
Bug 56738 - Wrong encoding in comments for file edited by Mac OS MS Word
Summary: Wrong encoding in comments for file edited by Mac OS MS Word
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.3.2 release
Hardware: x86-64 (AMD64) All
Importance: medium critical
Assignee: Not Assigned
URL:
Whiteboard: target:6.3.0 target:6.2.5
Keywords:
Depends on:
Blocks: DOC-Comments
Reported: 2012-11-04 14:31 UTC by Luiz Angelo Daros de Luca
Modified: 2019-05-31 08:31 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Sample file with problem on comments (48.00 KB, application/msword)
2012-11-04 14:31 UTC, Luiz Angelo Daros de Luca

Description Luiz Angelo Daros de Luca 2012-11-04 14:31:16 UTC
Created attachment 69520
Sample file with problem on comments

Hello,

I have a file produced in LibreOffice. It was converted to .doc and sent to a reviewer who uses MS Word on Mac OS X. After the review, the comments had a strange encoding: non-ASCII characters were turned into what look like Chinese characters.

If I open the file in Word on Windows, it is OK, e.g.:

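  Não sei se é relevante esta pergunta. O que eu queria saber é se o médico atende muito parto na água. Tb posso fazer porcentagem de atendimento...
  (English: "I don't know if this question is relevant. What I wanted to know is whether the doctor attends many water births. I could also work out a percentage of cases attended...")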

If I open it in LO, I get:

  N縊 sei se �relevante esta pergunta. O que eu queria saber �se o m馘ico atende muito parto na 疊ua. Tb posso fazer porcentagem de atendimento...

The first comment is OK; it is the second one that gets messed up. If I remove the first comment, the second one is displayed correctly as well.

If I modify the file in LO after the problem has occurred, even MS Word shows the wrong characters.

I'll attach a sample file with the problem.
Comment 1 Urmas 2012-11-05 03:35:34 UTC
Confirmed.
The displayed characters are from Latin-1 encoding being reinterpreted as Shift-JIS.
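The reinterpretation described above can be reproduced outside LibreOffice. The Python sketch below is only an illustration (it is not code from this bug or its fix): it encodes Portuguese words as single-byte Windows-1252, the Windows superset of Latin-1, and then decodes the same bytes as Shift-JIS. The accented bytes 0xE3 (ã), 0xE9 (é) and 0xE1 (á) are all Shift-JIS lead bytes, so each one swallows the ASCII byte that follows it and the pair becomes a single kanji. The function name `misdecode` is hypothetical.

```python
# Illustrative sketch: reproduce the mojibake by encoding text as
# Windows-1252 (a superset of Latin-1) and wrongly decoding it as Shift-JIS.

def misdecode(text: str) -> str:
    """Encode as Windows-1252, then (wrongly) decode as Shift-JIS."""
    raw = text.encode("cp1252")
    # errors="replace" turns invalid byte pairs into U+FFFD instead of raising
    return raw.decode("shift_jis", errors="replace")


if __name__ == "__main__":
    for word in ["Não", "médico", "água"]:
        raw = word.encode("cp1252")
        # bytes.hex(sep) requires Python 3.8+
        print(f"{word!r:10} bytes={raw.hex(' ')} -> {misdecode(word)!r}")
```

Each accented letter and the letter after it should come back as one CJK character, so "Não" decodes to just two characters, "N" followed by a kanji, matching the pattern in the description (縊, 馘, 疊). Python's handling of an invalid pair such as "é " (0xE9 followed by a space) may not match exactly what LibreOffice displayed.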
Comment 2 Luiz Angelo Daros de Luca 2013-01-21 22:10:04 UTC
Still exists in 4.0.x
Comment 3 Urmas 2013-01-22 06:18:04 UTC
In Word 97 and later documents, the text is always encoded in the 1252 code page, which is stated rather clearly in the official documentation. Does LO try to use the charset value 0x80 to decode Unicode text? Oh, open source...
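The rule in the comment above can be sketched as a small decision function. Everything here is hypothetical and for illustration only: `decode_comment_bytes`, `word_version`, `WORD_97_VERSION` and the charset table are invented names, not LibreOffice's import code, and the actual tdf#56738 patch may work differently. The values 0 and 0x80 (128) are the Windows font charset IDs ANSI_CHARSET and SHIFTJIS_CHARSET. The sketch decodes 8-bit text from Word 97 and later as code page 1252 regardless of the font charset, and only older formats fall back to a charset-derived code page.

```python
# Hypothetical sketch of the decoding rule from comment 3; not LibreOffice code.

# Windows font charset IDs (wingdi.h) mapped to Python codec names.
# Only a few entries are included for illustration.
CHARSET_TO_CODEC = {
    0: "cp1252",    # ANSI_CHARSET
    128: "cp932",   # SHIFTJIS_CHARSET (0x80), Microsoft's Shift-JIS variant
    204: "cp1251",  # RUSSIAN_CHARSET
    238: "cp1250",  # EASTEUROPE_CHARSET
}

WORD_97_VERSION = 8  # assumption: Word 97 is the "version 8" binary format


def decode_comment_bytes(raw: bytes, word_version: int, font_charset: int) -> str:
    """Decode 8-bit comment text from a .doc file (illustrative only)."""
    if word_version >= WORD_97_VERSION:
        # Word 97+: 8-bit text is code page 1252, whatever the font charset says
        codec = "cp1252"
    else:
        # Older formats: derive the code page from the font charset
        codec = CHARSET_TO_CODEC.get(font_charset, "cp1252")
    return raw.decode(codec, errors="replace")


if __name__ == "__main__":
    raw = "Não".encode("cp1252")
    print(decode_comment_bytes(raw, 8, 0x80))  # prints Não: charset ignored
    print(decode_comment_bytes(raw, 6, 0x80))  # older format: decoded as cp932
```

The Word 97+ branch never consults the font charset; if a reader instead let a 0x80 charset select the code page, the same bytes would be decoded as Shift-JIS, which is the kind of output shown in the description.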
Comment 4 Thomas Kluyver 2013-05-24 13:55:45 UTC
Still seeing this with 4.0.2.2
Comment 5 Luiz Angelo Daros de Luca 2015-04-02 02:52:21 UTC
Still present on 4.4.2.2
Comment 6 tommy27 2016-04-16 07:28:42 UTC Comment hidden (obsolete)
Comment 7 Thomas Kluyver 2016-04-18 11:05:16 UTC
I still see the issue on LibreOffice 5.0.5.2 on Fedora.
Comment 8 Luiz Angelo Daros de Luca 2016-05-01 22:54:51 UTC
Still present on 5.1.2.2, Ubuntu 16.04
Comment 9 QA Administrators 2017-05-22 13:40:18 UTC Comment hidden (obsolete)
Comment 10 Thomas Kluyver 2017-06-13 13:02:56 UTC
I still see the bug in LibreOffice 5.3.3.2 running on Ubuntu 16.04.
Comment 11 QA Administrators 2019-05-19 02:50:00 UTC Comment hidden (obsolete)
Comment 12 Thomas Kluyver 2019-05-19 08:25:43 UTC
Still reproducible in LibreOffice 6.2.3 on Fedora. Details from the About dialog:

Version: 6.2.3.2
Build ID: 6.2.3.2-1.fc30
CPU threads: 4; OS: Linux 5.0; UI render: default; VCL: gtk3; 
Locale: en-GB (en_GB.UTF-8); UI-Language: en-US
Calc: threaded
Comment 13 Commit Notification 2019-05-29 12:44:27 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/+/fd01ddd3094dd080a455665342316c79dbee8390%5E%21

tdf#56738: fix encoding in comments in doc files (>= Word 97)

It will be available in 6.3.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Julien Nabet 2019-05-29 12:51:58 UTC
Fix on gerrit for 6.2 branch waiting for review: https://gerrit.libreoffice.org/#/c/73157/
Comment 15 Xisco Faulí 2019-05-29 20:23:41 UTC
Verified in

Version: 6.3.0.0.alpha1+
Build ID: aa687b22991e6c674b1d8653d52fbe9a50080174
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded

@Julien Nabet, thanks for fixing this issue!
Comment 16 Commit Notification 2019-05-30 07:22:15 UTC
Julien Nabet committed a patch related to this issue.
It has been pushed to "libreoffice-6-2":

https://git.libreoffice.org/core/+/baf574312c68df5674d78066f7bb468481caad40%5E%21

tdf#56738: fix encoding in comments in doc files (>= Word 97)

It will be available in 6.2.5.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Thomas Kluyver 2019-05-31 08:31:53 UTC
Thanks Julien, one little rough edge sanded off. :-)