Bug 124588 - FILEOPEN DOC DOCX RTF: U+00AD should not be treated as soft hyphen in Word documents
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: dataLoss
Depends on:
Blocks: RTF Formatting-Mark DOCX DOC
Reported: 2019-04-07 13:40 UTC by Phil Krylov
Modified: 2024-08-22 03:15 UTC (History)
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Document to reproduce the bug (43.59 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-04-07 13:41 UTC, Phil Krylov
Font to reproduce the bug (44.71 KB, application/x-font-ttf)
2019-04-07 13:42 UTC, Phil Krylov
Word screenshot (8.00 KB, image/png)
2019-04-07 13:43 UTC, Phil Krylov
Writer screenshot (8.40 KB, image/png)
2019-04-07 13:44 UTC, Phil Krylov
comparison MSO 2010 and LibreOffice 6.5 Master (56.17 KB, image/png)
2019-11-21 13:16 UTC, Xisco Faulí

Description Phil Krylov 2019-04-07 13:40:03 UTC
Description:
Word treats U+00AD as a normal character, and there are actual fonts that map a non-hyphen glyph to this codepoint. For soft hyphens, Word uses 0x1F in DOC, <w:softHyphen/> in DOCX, and \- in RTF. On import, Writer converts all of these to U+00AD, so normal use of the U+00AD character is not possible. Even worse, after import one can no longer distinguish a literal U+00AD from a soft hyphen, so the non-Unicode-compliant usages cannot be remapped to some other codepoint.
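
The loss of information described above can be illustrated with a small Python sketch. It is not LibreOffice code: the XML fragment is a hypothetical minimal paragraph (not taken from the attached document), and `flatten_naively` is a toy stand-in for an importer that maps <w:softHyphen/> to U+00AD. Only the WordprocessingML namespace and the <w:softHyphen/> element come from the DOCX format itself.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SHY = "\u00ad"  # U+00AD

# Hypothetical fragment: one literal U+00AD inside <w:t>,
# and one real soft hyphen expressed as <w:softHyphen/>.
SAMPLE = (
    f'<w:p xmlns:w="{W_NS}">'
    f'<w:r><w:t>a{SHY}b</w:t></w:r>'
    '<w:r><w:t>co</w:t><w:softHyphen/><w:t>op</w:t></w:r>'
    '</w:p>'
)


def count_markers(xml_text):
    """Return (soft_hyphen_elements, literal_u00ad_chars) for a fragment."""
    root = ET.fromstring(xml_text)
    soft = sum(1 for _ in root.iter(f"{{{W_NS}}}softHyphen"))
    literal = sum((t.text or "").count(SHY) for t in root.iter(f"{{{W_NS}}}t"))
    return soft, literal


def flatten_naively(xml_text):
    """Toy import: emit U+00AD for <w:softHyphen/> and keep text unchanged."""
    root = ET.fromstring(xml_text)
    out = []
    for node in root.iter():  # document order
        if node.tag == f"{{{W_NS}}}t":
            out.append(node.text or "")
        elif node.tag == f"{{{W_NS}}}softHyphen":
            out.append(SHY)
    return "".join(out)
```

In the source XML the two usages are distinct (`count_markers(SAMPLE)` reports one of each), but the flattened string contains two identical U+00AD characters, and nothing in it records which one was the literal character.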

Steps to Reproduce:
Install the attached font and open the attached document

Actual Results:
You see a soft hyphen in the sample

Expected Results:
A diacritic from the font should be displayed


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Phil Krylov 2019-04-07 13:41:48 UTC
Created attachment 150579
Document to reproduce the bug
Comment 2 Phil Krylov 2019-04-07 13:42:25 UTC
Created attachment 150580
Font to reproduce the bug
Comment 3 Phil Krylov 2019-04-07 13:43:15 UTC
Created attachment 150581
Word screenshot
Comment 4 Phil Krylov 2019-04-07 13:44:03 UTC
Created attachment 150582
Writer screenshot
Comment 5 Mike Kaganski 2019-04-07 14:14:32 UTC
But U+00AD *is* a soft hyphen? At least Unicode says so: https://www.unicode.org/charts/PDF/U0080.pdf
Comment 6 Phil Krylov 2019-04-07 14:20:47 UTC
Yes it is - as per Unicode spec. But in Word documents, 0x00AD is a normal character. So the problem is how to allow usage of 0x00AD as a normal character in LibreOffice (if we remap them on import to some other codepoint, they won't be displayed with the proper glyph). Probably some special character attribute can be added for verbatim usages of special chars.
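
A minimal sketch of the "special character attribute" idea, in Python for brevity. Everything here is hypothetical: `Span`, its `verbatim` flag, and the token format are invented for illustration and do not correspond to any existing Writer structure. The point is only that a per-character flag set at import time would keep the codepoint (and therefore the font's glyph) while removing the soft-hyphen semantics.

```python
from dataclasses import dataclass

SHY = "\u00ad"  # U+00AD


@dataclass(frozen=True)
class Span:
    text: str
    # Hypothetical attribute: show the glyph as-is, with no special meaning.
    verbatim: bool = False


def import_tokens(tokens):
    """Convert ("text", str) / ("soft_hyphen", None) tokens into spans."""
    spans = []
    for kind, value in tokens:
        if kind == "soft_hyphen":
            spans.append(Span(SHY))  # a real soft hyphen
        elif kind == "text":
            # Each literal U+00AD becomes its own verbatim span.
            for i, part in enumerate(value.split(SHY)):
                if i > 0:
                    spans.append(Span(SHY, verbatim=True))
                if part:
                    spans.append(Span(part))
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return spans


def hyphenation_points(spans):
    """Indices of spans that should behave as soft hyphens."""
    return [i for i, s in enumerate(spans) if s.text == SHY and not s.verbatim]
```

With the tokens `[("text", "a\u00adb"), ("soft_hyphen", None), ("text", "c")]`, both U+00AD characters survive import, but only the one that came from the soft-hyphen token is reported by `hyphenation_points`.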
Comment 7 Phil Krylov 2019-04-07 15:58:03 UTC
Another option could be adding a user-changeable import filter preference to convert U+00AD to some other codepoint/string. Ugly, right.
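
A rough Python sketch of what such a preference could do, under the assumption that the importer still sees soft hyphens and literal U+00AD as separate tokens at that point. The function name, the token format, and the Private Use Area codepoint U+E0AD in the usage note are all hypothetical. As noted in comment 6, a remapped character only displays correctly if the font also maps a glyph to the replacement codepoint.

```python
SHY = "\u00ad"  # U+00AD


def import_with_remap(tokens, literal_shy_replacement=SHY):
    """Flatten tokens to text, optionally remapping literal U+00AD.

    tokens: iterable of ("text", str) or ("soft_hyphen", None) pairs.
    literal_shy_replacement: hypothetical user preference; the default
    keeps U+00AD, which reproduces the ambiguous behavior.
    """
    out = []
    for kind, value in tokens:
        if kind == "soft_hyphen":
            out.append(SHY)  # real soft hyphens always become U+00AD
        elif kind == "text":
            out.append(value.replace(SHY, literal_shy_replacement))
        else:
            raise ValueError(f"unknown token kind: {kind!r}")
    return "".join(out)
```

Calling it with `literal_shy_replacement="\ue0ad"` leaves exactly one U+00AD (the real soft hyphen) in the result and moves the literal character to U+E0AD; with the default, both end up as U+00AD.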
Comment 8 Xisco Faulí 2019-10-16 11:46:49 UTC
@Khaled, I thought you might be interested in this issue...
Comment 9 ⁨خالد حسني⁩ 2019-10-17 13:03:58 UTC
(In reply to Xisco Faulí from comment #8)
> @Khaled, I thought you might be interested in this issue...

What Word is doing is not Unicode-conformant and is probably some legacy behavior kept for backward compatibility. What LibreOffice should do when reading Word files is not something I’m qualified to answer.
Comment 10 Xisco Faulí 2019-11-21 13:16:15 UTC
Created attachment 156001
comparison MSO 2010 and LibreOffice 6.5 Master
Comment 11 Xisco Faulí 2019-11-21 13:16:41 UTC
Reproduced in

Version: 6.5.0.0.alpha0+
Build ID: 60b1a93a990a9978a30dee929526faf8db629a7f
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 13 QA Administrators 2024-08-22 03:15:55 UTC
Dear Phil Krylov,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
 
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)


If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install the oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to the keywords


Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug