Word treats U+00AD as a normal character, and there are real fonts that map a non-hyphen glyph to this codepoint. For soft hyphens, Word uses 0x1F in DOC, <w:softHyphen/> in DOCX, and \- in RTF. On import, Writer converts all of these to U+00AD, so using U+00AD as a normal character is not possible. Even worse, after import one can no longer distinguish a literal U+00AD from a soft hyphen, so the non-Unicode-compliant usages cannot be remapped to some other codepoint.
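To illustrate the distinction that exists in the file but is lost on import, here is a minimal Python sketch that walks a single DOCX run and keeps <w:softHyphen/> apart from a literal U+00AD inside <w:t>. The sample run and the classify_run helper are made up for illustration; this is not how Writer's import filter is written.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX document parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical sample run: text containing a literal U+00AD,
# followed by a real soft hyphen element, followed by more text.
SAMPLE_RUN = (
    f'<w:r xmlns:w="{W_NS}">'
    "<w:t>a\u00adb</w:t>"
    "<w:softHyphen/>"
    "<w:t>c</w:t>"
    "</w:r>"
)


def classify_run(run_xml):
    """Return (kind, payload) tuples for the children of one w:r element."""
    tokens = []
    for child in ET.fromstring(run_xml):
        if child.tag == f"{{{W_NS}}}softHyphen":
            # Word's own soft hyphen marker in DOCX.
            tokens.append(("soft_hyphen", None))
        elif child.tag == f"{{{W_NS}}}t":
            for ch in child.text or "":
                # A U+00AD stored in the text is an ordinary character to Word.
                kind = "literal_00AD" if ch == "\u00ad" else "char"
                tokens.append((kind, ch))
    return tokens


if __name__ == "__main__":
    for token in classify_run(SAMPLE_RUN):
        print(token)
```

Once both kinds of token are flattened to the same U+00AD in the document model, the information needed to tell them apart is gone.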
Steps to Reproduce:
1. Install the attached font and open the attached document
Actual Results:
You see a soft hyphen in the sample
Expected Results:
A diacritic from the font should be displayed
User Profile Reset: No
Created attachment 150579 [details]
Document to reproduce the bug
Created attachment 150580 [details]
Font to reproduce the bug
Created attachment 150581 [details]
Created attachment 150582 [details]
But U+00AD *is* the soft hyphen? At least that is what Unicode says: https://www.unicode.org/charts/PDF/U0080.pdf
Yes, it is, as per the Unicode spec. But in Word documents, U+00AD is a normal character. So the problem is how to allow U+00AD to be used as a normal character in LibreOffice (if we remap it on import to some other codepoint, it won't be displayed with the proper glyph). Perhaps a special character attribute could be added to mark verbatim uses of special characters.
Another option could be adding a user-changeable import filter preference to convert U+00AD to some other codepoint/string. Ugly, right.
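A rough Python sketch of what such a preference could do, assuming the filter still knows at import time which U+00AD came from a soft hyphen marker and which was literal text. The token shapes, the import_text function and the literal_replacement parameter are hypothetical and do not correspond to any existing LibreOffice setting.

```python
SOFT_HYPHEN = "\u00ad"


def import_text(tokens, literal_replacement=None):
    """Flatten import tokens into a string.

    tokens: iterable of ("soft_hyphen", None) or ("text", str) tuples.
    literal_replacement: hypothetical user preference; when set, every
    literal U+00AD found in text is replaced with this string.
    """
    out = []
    for kind, payload in tokens:
        if kind == "soft_hyphen":
            # Word's soft hyphen marker becomes the Unicode soft hyphen.
            out.append(SOFT_HYPHEN)
        elif literal_replacement is not None:
            # Remap only the U+00AD characters that were plain text in Word.
            out.append(payload.replace(SOFT_HYPHEN, literal_replacement))
        else:
            # No preference set: literal U+00AD and soft hyphens collide.
            out.append(payload)
    return "".join(out)


if __name__ == "__main__":
    tokens = [("text", "a\u00adb"), ("soft_hyphen", None), ("text", "c")]
    # U+0301 is only an example target; the right codepoint depends on the font.
    print(ascii(import_text(tokens, literal_replacement="\u0301")))
    print(ascii(import_text(tokens)))
```

With the preference unset, both kinds of U+00AD end up identical in the output, which is the current problem. With it set, the remapped character only shows the intended glyph if the font also maps the target codepoint.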
@Khaled, I thought you might be interested in this issue...
(In reply to Xisco Faulí from comment #8)
> @Khaled, I thought you might be interested in this issue...
What Word is doing is not Unicode-conformant and is probably legacy behavior kept for backward compatibility. What LibreOffice should do when reading Word files is not something I’m qualified to answer.
Created attachment 156001 [details]
comparison MSO 2010 and LibreOffice 6.5 Master
Build ID: 60b1a93a990a9978a30dee929526faf8db629a7f
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US