Word treats U+00AD as a normal character, and there are real fonts that map a non-hyphen glyph to this codepoint. For soft hyphens, Word uses 0x1F in DOC, <w:softHyphen/> in DOCX, and \- in RTF. On import, Writer converts all of these to U+00AD, so U+00AD can no longer be used as a normal character. Even worse, after import one can't tell a normal U+00AD from a soft hyphen, so the non-Unicode-compliant usages can't be changed to some other codepoint.
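To illustrate the distinction that gets lost, here is a minimal Python sketch that reads a single DOCX run and keeps <w:softHyphen/> separate from a literal U+00AD inside <w:t>. This is not how Writer's import filter is implemented; the function name classify_run, the token kinds and the sample run are hypothetical and only show that the information is available in the file.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by DOCX
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
SHY = "\u00ad"


def classify_run(run_xml):
    """Return (kind, value) tokens for one <w:r> element.

    kind is 'soft_hyphen' for <w:softHyphen/>, 'literal_00ad' for a
    U+00AD character stored in <w:t>, and 'text' for everything else.
    The token names are illustrative assumptions, not a real API.
    """
    run = ET.fromstring(run_xml)
    tokens = []
    for child in run:
        if child.tag == f"{{{W_NS}}}softHyphen":
            tokens.append(("soft_hyphen", None))
        elif child.tag == f"{{{W_NS}}}t":
            buf = ""
            for ch in child.text or "":
                if ch == SHY:
                    # Flush pending text, then record the literal character
                    if buf:
                        tokens.append(("text", buf))
                        buf = ""
                    tokens.append(("literal_00ad", SHY))
                else:
                    buf += ch
            if buf:
                tokens.append(("text", buf))
    return tokens


# Hypothetical run: a real soft hyphen, then a literal U+00AD in the text
SAMPLE_RUN = (
    f'<w:r xmlns:w="{W_NS}">'
    "<w:t>co</w:t><w:softHyphen/><w:t>op\u00adx</w:t>"
    "</w:r>"
)

if __name__ == "__main__":
    for token in classify_run(SAMPLE_RUN):
        print(token)
```

Once both kinds are flattened to U+00AD, as described above, this distinction can no longer be recovered from the imported text.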
Steps to Reproduce:
Install the attached font and open the attached document.
Actual Results:
You see a soft hyphen in the sample.
Expected Results:
A diacritic from the font should be displayed.
User Profile Reset: No
Created attachment 150579
Document to reproduce the bug
Created attachment 150580
Font to reproduce the bug
Created attachment 150581
Created attachment 150582
But U+00AD *is* the soft hyphen, isn't it? At least that is what Unicode says: https://www.unicode.org/charts/PDF/U0080.pdf
Yes, it is, as per the Unicode spec. But in Word documents, U+00AD is a normal character. So the problem is how to allow U+00AD to be used as a normal character in LibreOffice (if we remap such characters to some other codepoint on import, they won't be displayed with the proper glyph). Perhaps a special character attribute could be added to mark verbatim usages of special characters.
Another option would be to add a user-changeable import filter preference that converts U+00AD to some other codepoint or string. Ugly, I know.
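A rough Python sketch of what such a preference could do, assuming the filter still knows which U+00AD came from a literal character and which from a Word soft hyphen. ImportOptions, literal_00ad_replacement and import_tokens are hypothetical names for illustration, not existing LibreOffice settings or functions, and U+E0AD is just an arbitrary Private Use Area codepoint.

```python
from dataclasses import dataclass
from typing import Optional

SHY = "\u00ad"


@dataclass
class ImportOptions:
    # Hypothetical preference: string that replaces a literal U+00AD.
    # None means "keep U+00AD as-is" (the current behaviour).
    literal_00ad_replacement: Optional[str] = None


def import_tokens(tokens, options):
    """Flatten (kind, value) tokens into imported text.

    Real soft hyphens always become U+00AD; literal U+00AD characters
    are remapped only when the user has set a replacement.
    """
    out = []
    for kind, value in tokens:
        if kind == "soft_hyphen":
            out.append(SHY)
        elif kind == "literal_00ad":
            replacement = options.literal_00ad_replacement
            out.append(SHY if replacement is None else replacement)
        else:
            out.append(value)
    return "".join(out)


# Hypothetical input: one real soft hyphen and one literal U+00AD
TOKENS = [
    ("text", "co"),
    ("soft_hyphen", None),
    ("text", "op"),
    ("literal_00ad", SHY),
]

if __name__ == "__main__":
    default_text = import_tokens(TOKENS, ImportOptions())
    remapped_text = import_tokens(TOKENS, ImportOptions("\ue0ad"))
    print([hex(ord(c)) for c in default_text])
    print([hex(ord(c)) for c in remapped_text])
```

As noted in the previous comment, the remapped character is only displayed with the intended glyph if the font also maps that glyph to the replacement codepoint.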