Description:
When loading a UTF-8 text file containing umlauts (äöüÄÖÜ) into Writer, the umlauts are not displayed correctly. Since version 7.1 alpha it works correctly.
Steps to Reproduce:
1. Take a text file containing ö, ä, ü that was edited and saved in UTF-8 format (e.g. with npp).
2. Open it in 7.0.3.1 and have a look (bad!).
3. Then compare the same text sample with how it looks in 7.1 alpha (fine!).
Actual Results:
123456789ABCDEF AE=Ä OE=Ö UE=Ãœ ae=ä oe=ö ue=ü
Expected Results:
123456789ABCDEF AE=Ä OE=Ö UE=Ü ae=ä oe=ö ue=ü
Reproducible: Always
User Profile Reset: No
Additional Info:
Just take the UTF-8 content and represent it correctly.
The problem that you see is caused by your text file having no BOM, which is what marks a file as UTF-8. In the absence of that byte-order mark, previous versions of LibreOffice didn't detect the encoding and used the current system encoding, which on Windows is ~guaranteed to be non-UTF-8. So your text was imported using the wrong encoding. That was *not* a bug, but a missing feature of recognition of such files. The correct way to open such files was using special "Text - Choose Encoding" filter in File Open dialog. In v.7.1, tdf#60145 was implemented, as you see. So no, when you see something fixed in the next version, it doesn't mean that not having it in the previous version is a bug that should be fixed. Not having it in 7.0 is NOTABUG.
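To illustrate the explanation above, here is a minimal Python sketch of the two behaviours. It is not LibreOffice's code: the helper names are hypothetical, `cp1252` is assumed as the Windows system encoding, and the "try strict UTF-8 first" heuristic is only one common way to detect BOM-less UTF-8, not necessarily what tdf#60145 does.

```python
import codecs


def decode_bom_or_system(raw: bytes, system_encoding: str = "cp1252") -> str:
    """Hypothetical model of the pre-7.1 behaviour described above:
    trust a UTF-8 BOM if present, otherwise use the system encoding."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    return raw.decode(system_encoding)


def decode_with_utf8_fallback(raw: bytes, system_encoding: str = "cp1252") -> str:
    """Hypothetical detection heuristic: accept the bytes as UTF-8 if they
    decode strictly, otherwise fall back to the system encoding."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):].decode("utf-8")
    try:
        return raw.decode("utf-8")  # strict: raises on invalid sequences
    except UnicodeDecodeError:
        return raw.decode(system_encoding)


sample = "UE=Ü"
without_bom = sample.encode("utf-8")        # b'UE=\xc3\x9c'
with_bom = codecs.BOM_UTF8 + without_bom    # b'\xef\xbb\xbfUE=\xc3\x9c'

garbled = decode_bom_or_system(without_bom)         # 'UE=Ãœ' (bytes C3 9C read as cp1252)
marked = decode_bom_or_system(with_bom)             # 'UE=Ü'  (BOM says UTF-8)
detected = decode_with_utf8_fallback(without_bom)   # 'UE=Ü'  (valid UTF-8, no BOM needed)
```

The garbled result matches the "UE=Ãœ" from the report: the two UTF-8 bytes of Ü are each shown as a separate cp1252 character.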
> That was *not* a bug, but a missing feature of recognition of such files.
> The correct way to open such files was using special "Text - Choose Encoding" filter in File Open dialog
UTF-8 without BOM is a standard used worldwide, and it still is. Therefore we can discuss whether missing standard behaviour is a bug or not. It is not a good idea to prepare such answers just to avoid conceding a minor, major, or normal issue. As for your hint about choosing the encoding: there are only Unicode UTF-xx options, no standard UTF-xx. Therefore my Bugzilla report was correct.
Please don't play with the bug status. This is not a bug in the software. It is working as intended in 7.0. It is enhanced in 7.1. It will not be changed in 7.0 retroactively, since all new features, such as new detection code, are only introduced in master, not in release branches. This bug is closed. Period.
Created attachment 167160
screenshot details 138100
Shows the selection box after selecting "Text - Choose Encoding" (in German (DE): "Text - Textcodierung wählen"):
Unicode - (UTF-7) / Unicode - (UTF-8) / Unicode - (UTF-16)
What Option should be selected for the Standard UTF-8 (Not-UNICODE)?
(In reply to hastrondl from comment #5)
> What Option should be selected for the Standard UTF-8 (Not-UNICODE)
There is *never* a text encoded in one of the UTF encodings that is not Unicode. The UTF (*Unicode* Transformation Format) encoding family was created to encode the UCS (Universal Coded Character Set) character set standardized in ISO 10646, and that ISO standard is deliberately synchronized (identical) with The Unicode Standard (created and maintained by the Unicode Consortium). Any UTF-encoded file is "some sequence of UCS codepoints, each codepoint encoded using this specific UTF variant". So after decoding, you get a sequence of UCS/Unicode codepoints, never something else. Please check RFC 3629 (UTF-8), and also RFC 2781 (UTF-16) and RFC 2152 (UTF-7); ISO 10646; and The Unicode Standard (the current version [1] of which explicitly says "This version of the Unicode Standard is also synchronized with ISO/IEC 10646:2020, sixth edition", just like previous versions stated synchronization with the then-current ISO standard versions). So the idea of a "Standard UTF-8 (Not-UNICODE)" is absurd.
[1] http://www.unicode.org/versions/Unicode13.0.0/
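As a small illustration of the point above, the following Python sketch encodes the same text with three UTF variants and decodes each again. The sample string and variable names are made up for this example; the codecs are Python's built-in `utf-8`, `utf-16`, and `utf-7`.

```python
# The same Unicode text, encoded with three different UTF variants.
text = "UE=Ü"

code_points = {}
byte_forms = {}
for encoding in ("utf-8", "utf-16", "utf-7"):
    raw = text.encode(encoding)        # bytes differ per UTF variant
    decoded = raw.decode(encoding)     # decoding always yields Unicode text
    byte_forms[encoding] = raw
    code_points[encoding] = [f"U+{ord(ch):04X}" for ch in decoded]

# Every entry in code_points is the same list:
# ['U+0055', 'U+0045', 'U+003D', 'U+00DC']
```

The byte sequences in `byte_forms` differ between the variants, but the decoded code points are identical, which is why the filter lists each option as "Unicode - (UTF-x)".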
(In reply to hastrondl from comment #5)
> What Option should be selected for the Standard UTF-8 (Not-UNICODE)
In case Mike's reply wasn't clear enough -- to correctly display the utf8 format file created by npp like you described in comment #0, just choose "Unicode - (UTF-8)" option in LO 6.4.
(In reply to Ming Hua from comment #7)
> just choose "Unicode - (UTF-8)" option in LO 6.4.
Thanks - you are quite right; not only in 6.4, but in any version (including 7.1). The upcoming version 7.1 can *also* autodetect it, but the manual option with that "Text - Choose Encoding" filter is still there.