My version:

Version: 7.3.5.2 (x64) / LibreOffice Community
Build ID: 184fe81b8c8c30d8b5082578aee2fed2ea847c01
CPU threads: 8; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: nb-NO (nb_NO); UI: nb-NO
Calc: threaded

I save a file using File -> Save As or File -> Save a Copy and set the file type to "Text - Choose Encoding". In the filter options dialog, I choose the encoding "Unicode (UTF-8)" and the line ending "LF". Then I inspect the resulting file with od (octal dump), using options that show each byte both as a character and as a hex code (od -c -t x1).

The file begins with these two lines:

Ferden til boplassen
Endelig stod jeg der. Langs den lille kanalen foran meg lå tre små sjøfly.

Notice the three non-ASCII characters in the last four words. Inspecting the result, I find the following:

$ od -c -t x1 Ren-tekst-versjon.txt | head -30
0000000                               1   .   F   e   r   d   e   n
         20  20  20  20  20  20  20  31  2e  46  65  72  64  65  6e  20
0000020   t   i   l       b   o   p   l   a   s   s   e   n  \n  \n   E
         74  69  6c  20  62  6f  70  6c  61  73  73  65  6e  0a  0a  45
0000040   n   d   e   l   i   g       s   t   o   d       j   e   g
         6e  64  65  6c  69  67  20  73  74  6f  64  20  6a  65  67  20
0000060   d   e   r   .       L   a   n   g   s       d   e   n       l
         64  65  72  2e  20  4c  61  6e  67  73  20  64  65  6e  20  6c
0000100   i   l   l   e       k   a   n   a   l   e   n       f   o   r
         69  6c  6c  65  20  6b  61  6e  61  6c  65  6e  20  66  6f  72
0000120   a   n       m   e   g       l   ?       t   r   e       s   m
         61  6e  20  6d  65  67  20  6c  3f  20  74  72  65  20  73  6d
0000140   ?       s   j   ?   f   l   y   .       T   e   r   m   i   n
         3f  20  73  6a  3f  66  6c  79  2e  20  54  65  72  6d  69  6e

(The first line is a heading. Here it is indented by seven spaces, which I did not expect, since it is not indented in the original. The second line is part of a longer paragraph and is saved as a single long line; this is expected and OK.)

The issue in this report is that the characters å and ø are replaced with question marks (3f). It seems that the file has not been converted to UTF-8, but rather to ASCII.
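A quick way to confirm this symptom without reading the whole dump is to count the bytes outside the ASCII range: a UTF-8 file containing å or ø must have some, while a file where they became "?" has none. The helper below is a hypothetical sketch (`count_high_bytes` is not an existing tool), assuming a POSIX shell with `tr` and `wc`, such as the Cygwin Bash used in this report:

```shell
# Hypothetical helper: count the bytes >= 0x80 in a file.
# A UTF-8 file containing Norwegian letters must have some
# (å is c3 a5, ø is c3 b8); 0 means they were all reduced to ASCII,
# e.g. replaced by "?" (0x3f).
count_high_bytes() {
  # In the C locale, tr works on raw bytes; deleting the ASCII range
  # 000-177 (octal) leaves only the high bytes for wc to count.
  LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}

# Usage, with the file name from the report above:
# count_high_bytes Ren-tekst-versjon.txt
```

For the two lines quoted above, a correct UTF-8 export would give 6 (two bytes each for å, å and ø); the exported file shown in the dump would give 0 for the same text.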
A simple test case (since I have a Norwegian user interface, my English translations of the UI labels may be inexact):

1. I opened a new Writer document and
2. inserted the copyright symbol by clicking the "Omega" button on the Insert toolbar and selecting it.
3. Then I saved a copy of the document (File -> Save a Copy),
4. navigating to a temporary folder, "C:\cygwin64\tmp",
5. naming the file "Copyright" and
6. choosing the format "Text (choose encoding)". The encoding dialog came up with "Unicode (UTF-8)" and "LF" already selected, so I just clicked "OK".
7. In a Bash command window (I have the Cygwin tools installed on my computer), I changed to the temporary directory and
8. ran:

$ od -c Copyright.txt

The output was:

0000000   ?  \n
0000002

Notice the question mark. I will show the expected outcome in the next comment.
(Continued from the previous comment) To demonstrate what the correct outcome would be, I pasted the copyright symbol from the Writer document into the following Bash command line:

$ echo '©' | od -t x1
0000000 c2 a9 0a
0000003

The bytes C2 A9 are the correct UTF-8 encoding of the code point U+00A9, the copyright symbol.
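The two bytes follow from the UTF-8 rule for code points U+0080 to U+07FF: the 11 bits of the code point are split 5 + 6 over a lead byte 110xxxxx and a continuation byte 10xxxxxx. For U+00A9 (binary 00010 101001) that gives 110 00010 = C2 and 10 101001 = A9. The sketch below does the same arithmetic in the shell; the function name is made up for illustration, and it also yields the bytes that å and ø from the first comment should have been written as:

```shell
# Hypothetical helper: encode a code point in the range U+0080..U+07FF
# as two UTF-8 bytes, 110xxxxx 10xxxxxx (the 11 x-bits are the code point).
utf8_2byte() {
  code=$1
  if [ "$code" -lt 128 ] || [ "$code" -gt 2047 ]; then
    echo "code point outside the two-byte range" >&2
    return 1
  fi
  # Lead byte: 0xC0 | top 5 bits; continuation byte: 0x80 | low 6 bits.
  printf '%02x %02x\n' $(( 0xC0 | (code >> 6) )) $(( 0x80 | (code & 0x3F) ))
}

utf8_2byte $((0xA9))   # c2 a9  (©)
utf8_2byte $((0xE5))   # c3 a5  (å)
utf8_2byte $((0xF8))   # c3 b8  (ø)
```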
I saved the file in Windows as suggested and tested it on Linux, and my result is:

$ od -c copyright.txt
0000000 357 273 277 302 251  \n
0000006

I also got this with version 7.3. Do you still see this with 7.5?

Set to NEEDINFO.
Change back to UNCONFIRMED if the problem persists.
Change to RESOLVED WORKSFORME if the problem went away.

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 687b950702c49c90cff9a43655ea97a0343799a0
CPU threads: 2; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: en-US (en_FI); UI: en-US
Calc: threaded
Yesterday I installed the following version and tried again:

Version: 7.5.1.2 (X86_64) / LibreOffice Community
Build ID: fcbaee479e84c6cd81291587d2ee68cba099e129
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: nb-NO (nb_NO); UI: nb-NO
Calc: CL threaded

I created a new text document containing just one character, the copyright sign (U+00A9). I saved it first as a regular .odt file (C:\cygwin64\tmp\Copyright.odt), then used File -> Save a Copy with the file type "Text - Choose Encoding (txt)". In the encoding dialog I chose UTF-8 and LF. The "Byte order mark" checkbox was checked but greyed out, so it could not be deselected. Then:

$ od -c Copyright.txt
0000000   ?  \n
0000002

So yes, I am still seeing the error in version 7.5.

What Buovjaga is getting is a file with the UTF-8-encoded byte order mark U+FEFF (357 273 277) followed by the UTF-8-encoded copyright symbol U+00A9 (302 251).

Another test: I have an Ubuntu Linux machine with LibreOffice 7.4.4.2. In this version of Writer, the byte order mark checkbox can be deselected, and the resulting file is:

$ od -c Copyright.txt
0000000 302 251  \n
0000003

This is the correct outcome. The bug is not present in LibreOffice Writer 7.4.4.2 on Ubuntu.
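The correct outputs differ only in the first three bytes: 357 273 277 in octal is EF BB BF in hex, the UTF-8 form of the byte order mark U+FEFF, while 302 251 is C2 A9, the copyright sign. When comparing exports made with and without the checkbox, a check like the following sketch may help. The function name is hypothetical, and it assumes `head -c`, which GNU coreutils and Cygwin provide but which is not strictly POSIX:

```shell
# Hypothetical helper: succeed if a file starts with EF BB BF, the UTF-8
# encoding of U+FEFF (shown by "od -c" as 357 273 277).
has_utf8_bom() {
  # Take the first three bytes, dump them as hex without addresses,
  # and drop spaces and newlines so they can be compared as one string.
  [ "$(head -c 3 "$1" | od -An -t x1 | tr -d ' \n')" = efbbbf ]
}

# Usage:
# if has_utf8_bom Copyright.txt; then echo "BOM present"; fi
```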
I tried this on a different laptop running Windows 11, and there the bug is absent.

Version: 7.5.1.2 (X86_64) / LibreOffice Community
Build ID: fcbaee479e84c6cd81291587d2ee68cba099e129
CPU threads: 8; OS: Windows 10.0 Build 22621; UI render: Skia/Raster; VCL: win
Locale: nb-NO (nb_NO); UI: en-GB
Calc: CL threaded

In this case, the user interface is English (en-GB). The Ubuntu case also has an English UI (en-US).
Yet another test, on another laptop:

Version: 7.3.5.2 (x64) / LibreOffice Community
Build ID: 184fe81b8c8c30d8b5082578aee2fed2ea847c01
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Vulkan; VCL: win
Locale: nb-NO (nb_NO); UI: nb-NO
Calc: threaded

OS name: Microsoft Windows 10 Home, Version 10.0.19045, Build 19045

$ od -c Copyright.txt
0000000   ?  \n
0000002

In this case, the bug is present.

What do the laptops that manifest the bug have in common? I compared A: operating system, B: Windows locale/language, C: LibreOffice user interface language.

1. The one where I first experienced the bug:
   A: Windows 10 Home, version 22H2, installed on 12.10.2020, OS build 19045.2604, Windows Feature Experience Pack 120.2212.4190.0
   B: "Norsk bokmål" (Norwegian)
   C: Default (Norsk bokmål)
2. The one I am reporting about now:
   A: Windows 10 Home, version 22H2, installed on 02.11.2020, OS build 19045.2604, Windows Feature Experience Pack 120.2212.4190.0
   B: "Norsk bokmål" (Norwegian)
   C: Default (Norsk bokmål)
3. The laptop that did not manifest the error:
   A: Windows 11 Home, version 22H2, installed on 04.10.2022, OS build 22621.1265, Windows Feature Experience Pack 1000.22638.1000.0
   B: Two "preferred" languages: Norsk bokmål; English (USA)
   C: English (UK), but Norwegian is also available in the drop-down list
I changed the UI language in LibreOffice to English (USA), and now the bug is gone. This was on the first laptop, where I first experienced the bug. Then I changed the UI language in LibreOffice to Norwegian (Norsk bokmål) on the third laptop (the one that had English (UK) and initially did not manifest the bug), and now the bug is there. So it now seems that the bug manifests itself only with the Norwegian UI language. When I find more time, I may test other UI languages. Another hint to the origin of the bug may be that the Byte Order Mark checkbox in the filter options dialog is greyed out when the UI language is not English.
I have now tried two more user interface languages: Japanese and Spanish. In both cases, the file was saved correctly as UTF-8. Version 7.5.1.2.
I reproduced this with an nb-NO UI on Linux:

Version: 7.6.4.1 (X86_64) / LibreOffice Community
Build ID: e19e193f88cd6c0525a17fb7a176ed8e6a3e2aa1
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: nb-NO
Calc: threaded

The Byte Order Mark setting was greyed out too. Already reproducible in 7.2.0.4. Not reproducible with the en-US UI.
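This is a translation problem: the Norwegian Bokmål (nb) strings were incorrectly updated in commit a0c08eb77f9fd9e3b53f5c40abb554e83195fa27 (update translations for 6.0 beta1, 2017-11-22). The problem starts at https://opengrok.libreoffice.org/xref/translations/source/nb/svx/messages.po?r=c662aec6#12176:

> 12178 msgid "Arabic (ISO-8859-6)"
> 12179 msgstr "Gresk (ISO-8859-7)"

("Gresk" is Norwegian for "Greek".) The mismatch continues through all the remaining RID_SVXSTR_TEXTENCODING_TABLE entries. This is the entry hit by the steps to reproduce:

> 12460 msgid "Chinese simplified (EUC-CN)"
> 12461 msgstr "Unicode (UTF-8)"

So in the nb UI, the entry labelled "Unicode (UTF-8)" actually selects Chinese simplified (EUC-CN), which cannot represent å, ø or ©. That is presumably why they come out as "?", and why the Byte Order Mark checkbox is greyed out.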
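Because the encoding names in parentheses are normally left untranslated, a shift like this one can be spotted mechanically by comparing the parenthesised part of each msgid with its msgstr. The following is a rough heuristic sketch in awk, not a LibreOffice tool. The function name is made up, and it assumes one-line msgid/msgstr strings, as in the lines quoted above:

```shell
# Hypothetical heuristic check for shifted translations in a .po file:
# report entries whose msgid ends in a parenthesised name, e.g.
# "(ISO-8859-6)", when the msgstr does not contain the same name.
# Other parenthesised UI strings may give false positives.
check_encoding_names() {
  awk '
    /^msgid "/ { id = $0; next }
    /^msgstr "/ {
      if (match(id, /\([^()]*\)"$/)) {
        # Drop the closing quote to get e.g. (ISO-8859-6)
        name = substr(id, RSTART, RLENGTH - 1)
        if (index($0, name) == 0)
          print FILENAME ":" FNR ": " id " -> " $0
      }
    }
  ' "$1"
}

# Usage:
# check_encoding_names messages.po
```

Entries whose parentheses are legitimately translated will also be listed, so the output needs manual review.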
Sorry, the problem goes back even further, to commit d9a4b60f9ae7e15c44675ea56fe6a06613c419ae (fix of damaged files from beta1, 2012-12-09).
... and ultimately, it was already present in https://git.libreoffice.org/translations/+/2e55a04c1a0e276ba878f7372ef92467023a23fb%5E%21/translations/source/nb/svx/source/dialog.po
https://gerrit.libreoffice.org/c/translations/+/164882 is an attempt to fix it. I don't know the language. The fix mainly moves misplaced strings to their proper places; for some entries, I just copied the missing strings from the corresponding nn (Norwegian Nynorsk) file. A review from someone who reads the language is really needed, to make sure this blind fix makes sense.
The patch looks good to me, except that "Chinese Traditional" should be translated as "Tradisjonell kinesisk", not "Tradisjonelt kinesisk", in lines 12371, 12407, 12431, 12443, 12449 and 12467.