Description:
I created a CSV file "u.csv" in UTF-8 with a BOM, containing the single character "ü"; it consists of exactly the seven hex bytes ef bb bf c3 bc 0d 0a. When I open it in Calc, "ü" appears correctly in cell A1. I add and immediately delete a space character, then save the file with "Save" and also with "Save as" under the name "uu.csv". Both files are written without the UTF-8 BOM but with "ü" still encoded as a multi-byte character, so they are now incorrect: each file consists of exactly the four hex bytes c3 bc 0d 0a, and re-opening either file in Calc fails to recognize it as UTF-8, giving "Ã¼" in A1. Other programs may or may not treat the file as UTF-8, because it lacks the BOM.

This could hardly be plainer: Calc should write UTF-8 files with the BOM. At the very least it should offer the user the chance to specify the character set for writing a sheet, and act accordingly. Its current behavior is wrong. Ideally it should be possible to specify a global default character set for text formats, with per-sheet settings possible and retained in .ODS files. The same should go for other components (e.g., Writer) where applicable.

Note: I deal with spreadsheets that mix a huge variety of languages, including Korean, Chinese, Russian, Polish, Thai, and all European languages, so handling UTF-8 correctly is extremely important to me. With Calc right now it is quite painful to do this reliably when exchanging CSV files.

Steps to Reproduce:
1. Create a UTF-8 CSV file (with BOM) containing the single character "ü".
2. Read it with Calc.
3. Make a null change.
4. Save the file.
5. Calc can no longer read the file correctly. It contains a multi-byte character, but no BOM.

Actual Results:
The file no longer signals itself as UTF-8, and Calc reads the contents as "Ã¼". UTF-8 CSV files with Chinese characters, Russian characters, Thai characters, accented European characters, etc. are all wrecked for Calc by its own actions.

Expected Results:
"ü" in A1. Chinese characters, Russian characters, Thai characters, accented European characters, etc. appear correctly.

Reproducible: Always
User Profile Reset: Yes
OpenGL enabled: Yes

Additional Info:
Version: 5.4.4.2 (x64)
Build ID: 2524958677847fb3bb44820e40380acbe820f960
CPU threads: 8; OS: Windows 6.1; UI render: default;
Locale: en-US (en_US); Calc: group
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
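
The byte-level effect described in this report can be checked outside Calc. The following is a minimal Python sketch, not Calc's actual import logic: the function decode_like_legacy_reader is hypothetical, and the Windows-1252 fallback is an assumption about what a reader on an en-US Windows system might use when no BOM is present.

```python
import codecs

# The seven bytes of the original u.csv: UTF-8 BOM, "ü", CR LF.
original = bytes.fromhex("ef bb bf c3 bc 0d 0a")
# The four bytes found in the file after saving, as reported above.
resaved = bytes.fromhex("c3 bc 0d 0a")


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 BOM (ef bb bf)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_like_legacy_reader(data: bytes, fallback: str = "cp1252") -> str:
    """Hypothetical reader: trust a BOM if present, else use a legacy code page.

    The cp1252 fallback is an assumption for illustration only.
    """
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)


print(repr(decode_like_legacy_reader(original)))  # BOM present: decoded as UTF-8
print(repr(decode_like_legacy_reader(resaved)))   # no BOM: decoded as Windows-1252
```

Under these assumptions the first call yields "ü" and the second yields the two characters "Ã¼", which matches the symptom a reader would see when the BOM is the only signal it relies on.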
UTF-8 may use a BOM, but does not need one. A BOM can be dangerous when a CSV file is used in a database environment. When you open a CSV file, use the filter "Text - Choose Encoding" in Writer and the filter "Text CSV" in Calc. In both cases you get a dialog in which to choose the encoding. Both filters detect by themselves whether a BOM exists or not. Because there are applications that are not able to handle UTF-8 without a BOM, users should have a means to decide whether to write a BOM or not. Such an enhancement request already exists as bug 82254.

*** This bug has been marked as a duplicate of bug 82254 ***
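
For readers exchanging CSV files with other tools, the choice discussed in the comment above (writing with or without a BOM, and reading either form) can be illustrated in Python. This is only a sketch using Python's standard "utf-8" and "utf-8-sig" codecs; it says nothing about how LibreOffice's filters are implemented, and the helper names write_csv_bytes and read_csv_bytes are made up for this example.

```python
import csv
import io


def write_csv_bytes(rows, with_bom: bool) -> bytes:
    """Serialize rows as CSV, letting the caller decide whether to emit a BOM."""
    buf = io.StringIO(newline="")  # newline="" keeps the csv module's CR LF as-is
    csv.writer(buf).writerows(rows)
    # "utf-8-sig" prepends ef bb bf when encoding; plain "utf-8" does not.
    return buf.getvalue().encode("utf-8-sig" if with_bom else "utf-8")


def read_csv_bytes(data: bytes):
    """Parse UTF-8 CSV bytes; "utf-8-sig" skips a BOM if one is present."""
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text, newline="")))


with_bom = write_csv_bytes([["ü"]], with_bom=True)
without_bom = write_csv_bytes([["ü"]], with_bom=False)
print(with_bom.hex(" "))
print(without_bom.hex(" "))
print(read_csv_bytes(with_bom), read_csv_bytes(without_bom))
```

Decoding with "utf-8-sig" accepts both variants, so a reader that already knows the file is UTF-8 does not depend on the BOM; the BOM only matters for tools that must guess the encoding.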