I create a CSV file "u.csv", encoded as UTF-8 with a BOM, that contains the single character "ü"; it consists of exactly these seven bytes (hex):
ef bb bf c3 bc 0d 0a
When I open it in Calc, "ü" appears correctly in cell A1. I add and immediately delete a space character, then save the file with "Save" and also with "Save As" under the name "uu.csv".
Both files are written without the UTF-8 BOM, but with "ü" still encoded as a multi-byte character, so they are now incorrect: each file consists of exactly these four bytes (hex):
c3 bc 0d 0a
and re-opening the files in Calc fails to recognize them as UTF-8, giving "Ã¼" in A1. Other programs may or may not treat the file as UTF-8, because it lacks the BOM.
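The byte-level difference can be checked outside Calc. The following is a minimal Python sketch, not Calc's actual import logic: `decode_like_importer` is a hypothetical helper, and the Windows-1252 fallback is an assumption about what a Western-locale Windows system might use when no BOM is present.

```python
import codecs

WITH_BOM = bytes.fromhex("ef bb bf c3 bc 0d 0a")   # original u.csv
WITHOUT_BOM = bytes.fromhex("c3 bc 0d 0a")         # file after saving from Calc


def decode_like_importer(data, fallback="cp1252"):
    """Hypothetical helper: trust a UTF-8 BOM if present, else use a fallback.

    The cp1252 default is an assumption, not confirmed Calc behavior.
    """
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    return data.decode(fallback)


good = decode_like_importer(WITH_BOM)      # "\u00fc\r\n", the intended character
bad = decode_like_importer(WITHOUT_BOM)    # "\u00c3\u00bc\r\n", two-character mojibake
fixed = decode_like_importer(WITHOUT_BOM, fallback="utf-8")  # "\u00fc\r\n" again
```

The last call shows that the BOM-less bytes are still valid UTF-8; only the guess about the encoding goes wrong.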
This could hardly be plainer: Calc should write UTF-8 files with the BOM. At the very least it should offer the user the chance to specify the character set for writing a sheet, and act appropriately. Its behavior now is wrong.
Ideally it should be possible to specify a global default character set for text formats, with per-sheet formats possible and retained in .ODS files. The same should go for other components (e.g., Writer) where applicable.
Note: I deal with spreadsheets that mix a huge variety of languages, including Korean, Chinese, Russian, Polish, Thai, and all European languages, so handling UTF-8 correctly is extremely important to me. With Calc right now it's quite painful to do this reliably when exchanging CSV files.
Steps to Reproduce:
1. Create a UTF-8 CSV file (with BOM) containing the single character "ü".
2. Read it with Calc.
3. Make a null change.
4. Save the file.
5. Calc can no longer read the file correctly. It contains a multi-byte character, but no BOM.
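Steps 1 and 5 can be scripted so that the test file is byte-exact and the result is checked without a hex editor. This is a small Python sketch; the function names are illustrative only.

```python
import codecs


def write_test_file(path):
    """Step 1: write the UTF-8 BOM, the UTF-8 bytes of U+00FC, and CRLF."""
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + "\u00fc".encode("utf-8") + b"\r\n")


def has_utf8_bom(path):
    """Step 5 check: True if the file starts with the bytes EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8
```

Calling `has_utf8_bom("u.csv")` before opening the file in Calc and again after saving should return `True` and then `False` if the behavior reported here occurs.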
Actual Results:
The file no longer signals itself as UTF-8, and Calc reads the contents as "Ã¼". UTF-8 CSV files with Chinese characters, Russian characters, Thai characters, accented European characters, etc. are all wrecked for Calc by its own actions.
Expected Results:
"ü" in A1. Chinese characters, Russian characters, Thai characters, accented European characters, etc. appear correctly.
User Profile Reset: Yes
OpenGL enabled: Yes
Version: 22.214.171.124 (x64)
Build ID: 2524958677847fb3bb44820e40380acbe820f960
CPU threads: 8; OS: Windows 6.1; UI render: default;
Locale: en-US (en_US); Calc: group
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0
UTF-8 may use a BOM but does not require one. A BOM can even be harmful, for example when the CSV file is used in a database environment.
When you open a CSV file, use the filter "Text - Choose Encoding" in Writer or the filter "Text CSV" in Calc. In both cases you get a dialog in which to choose the encoding. Both filters detect on their own whether a BOM is present.
Because there are applications that cannot handle UTF-8 without a BOM, users should have a means to decide whether a BOM is written. Such an enhancement request already exists as bug 82254.
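As an illustration of what a user-selectable BOM could look like, here is a short Python sketch. It is not LibreOffice code and does not describe what bug 82254 specifies; `write_csv` and its `bom` parameter are hypothetical. It relies on Python's standard `utf-8-sig` codec, which prepends the BOM on writing.

```python
import csv


def write_csv(path, rows, bom=True):
    """Write rows as UTF-8 CSV; the caller decides whether a BOM is emitted."""
    encoding = "utf-8-sig" if bom else "utf-8"
    # newline="" lets the csv module write its own CRLF line endings.
    with open(path, "w", encoding=encoding, newline="") as f:
        csv.writer(f).writerows(rows)
```

With `rows=[["ü"]]`, `bom=True` yields the seven bytes `ef bb bf c3 bc 0d 0a`, and `bom=False` yields the four bytes `c3 bc 0d 0a`.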
*** This bug has been marked as a duplicate of bug 82254 ***