Description:
When I import or paste text into Calc, I sometimes forget to change the "Character set:" to UTF-8. This MIGHT be the cause of finding a BOM or a non-printing zero-width unknown character embedded (not at start of file) into text files copy/pasted out of Calc. The "Character set:" menu should also be available in Preferences for the User to set a default.

Steps to Reproduce:
1. Use an app other than LibreOffice to open a text file encoded UTF-8
2. Select some text and copy to clipboard.
3. Select a cell in LibreOffice and try to paste.

Actual Results:
"Character set:" is UTF-16 (always)

Expected Results:
"Character set:" should default to something user has put in preferences.

Reproducible: Always

User Profile Reset: No

Additional Info:
Probably applies to all platforms. I currently have Version: 6.4.4.2 Build ID: 3d775be2011f3886db32dfd395a6a6d1ca2630ff but this was noticed long ago.

Version: 6.4.4.2
Build ID: 3d775be2011f3886db32dfd395a6a6d1ca2630ff
CPU threads: 4; OS: Mac OS X 10.15.6; UI render: default; VCL: osx;
Locale: en-US (en.UTF-8); UI-Language: en-US
Calc: threaded

Don't know whether OpenGL is enabled.

NOTE: webform says "information from menu Help - About LibreOffice" but on MacOS, the "About" is on the LibreOffice menu at the opposite end of the menu bar.
What exactly do you mean with Character Set? (Tools) > Options > HTML compatibility?
Character set in some dialogs is the text encoding, like in Text/CSV or HTML. But IMHO the last used text encoding is remembered unless UTF-16 is detected (and then that would be remembered), and there aren't many cases where that happens, such as embedded null bytes. So yet another default is actually not needed. Attaching a small UTF-8 sample file that is detected as UTF-16 instead of the expected UTF-8 would be helpful.
ALL files created or edited by me are UTF-8 without BOM. That is the default for my editor. My locale is en_US.UTF-8.

What I do frequently is create a temporary text file with new words/phrases I want to learn in another language. I then copy them and paste into a spreadsheet which contains ALL the stuff I am learning. Finally, I export that spreadsheet to overwrite a (non-temporary) tab-delimited file which I can then import into Anki (https://apps.ankiweb.net).

The file command confirms that both of those files are UTF-8.

But the import dialog that comes up when I paste ALWAYS says UTF-16.

Sometimes I forget to change it. I do not know whether that is the cause of my Anki problems, but I have noticed that occasionally there is a BOM in the permanent file, right before a recently pasted data item (not at the beginning of file). And sometimes there is a zero-width non-printing character in the file.

Whenever either of these spurious characters has appeared, it is always on an item that Anki is having trouble with.
(In reply to 伟思礼 from comment #3)
> ALL files created or edited by me are UTF-8 without BOM.

That's about normal these days when not on Windows.

> The file command confirms that both of those files are UTF-8.
>
> But the import dialog that comes up when I paste ALWAYS says UTF-16.

And that *never* happens for me. Hence my request to attach such a file here.

> Sometimes I forget to change it. I do not know whether that is the cause of
> my Anki problems, but I have noticed that occasionally, there is a BOM in
> the permanent file, right before a recently pasted data item (not at the
> beginning of file).

That would be wrong. A BOM must not occur in the middle of data; it may only appear at the start of a text stream. What created that?

> And sometimes there is a zero-width non-printing
> character in the file.

That shouldn't matter if it is properly encoded.

> Whenever either of these spurious characters has appeared, they are always
> on an item that Anki is having trouble with.

So Anki is the problem, and not LibreOffice?
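To pin down where such characters sit in the exported file, a small scan along these lines might help. This is only a sketch: the names `INVISIBLE`, `find_invisible` and `scan_file` are made up for illustration, the set of code points checked is an arbitrary choice, and the file is assumed to be UTF-8 as the file command reports.

```python
# Code points that render as nothing. The selection is illustrative,
# not an exhaustive list of zero-width characters.
INVISIBLE = {
    "\ufeff": "BOM / ZERO WIDTH NO-BREAK SPACE",
    "\u200b": "ZERO WIDTH SPACE",
    "\u200c": "ZERO WIDTH NON-JOINER",
    "\u200d": "ZERO WIDTH JOINER",
    "\u2060": "WORD JOINER",
}


def find_invisible(text):
    """Return (line, column, name) for each invisible character in text.

    A U+FEFF at the very start is treated as a legitimate BOM and skipped;
    anywhere else it is reported. Lines and columns are 1-based.
    """
    hits = []
    line, col = 1, 1
    for index, ch in enumerate(text):
        if ch in INVISIBLE and not (ch == "\ufeff" and index == 0):
            hits.append((line, col, INVISIBLE[ch]))
        if ch == "\n":
            line += 1
            col = 1
        else:
            col += 1
    return hits


def scan_file(path):
    """Scan a file assumed to be UTF-8 (plain "utf-8" keeps any BOM visible)."""
    with open(path, encoding="utf-8") as handle:
        return find_invisible(handle.read())
```

Running `scan_file` on the tab-delimited export before importing it into Anki would show the line and column of each hit, which could then be matched against the items Anki has trouble with.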
No UX issue, apparently. Rather NOB. Let's wait for a test file.
Created attachment 164009 Sample UTF-8 file containing pasteable text
Created attachment 164010 dialog showing default charset
Created attachment 164011 dialog after correcting charset
At least there is an issue => NEW.
I don't see a problem. Yes, the encoding may be offered as UTF-16, but that is what _arrives_ at Calc from the clipboard, and as attachment 164010 of comment 7 shows, the text is _correct_ in UTF-16. Comment 8 attachment 164011 is not correcting the setting but voluntarily picking UTF-8, and of course if the text is not encoded in UTF-8 then the text is broken after that.

I also don't see the necessity to have settable defaults here, or what they would even solve. Even if there were such a default, forcing the encoding to UTF-8 in this case would import broken text.

Closing WFM.
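The effect of picking the wrong encoding can be reproduced outside Calc. The sketch below uses Python's built-in codecs as a stand-in for the import dialog; it is not LibreOffice code. The sample string and the helper name `decode_as` are made up, and little-endian byte order for the clipboard data is an assumption.

```python
def decode_as(data, encoding):
    """Decode bytes with the given encoding, replacing undecodable bytes."""
    return data.decode(encoding, errors="replace")


# Hypothetical clipboard payload: the text as UTF-16 (little-endian assumed).
original = "你好\thello"
clipboard_bytes = original.encode("utf-16-le")

# Decoding with the encoding the data is actually in gives the text back.
as_utf16 = decode_as(clipboard_bytes, "utf-16-le")

# Forcing UTF-8 onto the same bytes yields different, broken text,
# including embedded null characters from the ASCII part.
as_utf8 = decode_as(clipboard_bytes, "utf-8")

print(as_utf16 == original)
print(as_utf8 == original)
print("\x00" in as_utf8)
```

The first comparison holds and the second does not, which matches the two dialog screenshots: the data is fine as UTF-16 and garbled once UTF-8 is selected.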