When I open a TXT file with LibreOffice Writer, some of the text becomes mojibake, and LibreOffice Writer has no way to select a text encoding.
Some discussion from the LibreOffice Chinese community:
I downloaded a TXT file from the "UDHR in Unicode" section of the Unicode website, but after opening it with LibreOffice I can't find where to select the text encoding. What should I do?
The only way is to open it with a text editor and then paste it into Writer. When pasting into Calc, Paste Special lets you select the encoding.
Steps to Reproduce:
1. Download any TXT file from: http://www.unicode.org/udhr/aggregates.html
2. Open with LibreOffice Writer
After doing the above, some non-Latin text appears as mojibake. I also opened the file with BabelPad, where it displays correctly; I then inserted a BOM (U+FEFF) at the beginning of the text and saved it, but the file still displays incorrectly in LO Writer.
When opening a TXT file, LibreOffice should present a dialog for selecting the text encoding, and this dialog should include a preview pane.
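To illustrate how this kind of mojibake arises, here is a minimal Python sketch: UTF-8 bytes decoded with a legacy code page come out garbled. It assumes, without confirmation from LibreOffice's source, that Writer falls back to the system ANSI code page, which for a zh-CN Windows locale is code page 936 (GBK, Python codec `gbk`). The Russian sample string is illustrative and not taken from the UDHR file.

```python
# Illustrative sample of non-Latin text (not copied from the attachment).
original = "Все люди рождаются свободными"

# The UDHR files are stored as UTF-8.
raw = original.encode("utf-8")

# Assumption: the importer guesses the zh-CN system code page (GBK / cp936).
# errors="replace" substitutes U+FFFD for byte sequences GBK cannot decode.
garbled = raw.decode("gbk", errors="replace")

# Decoding with the correct encoding recovers the text unchanged.
recovered = raw.decode("utf-8")

print("wrong encoding:", garbled)
print("right encoding:", recovered)
```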
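The requested preview pane could work roughly like the minimal Python sketch below: decode the first few hundred bytes with each candidate encoding and show the results side by side, so the user can pick the one that reads correctly. The function name `preview_encodings`, the 200-byte limit, and the candidate list are hypothetical choices for illustration, not existing LibreOffice behavior.

```python
def preview_encodings(raw: bytes, candidates, limit: int = 200) -> dict:
    """Decode the first `limit` bytes of `raw` with each candidate encoding.

    Undecodable bytes become U+FFFD, so a wrong guess shows up as garbage
    instead of raising an error. Cutting at `limit` may split a multi-byte
    character, which then also appears as U+FFFD at the end of the preview.
    """
    sample = raw[:limit]
    return {enc: sample.decode(enc, errors="replace") for enc in candidates}


if __name__ == "__main__":
    data = "Всеобщая декларация прав человека".encode("utf-8")
    for enc, text in preview_encodings(data, ["utf-8", "cp1251", "gbk"]).items():
        print(f"{enc:>7}: {text}")
```

Only the correct candidate (here `utf-8`) yields readable text; the others show the same kind of mojibake the reporter saw.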
User Profile Reset: No
Version: 126.96.36.199 (x64)
Build ID: 5ad7b2889021c491af62f7930a4b1cb631392f16
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; Layout Engine: new;
Locale: zh-CN (zh_CN); Calc: group
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:50.0) Gecko/20100101 Firefox/50.0
(In reply to Volga from comment #0)
> LibreOffice Writer have no way to select a text encoding.
There is a way: in the Open dialog, scroll through the file type list and choose "Text - Choose Encoding".
> have also open this file with BabelPad, it looks proper, then I insert a BOM
> (U+FEFF) at the beginning of text and save it, then the file still looks
> unhappy with LO Writer.
Then maybe this file is in little-endian byte order, and its BOM should be the bytes FF FE (which read as U+FFFE in big-endian order) instead? Can you please attach that file?
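For reference, a short Python check of how the BOM character U+FEFF is serialized under each UTF-16 byte order, and what happens when little-endian BOM bytes are read as big-endian. It uses only Python's standard codecs; nothing here is taken from LibreOffice.

```python
BOM = "\ufeff"  # ZERO WIDTH NO-BREAK SPACE, used as the byte order mark

le_bytes = BOM.encode("utf-16-le")  # little-endian serialization
be_bytes = BOM.encode("utf-16-be")  # big-endian serialization

print(le_bytes)  # b'\xff\xfe'
print(be_bytes)  # b'\xfe\xff'

# Reading the little-endian BOM bytes as big-endian yields U+FFFE,
# a Unicode noncharacter; this is how a byte-order mismatch shows up.
misread = le_bytes.decode("utf-16-be")
print(hex(ord(misread)))  # 0xfffe
```

So the character written at the start of the file is always U+FEFF; only its byte serialization changes with the byte order.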
Created attachment 130529
OK, there is some confusion here. U+FEFF serialized as FE FF or FF FE is the UTF-16 BOM, but the attached file is UTF-8 and already has the UTF-8 BOM (EF BB BF).
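Whether a file carries a BOM, and which one, can be read from its first bytes alone. The minimal Python sketch below checks for the standard signatures using the constants from the standard `codecs` module; the function name `sniff_bom` is hypothetical, and this is not how LibreOffice implements its detection.

```python
import codecs

# Longest signatures first, so UTF-32 LE (FF FE 00 00) is not mistaken
# for UTF-16 LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(raw: bytes):
    """Return (encoding, bom_length) for a recognized BOM, else (None, 0)."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name, len(bom)
    return None, 0


if __name__ == "__main__":
    # A UTF-8 file with BOM, like the attachment.
    sample = codecs.BOM_UTF8 + "Hello\n".encode("utf-8")
    enc, skip = sniff_bom(sample)
    print(enc, repr(sample[skip:].decode(enc)))
```

A UTF-8 file such as the attachment starts with EF BB BF, so no UTF-16 BOM check can match it.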
Still, this file shows a real bug. It is correctly detected as UTF-8 under Linux but not under Windows. Most likely the reason is that SwIoSystem::IsDetectableText returns false if the detected line ending differs from the system one, which is the case here: the file uses LF, while the Windows default is CRLF. There is a similar case in Bug 63673, but with a UTF-16 file with CRLF line endings under Linux.
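The suspected failure mode can be modeled in a few lines of Python. This is a hypothetical simplification of the behavior described for `SwIoSystem::IsDetectableText`, not a port of LibreOffice's C++ code: `detect_line_end` and `is_detectable_text_model` are invented names, only the first line break is inspected, and accepting text with no line break is an arbitrary choice of this sketch.

```python
import os

# Map platform newline strings to labels.
_LABELS = {"\r\n": "CRLF", "\n": "LF", "\r": "CR"}


def detect_line_end(text: str):
    """Return 'CRLF', 'LF' or 'CR' for the first line break in `text`, or None."""
    for i, ch in enumerate(text):
        if ch == "\r":
            return "CRLF" if text[i + 1 : i + 2] == "\n" else "CR"
        if ch == "\n":
            return "LF"
    return None


def is_detectable_text_model(text: str, system_line_end: str) -> bool:
    """Hypothetical model: reject auto-detection when the file's line end
    differs from the platform default. Text without any line break is
    accepted (an arbitrary choice of this sketch)."""
    found = detect_line_end(text)
    return found is None or found == system_line_end


if __name__ == "__main__":
    lf_file = "line one\nline two\n"
    print(is_detectable_text_model(lf_file, "LF"))    # True: Linux default
    print(is_detectable_text_model(lf_file, "CRLF"))  # False: Windows default
    print("this platform:", _LABELS[os.linesep])
```

Under this model the same LF-terminated UTF-8 file passes on Linux and fails on Windows, which matches the reported symptom.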
I'm closing this bug as WORKSFORME, since the original request was for a way to choose the encoding, and one already exists. Improving the auto-detection is already tracked in Bug 63673.