64603 – Saving plain text files with Cyrillic content are wrongly encoded in ANSI charset (Windows)


Status:	RESOLVED NOTABUG

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	3.6.4.3 release
Hardware:	x86-64 (AMD64) Windows (All)

Importance:	medium normal
Assignee:	Not Assigned

Blocks:	Save-Text

Reported:	2013-05-14 20:52 UTC by sogartary
Modified:	2021-05-13 09:18 UTC
CC List:	3 users


Description sogartary 2013-05-14 20:52:16 UTC

I wrote some Cyrillic text in Writer and saved it as "Text (.txt)", not "Text Encoded (.txt)". When I opened the file again in Writer, it displayed only question marks. I opened the file in a hex editor, and indeed, in place of the Cyrillic letters there were only question marks in the ANSI charset encoding.
It seems that Writer totally disregarded the presence of non-ANSI characters.
What should happen in that case is that the file is saved in some encoding that can represent them, such as UTF-8.
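
The symptom described above can be reproduced outside of Writer. The following is a minimal Python sketch, not LibreOffice's actual export code. It assumes that the Windows "ANSI" code page is cp1252 (the reporter's actual code page is not stated) and that characters missing from the code page are replaced rather than rejected:

```python
# Illustration only: cp1252 is an assumed stand-in for the system "ANSI" code page.
text = "Привет, мир"

# errors="replace" substitutes "?" (0x3F) for every character the code page lacks.
ansi_bytes = text.encode("cp1252", errors="replace")
print(ansi_bytes)  # b'??????, ???'

# Decoding cannot bring the Cyrillic letters back: the information is gone.
round_trip = ansi_bytes.decode("cp1252")
print(round_trip)  # ??????, ???

# UTF-8 can represent every character, so the round trip is lossless.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text
```

Only the comma and the space survive, which matches the question marks seen in the hex editor.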

Comment 1 Julien Nabet 2013-05-17 22:23:52 UTC

sogartary: as a test, could you give the brand new 4.0.3 a try?

Comment 2 sogartary 2013-05-23 05:04:08 UTC

The bug is still present in 4.0.3.3.

Comment 3 Julien Nabet 2013-05-28 17:35:50 UTC

sogartary: could you rename your LO profile directory (see https://wiki.documentfoundation.org/UserProfile) and try again?
The goal is to make sure it's not caused by a customization or something similar.

Comment 4 sogartary 2013-06-03 17:03:20 UTC

I have tried what you said. I renamed the %appdata%\LibreOffice\4\user directory and retried. The bug still persists.

Comment 5 sogartary 2013-06-04 09:15:58 UTC

I also tried to reproduce the bug on Ubuntu 12.04, again with LibreOffice 4.0.3.3. It seems that everything is OK there.

Comment 6 Julien Nabet 2013-06-04 20:22:05 UTC

sogartary: when I read comments 2, 4 and 5 together, I don't understand.
Is it OK or not with 4.0.3 and a brand new LO profile?
If it's not, which case is OK?

Comment 7 QA Administrators 2014-02-02 02:06:59 UTC

Dear Bug Submitter,

Please read the entire message before proceeding.

This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information.

For more information about our NEEDINFO policy please read the wiki located here: 
https://wiki.documentfoundation.org/QA/FDO/NEEDINFO

If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed.


Thank you for helping us make LibreOffice even better for everyone!


Warm Regards,
QA Team

Comment 8 QA Administrators 2014-02-26 19:31:28 UTC

Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INVALID due to inactivity and a lack of information which is needed in order to accurately reproduce and confirm the problem. We encourage you to retest your bug against the latest release. If the issue is still present in the latest stable release, we need the following information (please ignore any that you've already provided):

a) Provide details of your system including your operating system and the latest version of LibreOffice that you have confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED and we will attempt to reproduce the issue. 
Please do not:
a) respond via email 
b) update the version field in the bug or any of the other details on the top section of FDO

Comment 9 Urmas 2014-12-03 17:04:18 UTC

Even Notepad warns about data loss when the file contains Unicode characters.

Comment 10 Buovjaga 2021-05-07 15:15:07 UTC

This is Windows-only. If you don't use the file format "Text - Choose Encoding", the file will have question marks in place of non-ASCII characters.

Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 9df3aa7ea72d61462e430643f2a80906dce4e15b
CPU threads: 2; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded Jumbo

Comment 11 Mike Kaganski 2021-05-13 07:49:47 UTC

(In reply to Urmas from comment #9)
> Even Notepad warns about data loss when the file contains Unicode characters.

Likewise, LibreOffice warns whenever anyone saves anything to TXT, because there will inevitably be loss: of formatting, of graphics, of metadata, of information in headers/footers. There is no need for an additional warning along the lines of "also, not all characters present in the document *body* are representable in the selected encoding's charset".

Of course this is not Windows-only; it would appear on any platform whose system encoding is non-Unicode. On other platforms, however, it's *usual* to use UTF-8. But there are Linux systems that use e.g. KOI8-R, etc.

This is NOTABUG.
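
The platform dependence discussed in this comment can be illustrated with a small Python sketch. The helper `representable` and the list of candidate encodings are hypothetical and chosen only for illustration; nothing here reflects how LibreOffice selects its export encoding.

```python
import locale


def representable(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)  # strict mode: raise instead of writing "?"
        return True
    except UnicodeEncodeError:
        return False


sample = "Привет"

# Hypothetical system encodings: Western European Windows, Cyrillic Windows,
# a KOI8-R Linux system, and a UTF-8 system.
for encoding in ("cp1252", "cp1251", "koi8_r", "utf-8"):
    status = "ok" if representable(sample, encoding) else "loses characters"
    print(f"{encoding}: {status}")

# The encoding Python considers the system default on the current machine.
print("this system:", locale.getpreferredencoding(False))
```

The same text is lossless under cp1251, KOI8-R and UTF-8 but not under cp1252, so whether the export loses characters depends on the system encoding rather than on the operating system as such.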

Comment 12 Mike Kaganski 2021-05-13 09:18:22 UTC

On the other hand, why not change our "simple text" export to use UTF-8 (with BOM) instead of the "system encoding" (unless there's existing encoding information from import - see tdf#120574, which is a different problem)? That would avoid this situation; UTF-8 is the universal standard now, with a much better chance of being read correctly than any other encoding. So should this become "Use UTF-8 instead of system encoding in text export filter by default"?
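
For readers unfamiliar with what "UTF-8 (with BOM)" means at the byte level, here is a minimal Python sketch. It uses Python's `utf-8-sig` codec as a stand-in and makes no claim about how the LibreOffice export filter is or would be implemented.

```python
import codecs

text = "Привет"

# "utf-8-sig" prepends the three-byte BOM (EF BB BF) when encoding.
data = text.encode("utf-8-sig")
assert data.startswith(codecs.BOM_UTF8)
assert data[len(codecs.BOM_UTF8):] == text.encode("utf-8")

# A BOM-aware reader strips the marker and recovers the text unchanged.
assert data.decode("utf-8-sig") == text

# A reader that decodes as plain UTF-8 keeps the BOM as a leading U+FEFF.
assert data.decode("utf-8") == "\ufeff" + text
```

The BOM gives readers a signature for recognizing the file as UTF-8 instead of guessing a legacy code page, at the cost of three extra bytes that BOM-unaware tools may show as a stray character.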