Bug 38902 - UTF-8 contents should be detected and this codepage should be suggested for FILESAVE as ".txt coded"
Summary: UTF-8 contents should be detected and this codepage should be suggested for F...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: (earliest affected) 3.3.3 release
Hardware: x86 (IA32) All
Importance: high major
Assignee: Not Assigned
Keywords: dataLoss
Duplicates: 93907
Depends on:
Blocks: Save-Text
Reported: 2011-07-02 01:27 UTC by Urmas
Modified: 2021-03-20 16:24 UTC
CC List: 7 users

See Also:
Crash report or crash signature:

Original (41 bytes, text/plain)
2011-07-03 18:21 UTC, Urmas
Saved in LO (28 bytes, text/plain)
2011-07-03 18:22 UTC, Urmas
Corrupted file (31 bytes, text/plain)
2011-07-03 22:25 UTC, Urmas
Screenshot of opening dialog (66.52 KB, image/pjpeg)
2011-07-03 23:15 UTC, Urmas

Description Urmas 2011-07-02 01:27:26 UTC
1. Open a UTF-8 file in Writer. It displays correctly as UTF-8.
2. Change something and save.
3. Open it again.

Problem: The file was saved in the local codepage and all letters were replaced with ?'s. As a result the document is completely fucked up.
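
The symptom described above can be reproduced outside LibreOffice. The following is a minimal sketch, not LibreOffice's actual save path: the sample text and the choice of cp1252 as the "local codepage" are assumptions made purely for illustration, since the contents of the reporter's file are not shown here.

```python
# Hypothetical sample text; the reporter's actual file content is not shown in this report.
original = "Привет, мир"

# The file on disk is UTF-8.
utf8_bytes = original.encode("utf-8")

# Simulate a save that re-encodes into a single-byte local codepage.
# cp1252 is an assumed codepage that cannot represent these letters;
# errors="replace" substitutes "?" for every unrepresentable character.
text_in_editor = utf8_bytes.decode("utf-8")
saved_bytes = text_in_editor.encode("cp1252", errors="replace")

# Reopening the saved file shows that the letters are gone for good.
reopened = saved_bytes.decode("cp1252")
print(reopened)  # ??????, ???
```

Only the comma and the space survive, because they are the only characters the assumed codepage can represent. The substitution is one-way: nothing in the saved bytes allows the original letters to be recovered.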
Comment 1 Don't use this account, use tml@iki.fi 2011-07-02 02:09:56 UTC
And how critical is it to not be able to use Libre*Office* as a plain text editor? On Windows even? Sheesh.
Comment 2 Urmas 2011-07-02 03:02:46 UTC
Since when are bugs causing DATA LOSS minor? Is that your private shop?
Comment 3 Don't use this account, use tml@iki.fi 2011-07-02 03:34:10 UTC
Shop? It's my private opinion.
Comment 4 Rainer Bielefeld Retired 2011-07-03 06:42:48 UTC
NOT reproducible with "LibreOffice 3.4.1RC1 - WIN7  Home Premium (64bit) German UI [OOO340m1 (Build:103)]". Might have been fixed in between "somehow"?

May I ask you to read the hints at <http://wiki.documentfoundation.org/BugReport> carefully?
Then please:
- Write a meaningful Summary
- Attach a test kit with utf8samplefile.txt
- Attach screenshots with comments (you can add information using LibO DRAW
  and then attach your screenshot with comments as PDF) if necessary
- Contribute step-by-step instructions containing every key press and every
  mouse click needed to reproduce your problem
- Add information
  -- concerning your PC
  -- concerning your OS
  -- concerning your LibO localization (UI language)
  -- LibO settings that might be related to your problems
  -- how you launch LibO and how you opened the sample document
  -- everything else crossing your mind after you read the above-mentioned URL

Can you test with 3.4.1?
Comment 5 Urmas 2011-07-03 18:21:13 UTC
Created attachment 48720
Original
Comment 6 Urmas 2011-07-03 18:22:12 UTC
Created attachment 48721
Saved in LO
Comment 7 Urmas 2011-07-03 18:28:55 UTC
Whether I open it from Explorer or via File > Open, any edit leads to corruption.
It is perfectly reproducible on 3.4.1, in this case with Russian UI on Windows XP.
Comment 8 Rainer Bielefeld Retired 2011-07-03 22:07:30 UTC
Still not reproducible with reporter's sample and with "LibreOffice 3.4.1RC1 - WIN7  Home Premium (64bit) German UI [OOO340m1 (Build:103)]"

Please do not touch the Bugzilla pickers if you do not know what they are for.
With what version did you see the problem the first time?
Your description is far away from "every key press and every mouse click". Without that and a clear description of LibO's reactions, the problem can't be checked.

You should discuss the problem on a user mailing list and then report the results here.
Comment 9 Urmas 2011-07-03 22:24:47 UTC
3.2.2? I don't remember, but OOO340m1 (Build:103) on 32-bit XP shows the same behaviour.

As for instructions:
1. File/Open, select the file.
2. Append "123" at the end.
3. Press Save button.
4. Confirm saving in same format.
Comment 10 Urmas 2011-07-03 22:25:33 UTC
Created attachment 48729
Corrupted file
Comment 11 Rainer Bielefeld Retired 2011-07-03 23:00:48 UTC
It might be that the document was not opened via coded text import, or something else; no idea, no useful information available. Closing INVALID for now.

Please excuse me, but such a report, where you only show the result but not how you got it, is completely useless. It would help if you would follow my advice (read information CAREFULLY, discuss ...) instead of providing information fragments.

Please feel free to reopen this bug when you can contribute the requested additional information as described at <http://wiki.documentfoundation.org/BugReport>. Possibly the best way might be that (after discussion on a user mailing list) you create a presentation showing screenshots with comments taken after every mouse click and every key press.
Comment 12 Don't use this account, use tml@iki.fi 2011-07-03 23:14:32 UTC
You need to save as "Text Encoded" and choose the "Unicode (UTF-8)" character set.

Sure, LO could perhaps be clever enough to understand this when saving a document that contains characters not in the system codepage into a plain text file on Windows. Or should it? The expected encoding of "text" files on Windows isn't exactly well-defined.

Rainer, I am reopening this, this *is* a real problem.
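
The "clever enough" behaviour floated above could look roughly like the following. This is a hedged sketch of one possible policy, not what LibreOffice does: `choose_save_encoding` is a hypothetical helper, and the codepage names passed to it are examples only.

```python
def choose_save_encoding(text: str, system_codepage: str) -> str:
    """Return system_codepage if it can represent text losslessly, else "utf-8".

    Hypothetical policy for illustration; the name and the fallback choice
    are assumptions, not part of any LibreOffice API.
    """
    try:
        # str.encode is strict by default and raises on the first
        # character the codepage cannot represent.
        text.encode(system_codepage)
    except UnicodeEncodeError:
        return "utf-8"
    return system_codepage


print(choose_save_encoding("plain ASCII", "cp1252"))  # cp1252
print(choose_save_encoding("Привет", "cp1252"))       # utf-8
print(choose_save_encoding("Привет", "cp1251"))       # cp1251
```

A policy like this never writes "?" placeholders, but it does mean the output encoding depends on the document's content, which is the ambiguity the comment above points at.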
Comment 13 Urmas 2011-07-03 23:15:08 UTC
Created attachment 48731
Screenshot of opening dialog
Comment 14 Urmas 2011-07-03 23:17:29 UTC
> The expected encoding of "text" files on Windows
> isn't exactly well-defined.

If it opened the file as UTF-8, it should save it as UTF-8; there cannot be two opinions.
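
The rule stated above (save with whatever encoding the file was opened with) can be sketched as follows. The helpers `load_text` and `save_text` are hypothetical, the "try strict UTF-8 first" detection is a common heuristic assumed here for illustration, and the cp1252 fallback is likewise an assumption; none of this is taken from LibreOffice's code.

```python
import os
import tempfile


def load_text(path, fallback="cp1252"):
    """Read a text file and return (text, encoding_used).

    Detection heuristic (an assumption): bytes that decode as strict
    UTF-8 are treated as UTF-8, anything else as the fallback codepage.
    """
    with open(path, "rb") as f:
        raw = f.read()
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode(fallback, errors="replace"), fallback


def save_text(path, text, encoding):
    """Write text back using the encoding remembered from load_text."""
    with open(path, "wb") as f:
        f.write(text.encode(encoding))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    save_text(path, "Привет", "utf-8")      # create a UTF-8 file
    text, enc = load_text(path)             # open: encoding is remembered
    save_text(path, text + "123", enc)      # edit and save in the same encoding
    roundtrip, enc2 = load_text(path)       # open again

print(roundtrip, enc2)  # Привет123 utf-8
```

The point of the design is that the encoding travels with the document from open to save, so a plain "Save" never has to guess.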
Comment 15 Rainer Bielefeld Retired 2011-07-13 07:51:28 UTC
Related to "Bug 39124 - copy base table to CALC uses wrong codepage for paste"?
Comment 16 Rainer Bielefeld Retired 2012-07-24 17:51:54 UTC
[Reproducible] with reporter's sample and "LibreOffice 3.3.3  German UI/Locale [OOO330m19 (Build:301) tag libreoffice-]" on German WIN7 Home Premium (64bit); might be inherited from OOo?
Comment 17 Zoltán Hegedüs 2013-10-24 17:36:42 UTC Release:
Open a UTF-8 file with Writer. I tried with a file that has a UTF-8 header. The extension is .txt.
Modify some characters.
Save the file. Writer saves the file in a non-Unicode codepage.
If I save the file with Save as - Encoded text, subsequent normal saves (only save, not save as) work correctly.
If I open the file as encoded text, there is no error. If the extension is unknown to Writer, I can open the file only as encoded text, because normal opening opens it in Calc.
This can cause DATA LOSS, so I changed the severity from normal to major.
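
Assuming the "UTF-8 header" mentioned above means the UTF-8 byte order mark (BOM), a file carrying one identifies its own encoding, and an editor can keep the marker across an edit. The sketch below uses Python's standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; `has_utf8_bom` and `edit_preserving_bom` are hypothetical helper names, and this is an illustration rather than Writer's behaviour.

```python
import codecs


def has_utf8_bom(raw: bytes) -> bool:
    """True if the data starts with the UTF-8 byte order mark (EF BB BF)."""
    return raw.startswith(codecs.BOM_UTF8)


def edit_preserving_bom(raw: bytes, suffix: str) -> bytes:
    """Append suffix to UTF-8 data, writing a BOM only if one was present."""
    bom = has_utf8_bom(raw)
    # "utf-8-sig" strips a leading BOM when decoding, if there is one.
    text = raw.decode("utf-8-sig")
    text += suffix
    # "utf-8-sig" prepends a BOM when encoding; plain "utf-8" does not.
    return text.encode("utf-8-sig" if bom else "utf-8")


with_bom = codecs.BOM_UTF8 + "é".encode("utf-8")
without_bom = "é".encode("utf-8")

print(has_utf8_bom(edit_preserving_bom(with_bom, "123")))     # True
print(has_utf8_bom(edit_preserving_bom(without_bom, "123")))  # False
```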
Comment 18 QA Administrators 2015-04-01 14:40:08 UTC Comment hidden (obsolete)
Comment 19 Buovjaga 2015-04-19 15:38:23 UTC

Win 7 Pro 64-bit Version: (x64)
Build ID: 211c12b9c64facd1c12f637a5229bd6a6feb032a
TinderBox: Win-x86_64@42, Branch:master, Time: 2015-04-18_01:51:17
Locale: fi_FI
Comment 20 m.a.riosv 2015-12-26 18:12:32 UTC
*** Bug 96730 has been marked as a duplicate of this bug. ***
Comment 21 m.a.riosv 2015-12-26 18:17:57 UTC
*** Bug 93907 has been marked as a duplicate of this bug. ***
Comment 22 Justin L 2017-02-24 16:10:57 UTC
Unable to reproduce on Linux (went back to oldest50 alpha), so it might actually be Windows-only (as is already marked). After making changes and then saving over the top of an existing .txt file, or as a new Text (.txt) format, I never got ?'s when re-opening.

I was able to confirm the problem still exists in Windows with 5.1.6.
Comment 23 Buovjaga 2017-02-27 19:15:10 UTC
Still repro

Build ID: 54d5b1828ec73d0475e0ddb6e31394a7e1904a1b
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; 
TinderBox: Win-x86@42, Branch:master, Time: 2017-02-09_23:41:14
Locale: fi-FI (fi_FI); Calc: group
Comment 24 Urmas 2017-02-28 01:03:50 UTC
Same result on Linux with ISO-8859-1 locale.
Comment 25 QA Administrators 2018-06-18 02:42:19 UTC Comment hidden (obsolete)
Comment 26 QA Administrators 2020-06-18 03:52:27 UTC Comment hidden (obsolete, spam)
Comment 27 Urmas 2021-03-20 16:17:43 UTC
Screwed another text file today with 7.2.0.