Bug 60145 - FILEOPEN: UTF-8 encoding without BOM is not detected
Summary: FILEOPEN: UTF-8 encoding without BOM is not detected
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 3.6.5.2 release
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: easyHack, skillCpp
Duplicates: 108941
Depends on:
Blocks: Save-Text
Reported: 2013-02-01 06:53 UTC by styfx.dev
Modified: 2018-11-15 03:41 UTC (History)

See Also:
Crash report or crash signature:


Attachments
A file that shows LibreOffice not rendering apostrophe correctly (852 bytes, text/plain)
2013-02-01 06:53 UTC, styfx.dev
screencopy of the bugdoc in LO 4.4.2.0.0+ (113.31 KB, image/png)
2015-02-22 10:55 UTC, Jean-Baptiste Faure

Description styfx.dev 2013-02-01 06:53:56 UTC
Created attachment 74033 [details]
A file that shows LibreOffice not rendering apostrophe correctly

Problem description: 

Steps to reproduce:
1. Open Write activity from Sugar
2. Type some text that includes an apostrophe (')
3. Press "Export to TXT"
(Exported with Write activity v86 on XO-4 Touch, Build 1.5.0 for XO-4 One Education OS 1.5 build 3)
4. Use Journal to copy over to a USB stick
5. With the USB stick, copy the TXT file
6. Rename the file, adding ".txt" to the end of the file so it will open in Windows (OE OS bug) 
7. Right click on the renamed file, and select "open with"
8. Select LibreOffice Writer
9. The apostrophe renders as ’ instead of '
10. Right click on the renamed file, and select "open with"
11. Select Notepad
12. The apostrophe renders '

Note: File is included
Note2: This may possibly be reproduced in Abiword, as Write activity is based on Abiword

Current behaviour:
apostrophe renders as ’

Expected behaviour:
apostrophe renders as '
              
Operating System: Windows 7
Version: 3.6.5.2 rc
Comment 1 Urmas 2013-02-01 09:45:29 UTC
You can also report a bug in Abiword, but I wouldn't expect anything constructive from that bunch.
Comment 2 QA Administrators 2015-02-19 15:49:58 UTC Comment hidden (obsolete)
Comment 3 Jean-Baptiste Faure 2015-02-22 10:55:39 UTC
Created attachment 113600 [details]
screencopy of the bugdoc in LO 4.4.2.0.0+

I cannot reproduce the problem with LibreOffice 4.4.2.0.0+ built at home under Ubuntu 14.10. It also works fine in versions 4.1, 4.2 and 4.3.

Best regards. JBF
Comment 4 Jean-Baptiste Faure 2015-02-22 10:56:37 UTC
Closing as WorksForMe. Please feel free to reopen if you still experience this problem with current stable versions.

Best regards. JBF
Comment 5 Sk!d 2017-04-27 14:02:16 UTC
I am using LibreOffice 5.3.2.2 (x64) on Windows 7, and this bug still affects me. The default encoding for .txt files without a BOM should be UTF-8, as the Unicode Standard neither requires nor recommends the use of a BOM.
Comment 6 Buovjaga 2017-07-05 07:49:54 UTC
*** Bug 108941 has been marked as a duplicate of this bug. ***
Comment 7 QA Administrators 2018-10-08 02:48:02 UTC Comment hidden (obsolete)
Comment 8 Mike Kaganski 2018-10-13 20:21:57 UTC
A code pointer: SwASCIIParser::ReadChars() in sw/source/filter/ascii/parasc.cxx autodetects the encoding (from a 4 KiB buffer) using SwIoSystem::IsDetectableText. The latter only checks for a BOM. I suppose we should not change that, but rather change the subsequent processing (for the case when currentCharSet == RTL_TEXTENCODING_DONTKNOW).

In that case, we should possibly try treating the file as UTF-8, with options that strictly detect invalid sequences, and in case of failure, restart with RTL_TEXTENCODING_ASCII_US (or maybe user/working locale?).
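
The "try strict UTF-8, fall back on failure" idea above could be sketched roughly as follows. This is a standalone illustration only, not LibreOffice code: `isStrictUtf8`, `guessEncoding` and the `GuessedEncoding` enum are hypothetical names, and the sketch does not use the real rtl text-conversion API. It rejects overlong forms, surrogates and code points above U+10FFFF, and it can optionally tolerate a multi-byte sequence cut off at the end of the buffer, since a fixed-size detection buffer may end in the middle of a character.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type; real code would use rtl_TextEncoding values.
enum class GuessedEncoding { Utf8, Fallback };

// Returns true if buf[0..len) is well-formed UTF-8.
// If allowTruncatedTail is set, an incomplete sequence at the very end
// of the buffer is accepted (the buffer may cut a character in half).
bool isStrictUtf8(const unsigned char* buf, std::size_t len, bool allowTruncatedTail)
{
    std::size_t i = 0;
    while (i < len)
    {
        const unsigned char c = buf[i];
        if (c < 0x80) { ++i; continue; }             // plain ASCII

        std::size_t need;     // number of continuation bytes expected
        std::uint32_t cp;     // code point being assembled
        std::uint32_t min;    // smallest value allowed for this length
        if ((c & 0xE0) == 0xC0)      { need = 1; cp = c & 0x1F; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { need = 2; cp = c & 0x0F; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { need = 3; cp = c & 0x07; min = 0x10000; }
        else return false;    // stray continuation byte or invalid lead byte

        ++i;
        for (std::size_t k = 0; k < need; ++k, ++i)
        {
            if (i == len) return allowTruncatedTail;  // sequence cut off
            if ((buf[i] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (buf[i] & 0x3F);
        }
        // Reject overlong encodings, surrogates, and out-of-range values.
        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
    }
    return true;
}

// Only meant for the "no BOM found" case: pick UTF-8 if the bytes
// validate strictly, otherwise let the caller use its fallback.
GuessedEncoding guessEncoding(const std::string& head, bool isWholeFile)
{
    const auto* p = reinterpret_cast<const unsigned char*>(head.data());
    return isStrictUtf8(p, head.size(), !isWholeFile)
               ? GuessedEncoding::Utf8
               : GuessedEncoding::Fallback;
}
```

With this sketch, a buffer containing the bytes E2 80 99 (U+2019 in UTF-8) would be classified as UTF-8, while a lone 0xE9 byte, as in Latin-1 or Windows-1252 text, would fail validation and take the fallback path. Pure ASCII input also validates, which is harmless because ASCII is a subset of UTF-8.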