Bug 60145 - FILEOPEN: UTF-8 encoding without BOM is not detected
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.5.2 release (earliest affected)
Hardware: Other Windows (All)
Importance: medium normal
Assignee: Tomofumi Yagi
URL:
Whiteboard: BSA target:7.1.0
Keywords: easyHack, skillCpp
Duplicates: 108941
Depends on:
Blocks: Save-Text
Reported: 2013-02-01 06:53 UTC by styfx.dev
Modified: 2022-06-07 15:33 UTC

See Also:
Crash report or crash signature:


Attachments
A file that shows LibreOffice not rendering apostrophe correctly (852 bytes, text/plain)
2013-02-01 06:53 UTC, styfx.dev
screencopy of the bugdoc in LO 4.4.2.0.0+ (113.31 KB, image/png)
2015-02-22 10:55 UTC, Jean-Baptiste Faure

Description styfx.dev 2013-02-01 06:53:56 UTC
Created attachment 74033
A file that shows LibreOffice not rendering apostrophe correctly

Problem description: 

Steps to reproduce:
1. Open Write activity from Sugar
2. Type some text that includes an apostrophe (')
3. Press "Export to TXT"
(Exported with Write activity v86 on XO-4 Touch, Build 1.5.0 for XO-4 One Education OS 1.5 build 3)
4. Use Journal to copy over to a USB stick
5. With the USB stick, copy the TXT file
6. Rename the file, adding ".txt" to the end of the file name so it will open in Windows (OE OS bug)
7. Right click on the renamed file, and select "open with"
8. Select LibreOffice Writer
9. The apostrophe renders as ’ instead of '
10. Right click on the renamed file, and select "open with"
11. Select Notepad
12. The apostrophe renders '

Note: File is included
Note2: This may possibly be reproduced in Abiword, as Write activity is based on Abiword

Current behaviour:
apostrophe renders as ’

Expected behaviour:
apostrophe renders as '
              
Operating System: Windows 7
Version: 3.6.5.2 rc
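
For illustration, the ’ output matches what happens when the UTF-8 bytes of a typographic apostrophe (U+2019, bytes E2 80 99) are decoded as Windows-1252. Two assumptions are made here, based only on the observed output: that the exported file contains U+2019, and that Writer fell back to Windows-1252 on this system. decodeCp1252 is a hypothetical helper for this sketch, not LibreOffice code.

```cpp
#include <string>

// Windows-1252 code points for bytes 0x80..0x9F; 0 marks an unassigned byte.
static const char32_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
};

// Hypothetical helper: decode a byte string as Windows-1252.
std::u32string decodeCp1252(const std::string& bytes)
{
    std::u32string out;
    for (unsigned char b : bytes)
    {
        if (b >= 0x80 && b <= 0x9F)
        {
            // Bytes in this range differ from Latin-1; unassigned ones become U+FFFD.
            const char32_t cp = kCp1252High[b - 0x80];
            out.push_back(cp ? cp : U'\uFFFD');
        }
        else
        {
            // ASCII and 0xA0..0xFF map to the same Unicode code point.
            out.push_back(b);
        }
    }
    return out;
}
```

Decoding the three bytes "\xE2\x80\x99" with this helper yields U+00E2, U+20AC, U+2122, which display as ’.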
Comment 1 Urmas 2013-02-01 09:45:29 UTC
You can also report a bug in Abiword, but I wouldn't expect anything constructive from that bunch.
Comment 2 QA Administrators 2015-02-19 15:49:58 UTC Comment hidden (obsolete)
Comment 3 Jean-Baptiste Faure 2015-02-22 10:55:39 UTC
Created attachment 113600
screencopy of the bugdoc in LO 4.4.2.0.0+

I cannot reproduce the problem with LibreOffice 4.4.2.0.0+ built at home under Ubuntu 14.10. It also works fine in versions 4.1, 4.2 and 4.3.

Best regards. JBF
Comment 4 Jean-Baptiste Faure 2015-02-22 10:56:37 UTC
Closing as WorksForMe. Please feel free to reopen if you still experience this problem with current stable versions.

Best regards. JBF
Comment 5 Sk!d 2017-04-27 14:02:16 UTC
I am using LibreOffice 5.3.2.2 (x64) on Windows 7, and this bug still affects me. The default encoding for .txt files without a BOM should be UTF-8, as the Unicode Standard neither requires nor recommends the use of a BOM for UTF-8.
Comment 6 Buovjaga 2017-07-05 07:49:54 UTC
*** Bug 108941 has been marked as a duplicate of this bug. ***
Comment 7 QA Administrators 2018-10-08 02:48:02 UTC Comment hidden (obsolete)
Comment 8 Mike Kaganski 2018-10-13 20:21:57 UTC
A code pointer: SwASCIIParser::ReadChars() in sw/source/filter/ascii/parasc.cxx autodetects the encoding (from a 4 KiB buffer) using SwIoSystem::IsDetectableText. The latter only checks for a BOM. I suppose we should not change that, but instead change the subsequent processing (for the case when currentCharSet == RTL_TEXTENCODING_DONTKNOW).

In that case, we should possibly try treating the file as UTF-8, with options that strictly detect invalid sequences, and in case of failure, restart with RTL_TEXTENCODING_ASCII_US (or maybe user/working locale?).
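
The idea in this comment can be sketched as below. This is a minimal standalone illustration, not the code of the committed patch: isStrictUtf8, guessEncoding and GuessedEncoding are hypothetical names, and the sketch validates a whole in-memory buffer rather than using LibreOffice's own text-conversion options.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical result type: Fallback stands for whatever legacy encoding
// the caller would restart with (for example the locale's code page).
enum class GuessedEncoding { Utf8, Fallback };

// Returns true only if the whole buffer is well-formed UTF-8. Rejects stray
// continuation bytes, truncated sequences, overlong forms, UTF-16 surrogates
// (U+D800..U+DFFF) and code points above U+10FFFF.
bool isStrictUtf8(const std::string& buf)
{
    const std::size_t n = buf.size();
    std::size_t i = 0;
    while (i < n)
    {
        const std::uint8_t b0 = static_cast<std::uint8_t>(buf[i]);
        if (b0 < 0x80) { ++i; continue; }  // plain ASCII byte

        std::size_t len = 0;      // total length of the sequence
        std::uint32_t cp = 0;     // code point being assembled
        std::uint32_t minCp = 0;  // smallest code point allowed for this length
        if ((b0 & 0xE0) == 0xC0)      { len = 2; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;  // continuation byte or invalid lead byte

        if (i + len > n) return false;  // sequence cut off by end of buffer
        for (std::size_t k = 1; k < len; ++k)
        {
            const std::uint8_t b = static_cast<std::uint8_t>(buf[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode range
        i += len;
    }
    return true;
}

// Try UTF-8 first; if the bytes are not strictly valid, signal a restart
// with the fallback encoding.
GuessedEncoding guessEncoding(const std::string& buf)
{
    return isStrictUtf8(buf) ? GuessedEncoding::Utf8 : GuessedEncoding::Fallback;
}
```

One caveat of this sketch: when only a leading chunk of the file is inspected (such as the 4 KiB buffer mentioned above), a multi-byte sequence split at the chunk boundary would be reported as invalid, so a real implementation would need to tolerate an incomplete sequence at the very end of a partial buffer.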
Comment 9 Commit Notification 2020-09-30 08:18:53 UTC
Tomofumi Yagi committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ef77a256de527f6d00212839e55f949024f2e7bc

tdf#60145 sw: fix UTF-8 encoding without BOM is not detected

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Xisco Faulí 2020-09-30 12:19:46 UTC
Hi Tomofumi Yagi,
thank you very much for fixing this issue. Could you please add it to the release notes of LibreOffice 7.1 < https://wiki.documentfoundation.org/ReleaseNotes/7.1 > ?
Comment 11 Tomofumi Yagi 2020-10-03 12:13:07 UTC
(In reply to Xisco Faulí from comment #10)
> Hi Tomofumi Yagi,
> thank you very much for fixing this issue. Could you please add it to the
> release notes of LibreOffice 7.1 <
> https://wiki.documentfoundation.org/ReleaseNotes/7.1 > ?

Thank you for your advice.
I added it to the release notes of LibreOffice 7.1
https://wiki.documentfoundation.org/ReleaseNotes/7.1#Writer