Bug 95706 - FILEOPEN: RTF import doesn't interpret ASCII text encoding with Windows code pages
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Vasily Melenchuk (CIB)
URL:
Whiteboard: interoperability target:7.4.0 target:...
Keywords: filter:rtf, needsDevEval
Depends on:
Blocks: RTF-Opening
 
Reported: 2015-11-09 14:40 UTC by Dženan Zukić
Modified: 2022-04-25 12:11 UTC (History)
8 users (show)

See Also:
Crash report or crash signature:
Regression By:


Attachments
Problematic RTF file (20.45 KB, application/msword)
2015-11-09 14:40 UTC, Dženan Zukić
Correct rendering by MSWord (105.56 KB, application/pdf)
2015-11-09 14:40 UTC, Dženan Zukić
An incorrect rendering of the file by LibreOffice (27.38 KB, application/pdf)
2015-11-10 00:26 UTC, Dženan Zukić

Description Dženan Zukić 2015-11-09 14:40:16 UTC
Created attachment 120415
Problematic RTF file

Character encoding is wrong in LO. See the attached correct rendering, e.g. the title:
"Izvod po Tekućem računu ..." gets rendered as:
"Izvod po Tekuæem raèunu ...".
Comment 1 Dženan Zukić 2015-11-09 14:40:52 UTC
Created attachment 120416
Correct rendering by MSWord
Comment 2 raal 2015-11-09 20:23:32 UTC
I cannot confirm this with Version: 5.1.0.0.alpha1+
Build ID: c5fefe46fc9dca3942b2fc33ffd1f7e041d450e6
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2015-11-04_07:04:49
The text is correct: Izvod po Tekućem računu
Comment 3 Dženan Zukić 2015-11-10 00:26:24 UTC
Created attachment 120432
An incorrect rendering of the file by LibreOffice

Generated by libo-master-2015-11-09_23.11.30_LibreOfficeDev_5.1.0.0.alpha1_Win_x64.msi
Comment 4 Dženan Zukić 2015-11-10 00:27:30 UTC
I just rechecked, and it is still wrong. I used version libo-master-2015-11-09_23.11.30_LibreOfficeDev_5.1.0.0.alpha1_Win_x64.msi
Comment 5 Buovjaga 2015-11-12 10:22:43 UTC
Reproduced with 5.1 and 3.5.0.
5.0.3 gives a read error and refuses to open the file.

Win 7 Pro 64-bit, Version: 5.0.3.2 (x64)
Build ID: e5f16313668ac592c1bfb310f4390624e3dbfb75
Locale: fi-FI (fi_FI)

Version: 5.1.0.0.alpha1+
Build ID: b216cc1b8096eb60c27f67e8c27b7cd756c75e38
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-11-12_00:06:20
Locale: fi-FI (fi_FI)

3.5.0
Comment 6 Timur 2016-02-08 19:18:30 UTC
Looking at other reports, it looked like a Windows-only problem, but I also reproduced it on Linux.
The original font is Tahoma. LO says it is unavailable and substitutes it, although I have it installed on Windows, so I am not sure whether this is related to Bug 64509.
Comment 7 Urmas 2016-02-09 11:05:01 UTC
No surprise it is displayed wrong:

{\fonttbl{\f1 Tahoma CE}}
Comment 8 Dženan Zukić 2016-02-09 13:44:02 UTC
Shouldn't that be replaced by another CE (Central European) font? Or the glyph mapping transformed onto a Unicode font (regular Tahoma is installed on my OS)?
Comment 9 Yousuf Philips (jay) (retired) 2016-10-05 04:09:07 UTC
Opening attachment 120415 in Word 2010 and resaving it as RTF doesn't result in this problem.

So the issue seems to boil down to older RTF files having font names that indicate which Windows code page[1] the text is encoded with, and the RTF import not understanding that. So '{\f1 Tahoma CE}' should reference the Central European Windows-1250 code page[2], which for example maps byte 230 (0xE6, æ in Windows-1252) to ć.

Unless LO stores the character mappings of the Windows code pages and also has a conversion routine, I don't think this is something that can be fixed. @Miklos: Any thoughts?

[1] https://en.wikipedia.org/wiki/Windows_code_page
[2] https://en.wikipedia.org/wiki/Windows-1250
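
The code page mismatch described above can be reproduced in a few lines. This is a minimal sketch in Python, not LibreOffice code; it assumes the file's bytes were written in Windows-1250 and that the importer fell back to Windows-1252, which matches the garbled title from the description.

```python
# Bytes as they would be stored for text written in Windows-1250
# (Central European).
raw = "Izvod po Tekućem računu".encode("cp1250")

# Assumption: the importer falls back to Windows-1252 (Western European).
# The same bytes then decode to the garbled title from the description.
wrong = raw.decode("cp1252")

# Decoding with the code page implied by the "CE" font name suffix
# restores the intended text.
right = raw.decode("cp1250")

print(wrong)        # Izvod po Tekuæem raèunu
print(right)        # Izvod po Tekućem računu
print(hex(raw[13])) # 0xe6, the byte that is æ in cp1252 and ć in cp1250
```

The bytes themselves are unchanged in both cases; only the code page chosen for decoding differs.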
Comment 10 QA Administrators 2019-04-03 02:56:47 UTC Comment hidden (obsolete)
Comment 11 Mike 2020-06-14 07:34:06 UTC
Reproduced with:

Version: 6.4.4.2 (x64)
Build ID: 3d775be2011f3886db32dfd395a6a6d1ca2630ff
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win; 
Locale: de-DE (de_DE); UI-Language: en-US
Calc: CL

Version: 7.0.0.0.beta1 (x64)
Build ID: 94f789cbb33335b4a511c319542c7bdc31ff3b3c
CPU threads: 4; OS: Windows 6.1 Service Pack 1 Build 7601; UI render: Skia/Raster; VCL: win
Locale: en-US (de_DE); UI: en-US
Calc: threaded
Comment 12 Commit Notification 2022-04-07 12:29:40 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/965313b9efc761c70aacf6e3ebee60ffa2b1d5dd

tdf#95706: RTF import: Use fontname suffixes to detect encoding

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
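
To illustrate the idea named in the commit title (using font name suffixes to detect the encoding), here is a rough sketch in Python. It is not the code from the patch: the function name `detect_codec`, the suffix table, and the Windows-1252 default are assumptions made for this example, based on the conventional suffixes of legacy Windows font names.

```python
# Hypothetical suffix table for illustration only; it is not copied from
# the LibreOffice patch. The suffixes follow the naming of legacy Windows
# fonts such as "Arial CE" or "Arial Cyr".
SUFFIX_TO_CODEC = {
    " CE": "cp1250",      # Central European
    " Cyr": "cp1251",     # Cyrillic
    " Greek": "cp1253",   # Greek
    " Tur": "cp1254",     # Turkish
    " Baltic": "cp1257",  # Baltic
}


def detect_codec(font_name: str, default: str = "cp1252") -> str:
    """Guess a Python codec name from a legacy font name suffix.

    Falls back to `default` (an assumed Western European default)
    when no known suffix matches.
    """
    lowered = font_name.strip().lower()
    for suffix, codec in SUFFIX_TO_CODEC.items():
        if lowered.endswith(suffix.lower()):
            return codec
    return default


# The font from the attached file's font table: {\f1 Tahoma CE}
codec = detect_codec("Tahoma CE")
print(codec)                        # cp1250
print(bytes([0xE6]).decode(codec))  # ć
print(detect_codec("Tahoma"))       # cp1252
```

With a table like this, the byte 0xE6 in text set in "Tahoma CE" would decode to ć rather than æ.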
Comment 13 Commit Notification 2022-04-08 09:21:20 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/d72dece2bc61e3bab8db5968d53dc0e98a3bea4d

tdf#95706: RTF import: Use fontname suffixes to detect encoding

It will be available in 7.3.3.

Comment 14 Commit Notification 2022-04-08 09:23:30 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/844be7358f1eec00094a55fa1fb4fadadb8cd1bf

tdf#95706: RTF import: tolerant font table parsing

It will be available in 7.4.0.

Comment 15 Commit Notification 2022-04-11 09:55:07 UTC
Vasily Melenchuk committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/8daac72b7a0b7cdf6eb520273829c0c0c15ddef5

tdf#95706: RTF import: tolerant font table parsing

It will be available in 7.3.3.

Comment 16 Timur 2022-04-25 12:11:52 UTC
Thanks Vasily.