Bug 142325 - Error read rtf created Gnostice eDocEngine V5.0.0.548
Summary: Error read rtf created Gnostice eDocEngine V5.0.0.548
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Miklos Vajna
URL:
Whiteboard: target:7.2.0 target:7.1.5
Keywords:
Depends on:
Blocks: RTF
Reported: 2021-05-17 09:10 UTC by zahour@zah.cz
Modified: 2021-06-01 08:44 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Document example error (14.83 KB, application/rtf)
2021-05-17 09:10 UTC, zahour@zah.cz
new rtf (14.71 KB, application/rtf)
2021-05-20 20:19 UTC, Julien Nabet
console logs (103.03 KB, text/plain)
2021-06-01 08:04 UTC, Julien Nabet

Description zahour@zah.cz 2021-05-17 09:10:37 UTC
Created attachment 172080 [details]
Document example error

Error message: Find Error Format File in 1,180 ....
Comment 1 Ming Hua 2021-05-17 09:57:50 UTC
Same error when opening the attached sample file in 7.2/master daily build:
Version: 7.2.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: 18e5e948dd66e41f17b0a63bf631d98aee84a03b
CPU threads: 2; OS: Windows 10.0 Build 19041; UI render: Skia/Raster; VCL: win
Locale: zh-CN (zh_CN); UI: zh-CN
Calc: threaded

But Windows' own wordpad.exe seems to struggle with the sample file as well.  Are you sure the sample file is valid?
Comment 2 Timur 2021-05-17 11:18:23 UTC
Reproduced with LO 4.2 and LO 7.2+. New.
LO 3.4 could open it, but the result did not look good at all.
Comment 3 Julien Nabet 2021-05-20 15:55:34 UTC
I tried https://products.aspose.app/words/viewer, and the file seems fine there (at least I saw nothing obviously wrong).

With master sources updated today, I can reproduce the problem too, with this message:
File format error found at 1,180 /..../writerfilter/source/rtftok/rtfdocumentimpl.cxx:827(row,col)

Miklos: since it concerns rtf, thought you might be interested in this one.
Comment 4 Julien Nabet 2021-05-20 20:19:34 UTC
Created attachment 172212 [details]
new rtf

I noticed that each Unicode character escape is followed by \'3

For example, the first line contains:
author Miloslava H\u345\'3?

If you remove every "\'3", so that this becomes:
author Miloslava H\u345?

the file opens.
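
The removal described above can be sketched as a small script. This is only an illustration of the manual workaround, not part of any fix in LibreOffice; the function name `strip_truncated_hex` is hypothetical, and the pattern assumes the stray "\'3" always sits directly after a \uN escape and is not the start of a valid two-digit hex escape such as \'3f.

```python
import re

# A \uN escape immediately followed by the truncated hex escape \'3.
# The negative lookahead keeps valid two-digit escapes such as \'3f intact.
_TRUNCATED_HEX = re.compile(r"(\\u-?\d+)\\'3(?![0-9A-Fa-f])")


def strip_truncated_hex(rtf: str) -> str:
    """Return the RTF text with each stray \\'3 after a \\uN escape removed."""
    return _TRUNCATED_HEX.sub(r"\1", rtf)
```

Applied to the line quoted above, `strip_truncated_hex(r"author Miloslava H\u345\'3?")` yields `author Miloslava H\u345?`, which leaves the "?" in place as the fallback character.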

I attached the file with this pattern removed.
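
For context, the \u345 in the example line is an RTF Unicode escape: a signed 16-bit decimal code unit followed by a fallback for readers without Unicode support. A minimal sketch of how such escapes could be decoded follows. The name `decode_u_escapes` is hypothetical, and the sketch assumes a single fallback (one plain character or one valid \'hh escape) and does not combine surrogate pairs.

```python
import re

# \uN, optionally followed by one fallback: a valid \'hh escape or one plain
# character (digits are excluded so they are never split off from N).
_U_ESCAPE = re.compile(r"\\u(-?\d+)(?:\\'[0-9A-Fa-f]{2}|[^\\{}\d])?")


def decode_u_escapes(text: str) -> str:
    """Replace each \\uN escape (and its fallback) with the character it names."""

    def to_char(match: re.Match) -> str:
        # N is a signed 16-bit integer; masking maps negative values
        # into the range 0x8000..0xFFFF.
        return chr(int(match.group(1)) & 0xFFFF)

    return _U_ESCAPE.sub(to_char, text)
```

With this sketch, \u345? decodes to U+0159 (ř). The truncated \'3 in the original file does not match the fallback pattern here, which mirrors why it needs special handling.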

https://en.wikipedia.org/wiki/Rich_Text_Format says:
"For a Unicode escape, the control word \u is used, followed by a 16-bit signed integer which corresponds to the Unicode UTF-16 code unit number. For the benefit of programs without Unicode support, this must be followed by the nearest representation of this character in the specified code page. For example, \u1576? would give the Arabic letter bāʼ ب, but indicates that older programs which do not support Unicode should render it as a question mark instead."
Comment 5 Commit Notification 2021-06-01 06:46:40 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6fc8a6b0b52509d735971f079d7b1660559d475d

tdf#142325 RTF import: tolerate invalid hex markup like "\'3?"

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
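
To illustrate what "tolerate invalid hex markup" can mean in practice, here is a hedged sketch of a \'hh reader with a strict and a tolerant mode. It is not the code from the commit above; the function name `read_hex_escape`, its signature, and the choice to skip only the valid leading hex digits are all assumptions made for this example.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def read_hex_escape(text: str, pos: int, strict: bool = False):
    """Read the two characters that should follow a backslash-apostrophe.

    `pos` is the index just after the apostrophe. Returns a tuple
    (byte_value_or_None, next_pos).
    """
    pair = text[pos:pos + 2]
    if len(pair) == 2 and all(ch in HEX_DIGITS for ch in pair):
        return int(pair, 16), pos + 2
    if strict:
        raise ValueError(f"invalid hex escape at offset {pos}: {pair!r}")
    # Tolerant mode: drop whatever valid hex digits are present and let the
    # caller continue with the next character instead of aborting the import.
    consumed = 0
    while consumed < len(pair) and pair[consumed] in HEX_DIGITS:
        consumed += 1
    return None, pos + consumed
```

For the markup in this bug, `read_hex_escape("3?", 0)` returns `(None, 1)`, so the "?" is still seen as ordinary text, whereas `strict=True` raises an error comparable to the "File format error" reported here.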
Comment 6 Julien Nabet 2021-06-01 08:04:33 UTC
Created attachment 172522 [details]
console logs

On a Debian x86-64 PC with master sources, I could open the file without error.
Thank you Miklos!

For information, I attached console logs with all the warning messages I got just from opening the file.
Comment 7 Commit Notification 2021-06-01 08:44:06 UTC
Miklos Vajna committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/core/commit/87c307b6fc1eb86aa194832e9a293df435ed3f87

tdf#142325 RTF import: tolerate invalid hex markup like "\'3?"

It will be available in 7.1.5.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.