136246 – [RTF] Import: Txt-Table is messed up

Summary: [RTF] Import: Txt-Table is messed up

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	filters and storage
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:	filter:rtf

Depends on:
Blocks:	RTF

Reported:	2020-08-28 21:36 UTC by Dennis Roczek
Modified:	2024-09-06 16:15 UTC
CC List:	4 users

See Also:
Crash report or crash signature:

Attachments
Screenshot of the table in LO64 (86.91 KB, image/png) 2020-08-28 21:37 UTC, Dennis Roczek
Screenshot in LO5 (121.66 KB, image/png) 2020-08-28 21:37 UTC, Dennis Roczek
Abiword (186.69 KB, image/png) 2020-08-28 21:38 UTC, Dennis Roczek
Wordpad in Windows 10 (1909) (78.97 KB, image/png) 2020-08-28 21:38 UTC, Dennis Roczek
Correct rendering in MSO Word (83.73 KB, image/png) 2020-08-28 21:39 UTC, Dennis Roczek
Problematic File (21.64 KB, application/rtf) 2020-08-28 21:39 UTC, Dennis Roczek
somewhat fixed file using an old LibreOffice version (8.15 KB, application/rtf) 2020-08-28 21:40 UTC, Dennis Roczek

Description Dennis Roczek 2020-08-28 21:36:08 UTC

Description:
This bug was reported on the de-discuss mailing list and can be confirmed.

Version: 6.4.4.2 (x64)
Build-ID: 3d775be2011f3886db32dfd395a6a6d1ca2630ff
CPU threads: 4; OS: Windows 10.0 Build 18363; UI render: GL; VCL: win
Locale: de-DE (de_DE); UI: de-DE
Calc: threaded

MMS Bila 5.0 is a small ERP system in Germany. You can grab a test version at http://www.mmsgmbh.de/finanzbuchhaltung.html .

According to the mailing list, this export is displayed correctly in LibreOffice versions before 6.4.0, although I cannot confirm that at the moment using 5.4.7 (PortableApps version).

For what it is worth: WordPad in Windows 10 (see another screenshot) and AbiWord 2.8.6 (see screenshot) also have problems reading that file; only MSO (365 version) is able to read it correctly.

So basically: the content is completely messed up as the table columns are not in the correct order. 

Steps to Reproduce:
1. open attached RTF file


Actual Results:
messed up table

Expected Results:
correct table


Reproducible: Always


User Profile Reset: Yes



Additional Info:
.

Comment 1 Dennis Roczek 2020-08-28 21:37:24 UTC

Created attachment 164829
Screenshot of the table in LO64

Comment 2 Dennis Roczek 2020-08-28 21:37:44 UTC

Created attachment 164830
Screenshot in LO5

Comment 3 Dennis Roczek 2020-08-28 21:38:00 UTC

Created attachment 164831
Abiword

Comment 4 Dennis Roczek 2020-08-28 21:38:33 UTC

Created attachment 164832
Wordpad in Windows 10 (1909)

Comment 5 Dennis Roczek 2020-08-28 21:39:03 UTC

Created attachment 164833
Correct rendering in MSO Word

Comment 6 Dennis Roczek 2020-08-28 21:39:29 UTC

Created attachment 164834
Problematic File

Comment 7 Dennis Roczek 2020-08-28 21:40:35 UTC

Created attachment 164835
somewhat fixed file using an old LibreOffice version

Comment 8 Telesto 2020-08-29 12:55:03 UTC

Confirm with
7.1

Also in
LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

Comment 9 Telesto 2020-08-29 12:58:30 UTC

@Miklos
Is there some way to validate RTF files? The RTF document attached to this bug is of dubious quality.

Comment 10 Miklos Vajna 2020-08-31 08:14:40 UTC

I'm not aware of anything like that. If Word opens the file, we're expected to do the same.

Comment 11 Telesto 2020-08-31 08:19:30 UTC

(In reply to Miklos Vajna from comment #10)
> I'm not aware of anything like that. If Word opens the file, we're expected
> to do the same.

For the record: the file can be opened. The point is more how everything is presented on screen.

Comment 12 Dennis Roczek 2020-09-06 21:20:41 UTC

Oooh, I just realized: the problem is not the content itself, it is the tab character!

If it is replaced by a whitespace using search and replace, it is /mostly/ displayed correctly.

So basically it is in the file itself: \u8198\'20, which the latest RTF spec (1.9.1) describes as follows:

------------------------------------
\uN This keyword represents a single Unicode character that has no equivalent ANSI representation
based on the current ANSI code page. N represents the Unicode character value expressed as a
decimal number.
This keyword is followed immediately by equivalent character(s) in ANSI representation. In this
way, old readers will ignore the \uN keyword and pick up the ANSI representation properly.
When this keyword is encountered, the reader should ignore the next N' characters, where N'
corresponds to the last \ucN' value encountered.
As with all RTF keywords, a keyword-terminating space may be present (before the ANSI
characters) that is not counted in the characters to skip. While this is not likely to occur (or
recommended), a \binN keyword, its argument, and the binary data that follows are considered
one character for skipping purposes. If an RTF scope delimiter character (that is, an opening or
closing brace) is encountered while scanning skippable data, the skippable data is considered to
end before the delimiter. This makes it possible for a reader to perform some rudimentary error
recovery. To include an RTF delimiter in skippable data, it must be represented using the
appropriate control symbol (that is, escaped with a backslash,) as in plain text. Any RTF control
word or symbol is considered a single character for the purposes of counting skippable characters.

An RTF writer, when it encounters a Unicode character with no corresponding ANSI character,
should output \uN followed by the best ANSI representation it can manage. Often a question
mark is used if no reasonable ANSI character exists. In addition, if the Unicode character
translates into an ANSI character stream with a count of bytes differing from the current Unicode
Character Byte Count, it should emit the appropriate \ucN keyword prior to the \uN keyword to
notify the reader of the change.
Most RTF control words accept signed 16-bit numbers as arguments. For these control words,
Unicode values greater than 32767 are expressed as negative numbers. For example, the
character code U+F020 is given by \u-4064. To get -4064, convert F020 (hexadecimal) to decimal (61472)
and subtract 65536.
Occasionally Word writes SYMBOL_CHARSET (nonUnicode) characters in the range
U+F020..U+F0FF instead of U+0020..U+00FF. Internally Word uses the values U+F020..U+F0FF
for these characters so that plain-text searches don’t mistakenly match SYMBOL_CHARSET
characters when searching for Unicode characters in the range U+0020..U+00FF. To find out the
correct symbol font to use, e.g., Wingdings, Symbol, etc., find the last SYMBOL_CHARSET font
control word \fN used, look up font N in the font table and find the face name. The charset is
specified by the \fch

------------------------------------
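The skipping rule quoted above can be sketched in a few lines of Python. This is only an illustration of the spec text, not LibreOffice's importer; the names `TOKEN` and `decode_rtf_text` are made up for this example, and the sketch assumes a short body-text fragment. It ignores destinations such as the font table, group-scoped \ucN state, and \binN.

```python
import re

# One RTF token: a \'hh hex escape, a control word with an optional signed
# numeric argument and an optional delimiting space, a control symbol, or
# any other single character.
TOKEN = re.compile(r"\\'([0-9a-fA-F]{2})|\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|(.)", re.S)

def decode_rtf_text(rtf, codepage="cp1252"):
    """Hypothetical helper: extract text from a small RTF fragment."""
    out = []
    uc = 1    # fallback characters to skip after \uN (the RTF default is 1)
    skip = 0  # fallback characters still to be skipped
    for m in TOKEN.finditer(rtf):
        hexbyte, word, arg, symbol, char = m.groups()
        if char in ("{", "}"):
            skip = 0        # a scope delimiter ends skippable data
            continue
        if skip:
            skip -= 1       # any token counts as one skipped character
            continue
        if word == "uc" and arg is not None:
            uc = int(arg)
        elif word == "u" and arg is not None:
            n = int(arg)
            out.append(chr(n + 65536 if n < 0 else n))  # undo signed 16-bit
            skip = uc
        elif hexbyte:
            out.append(bytes.fromhex(hexbyte).decode(codepage))
        elif symbol in ("\\", "{", "}"):
            out.append(symbol)
        elif char and char not in "\r\n":
            out.append(char)
        # all other control words and symbols are ignored in this sketch
    return "".join(out)
```

Under these assumptions, the sequence from the file, \u8198\'20, decodes to the single character U+2006, because the \'20 fallback is skipped rather than emitted as an extra space.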

So since LibreOffice /seems/ not to recognize \u8198, it should display only a whitespace.

I guess 8198 is that character https://www.codetable.net/decimal/8198 ("Six-Per-Em Space", but isn't this \u2006?!?).

So why do we not recognize that character? *g*
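On the decimal-versus-hex question above: RTF's \uN takes a decimal argument, while the U+XXXX notation is hexadecimal, so 8198 and U+2006 name the same code point. A small check with Python's standard `unicodedata` module is shown below; the helper names `describe` and `to_rtf_arg` are made up for this example.

```python
import unicodedata

def describe(n):
    """Map a decimal \\uN argument to its U+XXXX form and Unicode name."""
    if n < 0:
        n += 65536  # RTF writes values above 32767 as negative numbers
    return f"U+{n:04X}", unicodedata.name(chr(n))

def to_rtf_arg(codepoint):
    """Signed 16-bit form of a BMP code point, as used after \\u."""
    return codepoint - 65536 if codepoint > 32767 else codepoint

print(describe(8198))        # ('U+2006', 'SIX-PER-EM SPACE')
print(chr(8198).isspace())   # True
print(to_rtf_arg(0xF020))    # -4064, the example from the spec text
```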

Comment 13 Dennis Roczek 2020-09-06 21:35:59 UTC

I forgot to mention: after replacing the Unicode character + whitespace with a plain whitespace using search and replace, it looks nearly as good as in MSO Word!

Next question that comes to my mind: why is that six-per-em space not recognized as a character separator? (Yeah, I know: another ticket!)
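A comparable workaround can be applied to a copy of the RTF file before opening it. The sketch below is a hypothetical preprocessing step, not a fix for the importer. It assumes that the sequence always appears exactly as \u8198\'20 and that \uc1 is in effect. The function names and file paths are made up for this example.

```python
from pathlib import Path

def strip_six_per_em(rtf_bytes):
    """Drop the \\u8198 keyword and keep its ANSI fallback.

    The remaining \\'20 is then read as an ordinary space. Keeping the
    hex escape, rather than writing a literal space, avoids the space
    being swallowed as the delimiter of a preceding control word.
    """
    return rtf_bytes.replace(rb"\u8198\'20", rb"\'20")

def convert_file(src, dst):
    # Work on bytes so the rest of the file is passed through unchanged.
    Path(dst).write_bytes(strip_six_per_em(Path(src).read_bytes()))

# Example call with placeholder paths:
# convert_file("problematic.rtf", "problematic-spaces.rtf")
```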

Comment 14 QA Administrators 2024-09-05 03:17:14 UTC

Dear Dennis Roczek,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.

There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:

Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/

If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.

If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT

Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/

2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug

Comment 15 Dennis Roczek 2024-09-06 16:15:21 UTC

still broken / repro with

Version: 24.2.5.2 (X86_64) / LibreOffice Community
Build ID: bffef4ea93e59bebbeaf7f431bb02b1a39ee8a59
CPU threads: 4; OS: macOS 11.7.10; UI render: Skia/Raster; VCL: osx
Locale: de-DE (de_DE.UTF-8); UI: de-DE
Calc: threaded