Bug 144050 - FILEOPEN on an RTF document replaces multiple spaces with other characters, losing layout justification
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.4.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:rtf, regression
Depends on:
Blocks: RTF
 
Reported: 2021-08-24 10:56 UTC by Bernard Moreton
Modified: 2024-01-05 21:29 UTC

See Also:
Crash report or crash signature:


Attachments
A source rtf ready to be opened in LO (1.68 KB, text/plain)
2021-08-24 11:00 UTC, Bernard Moreton
The RTF saved (as) after opening SRC.rtf (12.21 KB, application/rtf)
2021-08-24 11:01 UTC, Bernard Moreton
Spaces correctly rendered by Wordpad (84.80 KB, image/png)
2024-01-04 10:39 UTC, Jonas Camillus Jeppesen
OpenOffice rendering spaces correctly (63.84 KB, image/png)
2024-01-04 10:40 UTC, Jonas Camillus Jeppesen
Wrong rendering by Libre Office (119.77 KB, image/png)
2024-01-04 14:08 UTC, Jonas Camillus Jeppesen
WordPad-inspired example demonstrating the consecutive space issue. (623 bytes, application/rtf)
2024-01-04 21:58 UTC, Jonas Camillus Jeppesen
Montage showing rendering with/without \generator control word (166.52 KB, image/png)
2024-01-04 21:59 UTC, Jonas Camillus Jeppesen
Correct and incorrect rendering in Libre Office (with/without \*\generator or LO Writer before/after version 6.3) (176.89 KB, image/png)
2024-01-05 08:15 UTC, Jonas Camillus Jeppesen

Description Bernard Moreton 2021-08-24 10:56:17 UTC
Description:
When opening an RTF document that uses plain typewriter font text with layout, multiple spaces are replaced by a series of "\u8198\'3f \u8198\'3f ", with the result that justification is lost.

If the document is stripped back to plain text (FILE.txt) and opened in LO, it imports correctly, with spaces and justification preserved, though needing page and fontsize modification if the lines are long.

Steps to Reproduce:
1. Open any RTF document that uses a typewriter font and spaces to justify columns.

Actual Results:
Justification is lost, and the "spaces" are clearly not all of the same length

Expected Results:
Justification should be preserved, and spaces kept as normal space-characters


Reproducible: Always


User Profile Reset: No



Additional Info:
Save the opened document as another file, e.g. FILE_2.rtf, and the changes will be clear
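
The comparison suggested above can also be done mechanically. The following is a minimal Python sketch, not part of the original report, that counts the `\uN` escapes in a piece of RTF text and names the characters they stand for. The function name `count_unicode_escapes` and the sample string are made up for illustration; the only facts relied on are that RTF writes a Unicode character as `\uN` (a signed 16-bit decimal) followed by a fallback such as `\'3f`, and that code point 8198 is U+2006 SIX-PER-EM SPACE.

```python
import re
import unicodedata

# An RTF Unicode escape is \uN, where N is a signed 16-bit decimal number.
U_ESCAPE = re.compile(r"\\u(-?\d+)")


def count_unicode_escapes(rtf_text):
    """Return a dict mapping Unicode character names to their \\uN count."""
    counts = {}
    for match in U_ESCAPE.finditer(rtf_text):
        code = int(match.group(1))
        if code < 0:
            # Negative values encode code points above 32767.
            code += 65536
        # Fall back to the U+XXXX form for characters without a name.
        name = unicodedata.name(chr(code), "U+%04X" % code)
        counts[name] = counts.get(name, 0) + 1
    return counts


# Hypothetical sample modelled on the sequence quoted in the description.
sample = r"A \u8198\'3f \u8198\'3f B"
print(count_unicode_escapes(sample))  # {'SIX-PER-EM SPACE': 2}
```

Running this on SRC.rtf and on the re-saved file should make the difference visible without reading the RTF by eye.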

Version: 7.1.5.2 / LibreOffice Community
Build ID: 10(Build:2)
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-GB
Ubuntu package version: 1:7.1.5~rc2-0ubuntu0.20.04.1~lo1
Calc: threaded
Comment 1 Bernard Moreton 2021-08-24 11:00:08 UTC
Created attachment 174514 [details]
A source rtf ready to be opened in LO
Comment 2 Bernard Moreton 2021-08-24 11:01:43 UTC
Created attachment 174515 [details]
The RTF saved (as) after opening SRC.rtf
Comment 3 Bernard Moreton 2021-08-27 10:55:50 UTC
The problem is in FILEOPEN, not SAVE - this can be seen in the uneven motion of the cursor within multiple spaces (2 short moves with left/right arrow keys, one longer one).

Isolated space characters are not affected.

Unnecessary fonts and styles are introduced, but that is irrelevant to this current issue.

OTOH, if I introduce into the source RTF a line after the fonttbl
{\*\generator LibreOffice/7.1.5.2$Linux_X86_64 LibreOffice_project/10$Build-2}
the rtf opens as it should.  

So there is a "cheat" solution!
Comment 4 Dieter 2021-09-11 04:20:06 UTC
I'm not an expert, but as far as I can see, attachment 174514 is not an RTF file. I can't open it directly from the browser, so I have to save it first, and in that case the suggested file format is "text". When I save attachment 174515, the suggested file format is "Rich Text Format".

Please check
=> NEEDINFO
Comment 5 Bernard Moreton 2021-09-11 07:51:19 UTC
It opens in my browser (Pale Moon, a Firefox derivative), but as a text file, showing all the RTF directives.  How should a browser open an RTF file? - I don't know!
But it *is* an rtf file, albeit with minimal directives, and LO opens it as such.
Comment 6 Dieter 2021-09-11 09:42:02 UTC
(In reply to Bernard Moreton from comment #5)
> It opens in my browser (Pale Moon, FireFox derivative), but as a text file,
> showing all the RTF directives.  How should a browser open an RTF file? - I
> don't know!
> But it *is* an rtf file, albeit with minimal directives, and LO opens it as
> such.

How did you create that rtf-file?
Comment 7 Bernard Moreton 2021-09-11 14:12:03 UTC
The uploaded example file is a pared-down extract from a much longer PDF report file, with most of the actual text replaced character-for-character, for obvious reasons of discretion.
The PDF was reduced to text using
pdftotext -layout $src        # $src being the PDF file

A standard RTF header block is then written, with the mandatory {\rtf1\ansi
followed by a brief FONTTBL, COLORTBL (probably redundant), and a single style in the STYLESHEET.
I now follow that with the 
{\*\generator LibreOffice/7.1.5.2$Linux_X86_64 LibreOffice_project/10$Build-2}
to stop the unwanted behaviour of appending the strange characters in multi-space strings.
Then the lines defining the papersize, margins, and orientation for the document and the section (the latter again probably redundant),
and finally the "\pard\plain \s7" to start the body of the text.

The text is then copied from the text file, adding a "\line" at each line-end.

And finally the RTF ending is added, "}"
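
The steps above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the reporter's BASH script: the function name `text_to_rtf`, the font name, and the default generator string are hypothetical, and the COLORTBL, STYLESHEET, and paper-size lines are left out for brevity (so the body starts with `\pard\plain\f0` rather than `\pard\plain \s7`). Non-ASCII input is not handled.

```python
def text_to_rtf(text, generator="anystring"):
    """Wrap pre-formatted plain text in a minimal RTF document.

    Pass generator=None to omit the {\\*\\generator ...} group.
    """
    parts = [
        r"{\rtf1\ansi",                          # mandatory RTF header
        r"{\fonttbl{\f0\fmodern Courier New;}}",  # assumed typewriter font
    ]
    if generator is not None:
        # The group reported in this bug to restore correct space handling.
        parts.append(r"{\*\generator " + generator + "}")
    parts.append(r"\pard\plain\f0 ")             # start of the body text
    for line in text.splitlines():
        # Escape the three characters that are special in RTF.
        escaped = (line.replace("\\", "\\\\")
                       .replace("{", r"\{")
                       .replace("}", r"\}"))
        parts.append(escaped + r"\line")         # \line at each line end
    parts.append("}")                            # RTF ending
    return "\n".join(parts)


if __name__ == "__main__":
    print(text_to_rtf("Name      Qty\nWidget      3"))
```

Calling `text_to_rtf(text, generator=None)` produces the variant without the generator group, which is handy for comparing the two renderings side by side.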

I'd upload the BASH executable, but the source RTF already uploaded shows the process more clearly than the BASH script could do!

I've been using this sort of method for many years for reporting from 4GL, whether simply to LO (and OOo before that), or using LO to create a PDF from the command line - though in 4GL reporting most of the formatting is done by defining tabs.

When processing pre-formatted text, however, especially the output of PDFTOTEXT, multiple spaces are unavoidable;  but strange characters should *never* be added to them, as the LO FILEOPEN for RTF evidently does.
Comment 8 Buovjaga 2022-05-24 06:41:25 UTC
Bibisected with linux-64-6.4 to
https://git.libreoffice.org/core/commit/24b04db5a63b57a74e58a7616091437ad68548ac
tdf#123703 RTF import: fix length of space character sequence

Version: 7.4.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: b6266207b55a7633dc82b02142215757512adfb7
CPU threads: 2; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: fi-FI (fi_FI); UI: en-US
Calc: threaded Jumbo
Comment 9 Jonas Camillus Jeppesen 2024-01-04 10:39:37 UTC
Created attachment 191750 [details]
Spaces correctly rendered by Wordpad

Wordpad rendering https://bugs.documentfoundation.org/attachment.cgi?id=174514
Comment 10 Jonas Camillus Jeppesen 2024-01-04 10:40:08 UTC
Created attachment 191751 [details]
OpenOffice rendering spaces correctly

OpenOffice rendering https://bugs.documentfoundation.org/attachment.cgi?id=174514
Comment 11 Jonas Camillus Jeppesen 2024-01-04 14:08:36 UTC
Created attachment 191761 [details]
Wrong rendering by Libre Office

https://bugs.documentfoundation.org/attachment.cgi?id=174514 rendered by Libre Office Writer 7.6 on Linux.
Comment 12 Jonas Camillus Jeppesen 2024-01-04 21:58:53 UTC
Created attachment 191766 [details]
WordPad-inspired example demonstrating the consecutive space issue.

WordPad-inspired example demonstrating the consecutive space issue. Line 3 is blank; replace it with {\*\generator anystring} and the spaces render correctly.
Comment 13 Jonas Camillus Jeppesen 2024-01-04 21:59:46 UTC
Created attachment 191767 [details]
Montage showing rendering with/without \generator control word

Montage showing the rendering of https://bugs.documentfoundation.org/attachment.cgi?id=191766 with and without the \generator control word. 

Highlighted in red are regions of interest.
Comment 14 Jonas Camillus Jeppesen 2024-01-04 23:10:04 UTC
I was about to open a bug report on what I now believe to be the same issue as this one.

While working on RTF output for the Pygments project, I see consecutive spaces rendered with variable width, or maybe even replaced with a number of other characters (in LibreOffice Writer 7.6.x on Linux and Windows 10).

Both my own minimal example [1] and the originally attached RTF example [2] render correctly in WordPad [3] on Windows and in Apache OpenOffice [4] (Windows+Linux).

An interesting observation I made was that adding the following line to the RTF files makes the spaces render correctly in LibreOffice.


{\*\generator anystring} 

The `{\*\generator ...}` line appears in RTF files produced by WordPad. However, I can't find the \generator control word described anywhere in the RTF specification. `\*` instructs readers to ignore the destination that follows if they do not implement it. WordPad appears to use it to declare which version of Microsoft Rich Edit generated the file (e.g. `{\*\generator Riched20 10.0.18362}`).
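
For existing files, the workaround can be applied with a small script. The sketch below is a hypothetical Python helper (the name `add_generator` is made up) that inserts a `{\*\generator ...}` group right after the `\fonttbl` group, the position described in Comment 3. Whether other positions work equally well has not been tested here.

```python
def add_generator(rtf, name="anystring"):
    """Insert {\\*\\generator name} after the \\fonttbl group, if absent."""
    if r"{\*\generator" in rtf:
        return rtf  # already present, leave the document untouched
    start = rtf.find(r"{\fonttbl")
    if start == -1:
        raise ValueError("no \\fonttbl group found")
    depth = 0
    i = start
    while i < len(rtf):
        ch = rtf[i]
        if ch == "\\":
            i += 2  # skip the escaped character, e.g. \{ or \}
            continue
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # i is the brace that closes the \fonttbl group.
                end = i + 1
                return rtf[:end] + "\n{\\*\\generator " + name + "}" + rtf[end:]
        i += 1
    raise ValueError("unbalanced \\fonttbl group")


if __name__ == "__main__":
    doc = r"{\rtf1\ansi{\fonttbl{\f0 Courier;}}\pard a  b}"
    print(add_generator(doc))
```

The function returns its input unchanged when a generator group is already present, so it can be run repeatedly over the same file.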

[5] shows how replacing the blank line 3 in [1] with `{\*\generator anystring}` causes LibreOffice to render the spaces at equal width. [5] also shows a selection of the first 4 characters in LibreOffice, e.g. ` 2  ` (space, 2, space, space), which LibreOffice counts as a selection of 6 characters.

As I understand Comment 8 by Buovjaga, the commit that introduced the bug has been identified. That commit is dated August 2019, so I tried a version from 2018 (6.1.6.3), and that version is free of this bug: it renders the spaces correctly.


[1] https://bugs.documentfoundation.org/attachment.cgi?id=191766
[2] https://bugs.documentfoundation.org/attachment.cgi?id=174514
[3] https://bugs.documentfoundation.org/attachment.cgi?id=191750
[4] https://bugs.documentfoundation.org/attachment.cgi?id=191751
[5] https://bugs.documentfoundation.org/attachment.cgi?id=191767
Comment 15 Jonas Camillus Jeppesen 2024-01-05 08:15:48 UTC
Created attachment 191772 [details]
Correct and incorrect rendering in Libre Office (with/without \*\generator or LO Writer before/after version 6.3)

Correct and incorrect rendering in Libre Office (with/without \*\generator or LO Writer before/after version 6.3)