Bug 131386 - Hidden linebreaks are ignored on creating pdf from rtf document
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 3.6.2.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.6.0 target:7.5.1
Keywords: filter:rtf
Duplicates: 150475 152866
Depends on:
Blocks: PDF-Export
 
Reported: 2020-03-17 12:43 UTC by Jürgen Bosch
Modified: 2023-11-07 14:59 UTC (History)
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Testfile in right to left for reproduce (83.80 KB, application/msword)
2020-03-17 12:56 UTC, Jürgen Bosch
Converted result with wrong linebreaks in (15.05 KB, application/pdf)
2020-03-17 12:57 UTC, Jürgen Bosch
Screenshots to demonstrate the problem (154.91 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-03-17 13:29 UTC, Jürgen Bosch
The example file in Word and Writer master with hidden characters hidden (53.15 KB, image/png)
2022-01-31 20:00 UTC, Gabor Kelemen (allotropia)
The example file in Word 2016 and Writer 7.6 (196.91 KB, image/png)
2023-01-23 11:05 UTC, Gabor Kelemen (allotropia)
PDF export from 7.6 master (51.22 KB, application/pdf)
2023-01-23 11:07 UTC, Gabor Kelemen (allotropia)

Description Jürgen Bosch 2020-03-17 12:43:31 UTC
Description:
I have an RTF document with hidden text and also with single hidden line breaks. When I convert the RTF to PDF with the PDF writer, the hidden line breaks are not suppressed: they are rendered as visible line breaks in the PDF, which is wrong.

Steps to Reproduce:
1. Use the attached testPreprocessRtl.rtf
2. Convert to pdf: soffice --headless --convert-to pdf testPreprocessRtl.rtf
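
For scripted reproduction, the conversion step above could be wrapped roughly as follows. This is only a sketch: the helper names `build_convert_command` and `convert_to_pdf` are hypothetical, and the `--outdir` option is an addition that the command in step 2 does not use.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(rtf_path, out_dir=".", soffice="soffice"):
    """Return the argv list for a headless RTF-to-PDF conversion.

    Same command as in step 2, plus --outdir (an addition, not part of
    the original report) so the output location is explicit.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(rtf_path),
    ]


def convert_to_pdf(rtf_path, out_dir=".", soffice="soffice"):
    """Run the conversion and return the expected PDF path.

    Returns None when the soffice binary cannot be found on PATH.
    """
    if shutil.which(soffice) is None:
        return None
    subprocess.run(build_convert_command(rtf_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(rtf_path).stem + ".pdf")
```

Calling `convert_to_pdf("testPreprocessRtl.rtf")` from the directory that holds the attachment should leave `testPreprocessRtl.pdf` next to it, provided `soffice` is installed.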


Actual Results:
The hidden line break in the RTF appears in the converted PDF document.

Expected Results:
The hidden line break in the RTF should not appear in the converted PDF, just as it does not appear in the on-screen view of the RTF document when paragraph marks and hidden formatting symbols are turned off.


Reproducible: Always


User Profile Reset: No



Additional Info:
The attached example file is a right-to-left RTF with which the issue can be reproduced.
Comment 1 Jürgen Bosch 2020-03-17 12:56:32 UTC
Created attachment 158747 [details]
Testfile in right to left for reproduce

The RTF contains 2 lines. To see the difference, display the hidden characters in the RTF.
Comment 2 Jürgen Bosch 2020-03-17 12:57:44 UTC
Created attachment 158748 [details]
Converted result with wrong linebreaks in
Comment 3 Jürgen Bosch 2020-03-17 13:04:18 UTC
Comment on attachment 158747 [details]
Testfile in right to left for reproduce

The file contains 2 lines:

The first line has a hidden line break before the colon. On conversion to PDF the line break is wrongly rendered; in addition, the colon is on the left side instead of on the right side as in the preview of the RTF.

The second line has a hidden line break after the colon. On conversion to PDF the line break is wrongly rendered, but the colon is rendered correctly.
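
To locate such breaks in a file, a rough scanner could look like the sketch below. It rests on assumptions that the report does not confirm: that the hidden line breaks are stored as `\par` control words reached while the RTF hidden-text property `\v` is active, and that tracking only `\v`, `\v0`, `\plain` and group braces is enough. The function name `hidden_paragraph_marks` is hypothetical, and this is not how the LibreOffice importer works.

```python
import re

# One RTF token: a control word with optional numeric parameter, a control
# symbol (backslash + one character), a group brace, or a run of plain text.
TOKEN = re.compile(r"\\([a-zA-Z]+)(-?\d+)? ?|\\(.)|([{}])|([^\\{}]+)", re.DOTALL)


def hidden_paragraph_marks(rtf):
    """Return 0-based indices of \\par tokens reached while \\v is on.

    Simplified model: only \\v, \\v0, \\plain and group scoping are
    tracked; style sheets and binary data are ignored.
    """
    hidden = False
    stack = []      # hidden state saved at each opening brace
    hits = []
    par_index = 0
    for match in TOKEN.finditer(rtf):
        word, param, _symbol, brace, _text = match.groups()
        if brace == "{":
            stack.append(hidden)
        elif brace == "}":
            if stack:
                hidden = stack.pop()
        elif word == "v":
            hidden = param != "0"   # \v or \v1 turns hiding on, \v0 off
        elif word == "plain":
            hidden = False          # \plain resets character formatting
        elif word == "par":
            if hidden:
                hits.append(par_index)
            par_index += 1
    return hits
```

For a hand-written fragment such as `{\rtf1 first{\v\par}: second\par}`, the first `\par` is reported as hidden and the second is not.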
Comment 4 Jürgen Bosch 2020-03-17 13:10:08 UTC
My use case is to convert RTF files to PDF.

My environment is a Docker container running Alpine 3.11.
The following packages are installed in this container:
- libreoffice-writer=6.3.2.2-r3
- font-noto
- font-noto-cjk

For the conversion I use:
soffice --headless --convert-to pdf filename.rtf
Comment 5 Jürgen Bosch 2020-03-17 13:29:56 UTC
Created attachment 158751 [details]
Screenshots to demonstrate the problem
Comment 6 Buovjaga 2020-06-19 12:07:57 UTC
It already looks "bad" when opened in LibreOffice, there is no need to convert to PDF.

In which software was the RTF created?
Comment 7 Jürgen Bosch 2020-06-19 12:44:55 UTC
The RTF was created in MS Word.
As you can see in the attached document "Screenshots to demonstrat...", the RTF looks fine in MS Word, but only if the hidden characters are disabled; otherwise the colon is at the beginning of the next line. On conversion to PDF, the resulting PDF looks bad.
Comment 8 Buovjaga 2020-06-19 12:48:01 UTC
I confirmed in

Arch Linux 64-bit
Version: 7.1.0.0.alpha0+
Build ID: ad0351b84926075297fb74abbe9b31a0455782af
CPU threads: 8; OS: Linux 5.7; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 17 June 2020
Comment 9 Gabor Kelemen (allotropia) 2022-01-31 20:00:42 UTC
Created attachment 177946 [details]
The example file in Word and Writer master with hidden characters hidden

Still bad in

Version: 7.4.0.0.alpha0+ (x64) / LibreOffice Community
Build ID: eb69767d7c1bb8e6e780fd9503f08c9d7f5ecb45
CPU threads: 13; OS: Windows 10.0 Build 19042; UI render: default; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: threaded
Comment 10 Commit Notification 2023-01-13 19:32:49 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/a25eda715591cfa96136bcfd95360156516239d1

tdf#131386 writerfilter: RTF import paragraph mark formatting

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Commit Notification 2023-01-19 12:42:36 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/1ed691c4685baa6170829bc8fc464f4036806a92

tdf#131386 writerfilter: RTF import paragraph mark formatting

It will be available in 7.5.1.

Comment 12 Gabor Kelemen (allotropia) 2023-01-23 11:05:50 UTC
Created attachment 184842 [details]
The example file in Word 2016 and Writer 7.6

Verified in master:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: f1830bff71847a9c17715cff52383956719847fe
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: de-DE (hu_HU); UI: en-US
Calc: threaded

Now only two lines appear when formatting marks are hidden, and also in the PDF export.
Comment 13 Gabor Kelemen (allotropia) 2023-01-23 11:07:31 UTC
Created attachment 184843 [details]
PDF export from 7.6 master

For the record, the PDF as well.

Note: there is still an RTL issue left: in the first line the colon appears before the EHS word, not after it as in the second line.
This is not a side effect of the above fix; it was pre-existing.
Comment 14 Buovjaga 2023-02-13 10:40:58 UTC
*** Bug 150475 has been marked as a duplicate of this bug. ***
Comment 15 Commit Notification 2023-03-02 10:21:05 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/9e365b05e7ca986f6ee4a4a58d0bb20947975864

tdf#131386 ApplyParagraphMarkFormatToNumbering remove RTF fallback

It will be available in 7.6.0.

Comment 16 Gabor Kelemen (allotropia) 2023-11-07 14:59:39 UTC
*** Bug 152866 has been marked as a duplicate of this bug. ***