Bug 151546 - PDF poppler based filter import to Writer canvas reverses RTL script
Summary: PDF poppler based filter import to Writer canvas reverses RTL script
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.1 all versions
Hardware: All All
Importance: medium normal
Assignee: Kevin Suo
URL:
Whiteboard: target:7.5.0 target:7.4.3 target:7.4.4
Keywords: bisected, filter:pdf, regression
Depends on:
Blocks: PDF-Import-Writer RTL 104597
Reported: 2022-10-15 17:23 UTC by V Stuart Foote
Modified: 2024-08-03 09:15 UTC (History)

See Also:
Crash report or crash signature:


Attachments
output_writerpdfimportfilter-jenkins.xml (15.76 KB, text/xml)
2022-10-23 11:51 UTC, Kevin Suo
Some RTL and LTR text in various configurations (43.71 KB, application/pdf)
2022-10-25 19:36 UTC, Eyal Rozenberg

Description V Stuart Foote 2022-10-15 17:23:10 UTC
The poppler-based PDF import filter for the Writer canvas mishandles RTL text order.

This is a regression: the handling of RTL text runs on PDF import broke with https://cgit.freedesktop.org/libreoffice/core/commit/?id=ff140bb6b8b109f14c270ff059f0b8d71dab5d6c

RTL runs are now ordered correctly by the PDF import filter for Draw and Impress [1], thanks to the refactoring in

https://git.libreoffice.org/core/commit/69e9925ded584113e52f84ef0ed7c224079fa061

The PDF import filter for Writer [2] has not received a similar refactoring and needs dev effort.

This is complicated because the Draw/Impress filter has had a fair amount of work that was not applied to the Writer import filter.

=-ref-=
[1] https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/tree/drawtreevisiting.cxx
[2] https://opengrok.libreoffice.org/xref/core/sdext/source/pdfimport/tree/writertreevisiting.cxx
Comment 1 V Stuart Foote 2022-10-15 18:00:33 UTC
@Kevin, *

Khaled suggested in bug 89471 comment 18 looking at the ICU ubidi library as a means to reverse the poppler-delivered runs.

https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ubidi_8h.html#aeed24292bbed966df93f088bc6791f74

ubidi_setReorderingMode()
ubidi_writeReordered()
ubidi_writeReverse()
Comment 2 Eyal Rozenberg 2022-10-15 18:39:02 UTC Comment hidden (off-topic)
Comment 3 Eyal Rozenberg 2022-10-15 18:39:57 UTC Comment hidden (off-topic)
Comment 4 Commit Notification 2022-10-19 19:35:12 UTC
Kevin Suo committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f6004e1c457ddab5e0c91e6159875d25130b108a

tdf#151546: RTL text is reversed (Writer pdfimport)

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Kevin Suo 2022-10-20 00:32:01 UTC
This is now fixed on the master branch. Could someone help verify it?
Comment 6 Eyal Rozenberg 2022-10-20 20:20:07 UTC
(In reply to Kevin Suo from comment #5)

Give us a while for it to make the dailies.
Comment 7 Kevin Suo 2022-10-23 11:51:33 UTC
Created attachment 183217
output_writerpdfimportfilter-jenkins.xml
Comment 8 Eyal Rozenberg 2022-10-25 19:36:47 UTC
Created attachment 183271
Some RTL and LTR text in various configurations

A PDF to try importing with some combinations of RTL text in several contexts.
Comment 9 Eyal Rozenberg 2022-10-25 19:40:10 UTC
RTL text runs are no longer reversed by the Writer import filter, AFAICT. So, verifying.

But here too there are lots of other issues, such as:

* Shifting of text frames
* Breakup of lines into multiple frames, including single-character frames holding a punctuation mark at the end of lines
* Text run frames overlapping each other
* The wrong font being used

etc. Most of these have their own bug somewhere, but I should mention them to clarify that the Writer PDF import is absolutely not ready for prime-time where RTL scripts are concerned.
Comment 10 Commit Notification 2022-11-01 14:53:50 UTC
Kevin Suo committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/6b724fd0355f33226b8657110459968aa1be02ea

tdf#151546: RTL text is reversed (Writer pdfimport)

It will be available in 7.4.3.

Comment 11 Commit Notification 2022-11-18 18:51:15 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/50d73574b6c3d71f9a539c895a15d6fcda22390b

Related tdf#104597, tdf#151546: Introduce comphelper::string::reverseCodePoints

It will be available in 7.5.0.
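
To illustrate what reversing "by code point" means for a UTF-16 string, as the commit title above suggests, here is a minimal sketch. It is not the actual comphelper::string::reverseCodePoints code, which this report does not show; the name reverseByCodePoint and its signature are hypothetical, and std::u16string stands in for LibreOffice's own string type. The idea is only that a surrogate pair must keep its high-then-low order when the run is reversed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in, NOT the LibreOffice implementation:
// reverse a UTF-16 string code point by code point, so that each
// surrogate pair keeps its high-then-low order in the result.
std::u16string reverseByCodePoint(const std::u16string& text)
{
    std::u16string result;
    result.reserve(text.size());
    // Walk the input from the last code unit to the first.
    for (std::size_t i = text.size(); i > 0;)
    {
        --i;
        const char16_t unit = text[i];
        const bool isLowSurrogate = unit >= 0xDC00 && unit <= 0xDFFF;
        const bool hasHighBefore
            = i > 0 && text[i - 1] >= 0xD800 && text[i - 1] <= 0xDBFF;
        if (isLowSurrogate && hasHighBefore)
        {
            // Consume both halves and emit them in their original order.
            --i;
            result.push_back(text[i]);
            result.push_back(unit);
        }
        else
        {
            // BMP character (or an unpaired surrogate): copy as is.
            result.push_back(unit);
        }
    }
    return result;
}
```

A plain std::reverse over the code units would swap the two halves of every surrogate pair and produce invalid UTF-16, which is why the reversal has to be done per code point. This sketch does not try to keep combining marks attached to their base characters; ICU's ubidi_writeReverse() offers an option for that.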

Comment 12 Commit Notification 2022-11-25 16:49:24 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/f1db364f294d2d9a40d77004aeeb36729ae1c4ca

Related tdf#104597, tdf#151546: Introduce comphelper::string::reverseCodePoints

It will be available in 7.4.4.
