Bug 153968 - FILEOPEN PDF: Parentheses direction swapped importing RTL text from WRITER PDF export
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Draw
Version (earliest affected): 4.0.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: PDF-Import-Draw RTL
Reported: 2023-03-04 19:03 UTC by Eyal Rozenberg
Modified: 2024-08-03 09:15 UTC (History)

See Also:
Crash report or crash signature:


Attachments
PDF with RTL text and parentheses (5.63 KB, application/pdf)
2023-03-04 19:03 UTC, Eyal Rozenberg
Description Eyal Rozenberg 2023-03-04 19:03:59 UTC
Created attachment 185752
PDF with RTL text and parentheses

Consider the attached PDF document, produced by typing in 

אחת (שתיים) שלוש

in Writer, and exporting to PDF.

If we open this PDF in LO Draw, we get a text box with: 

אחת )שתיים( שלוש

i.e., with the direction of the parentheses flipped.

Note: (In Writer, only an empty document is imported for some reason.)
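
The flip described above is what one would expect if an importer turned a right-to-left run from visual (glyph) order back into logical order by plain reversal, without also swapping mirrored characters. The Python sketch below illustrates that hypothesis only. It is not taken from LibreOffice's PDF import code, the helper names and the small MIRROR table are invented for the illustration, and it assumes a purely RTL line whose PDF glyphs map to code points by shape.

```python
# Hypothetical illustration; not LibreOffice code.
# Small subset of the Unicode mirrored bracket pairs.
MIRROR = {"(": ")", ")": "(", "[": "]", "]": "[", "{": "}", "}": "{"}

# The text from the report, in logical (typing) order.
ONE, TWO, THREE = "אחת", "שתיים", "שלוש"
logical = ONE + " (" + TWO + ") " + THREE


def logical_to_visual(text):
    """Left-to-right glyph order of a purely RTL line: reversed, with
    brackets replaced by their mirror images (as a renderer would draw)."""
    return "".join(MIRROR.get(ch, ch) for ch in reversed(text))


def visual_to_logical_naive(glyphs):
    """Plain reversal; mirrored characters are left as they are."""
    return glyphs[::-1]


def visual_to_logical_mirrored(glyphs):
    """Reversal that also swaps mirrored characters back."""
    return "".join(MIRROR.get(ch, ch) for ch in reversed(glyphs))


visual = logical_to_visual(logical)
flipped = visual_to_logical_naive(visual)      # parentheses end up swapped
restored = visual_to_logical_mirrored(visual)  # matches the typed text
```

Under these assumptions, `flipped` has the closing parenthesis before the middle word and the opening one after it, as in the imported text box, while `restored` equals the text as typed.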
Comment 1 Eyal Rozenberg 2023-03-04 19:25:06 UTC
(In reply to Eyal Rozenberg from comment #0)
> Note: (In Writer, only an empty document is imported for some reason.)

So, that happens only in a 7.6 nightly. With a 7.5 release, Writer behaves the same as Draw: it flips the direction of the parentheses.

Version: 7.5.0.3 (X86_64) / LibreOffice Community
Build ID: c21113d003cd3efa8c53188764377a8272d9d6de
CPU threads: 4; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US
Comment 2 Rainer Bielefeld Retired 2023-03-05 10:45:22 UTC
Already REPRODUCIBLE with Server Installation of Version:  4.0.0.3 WIN10
Build-ID  7545bee9c2a0782548772a21bc84a9dcc583b89;  Special devUserProfile

Additional Info:
a) Reporter's sample.pdf opens OK with FF Nightly, File Viewer, FoxIt, 
   FreePDF, EDGE
b) Probably independent of WRITER PDF export. I created a PDF from a 
   WRITER document via ZAMZAR, same "parentheses direction swapped" problem 
   when opened in DRAW
c) Also Curly Brackets, Square Brackets affected
d) I did not see the problem in a PDF export from
   https://he.wikipedia.org/wiki/%D7%AA%D7%91%D7%A0%D7%99%D7%AA:%D7%97%D7%93%D7%A9%D7%95%D7%AA_%D7%95%D7%90%D7%A7%D7%98%D7%95%D7%90%D7%9C%D7%99%D7%94
   opened in DRAW
e) This query <https://bugs.documentfoundation.org/buglist.cgi?cmdtype=dorem&remaction=run&namedcmd=DUPs153968&sharer_id=19321> might show related Bugs.
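
Regarding (c): round, square, and curly brackets all carry the Unicode Bidi_Mirrored property, which may be why they are affected together. A minimal check using Python's standard unicodedata module follows; the variable names are illustrative, and this says nothing about how the Draw importer is actually implemented.

```python
import unicodedata

# unicodedata.mirrored() returns 1 for characters with the Bidi_Mirrored
# property (drawn mirrored in right-to-left text) and 0 otherwise.
brackets = "()[]{}"
bracket_flags = {ch: unicodedata.mirrored(ch) for ch in brackets}

# A Hebrew letter (alef) for comparison; letters are not mirrored.
alef_flag = unicodedata.mirrored("\u05d0")

for ch, flag in bracket_flags.items():
    print(f"U+{ord(ch):04X} mirrored={flag}")
print(f"U+05D0 mirrored={alef_flag}")
```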

@reporter:
Do you also see the problem with RTL PDF from other sources?
Comment 3 Eyal Rozenberg 2023-03-05 18:16:05 UTC
(In reply to Rainer Bielefeld Retired from comment #2)

> @reporter:
> Do you also see the problem with RTL PDF from other sources?

Yes, with PDFs from Microsoft Word, for example. If you're asking about a specific source, we can try that source.
Comment 4 Rainer Bielefeld Retired 2023-03-06 12:33:00 UTC
I would like to understand why the Wikipedia PDF Parentheses do not show the problem.
Comment 5 Eyal Rozenberg 2023-03-06 13:08:14 UTC
(In reply to Rainer Bielefeld Retired from comment #4)
> I would like to understand why the Wikipedia PDF Parentheses do not show the
> problem.

I just printed to PDF from Firefox, at that URL. When opening in Draw, I noticed that some/all of the Hebrew parentheses are each in their own separate one-character text box. That may have something to do with why they weren't flipped.
Comment 6 Rainer Bielefeld Retired 2023-03-06 14:06:00 UTC
(In reply to Eyal Rozenberg from comment #5)
> That may have something to do

Good shot, sounds plausible. And I observe the same separate text boxes in the PDF directly exported from Wikipedia ("Document created by Skia/PDF m110", whatever that might mean).

f) so we might have 2 Problems here
f1) Writer PDF export does something different to other PDF creators, so that
    DRAW PDF import does not recognize how to handle the parentheses in the 
    RTL text                                                              😥

                              but also:

f2) DRAW PDF import wrongly swaps parentheses in PDF export from Writer,
   although the parentheses look correct in the original PDF              😥

It might be useful or necessary to create a second bug for (f1); I did not find any possibly related WRITER bug.
Comment 7 Eyal Rozenberg 2023-03-06 14:25:20 UTC
(In reply to Rainer Bielefeld Retired from comment #6)
> f) so we might have 2 Problems here

Not necessarily...

> f1) Writer PDF export does something different to other PDF creators, so that
>     DRAW PDF import does not recognize how to handle the parentheses in the 
>     RTL text                                                              😥

I get the same behavior with an MS-Word-generated PDF. Also, it's not a good thing to break up the Hebrew text into per-character runs. So Writer is probably doing the right thing here.

>                               but also:
> 
> f2) DRAW PDF import wrongly swaps parentheses in PDF export from Writer,
>    although the parentheses look correct in the original PDF              😥

Only this is the problem, I would say.