Created attachment 121931 [details]
It's a Persian translation of an Islamic doa in PDF format.
Hi everyone.
Persian PDF files are not displayed correctly in LibreOffice and become completely unreadable.
Steps to reproduce:
1/ Open LibreOffice Writer.
2/ Press Ctrl+O or select Open in the File menu.
3/ For the type of file that you want to open, select PDF - Portable Document Format (Writer) (*.pdf).
4/ Choose UTF-8 for encoding.
5/ Don't change anything after that and press OK to open the file.
Current result: LibreOffice does not display Persian PDF documents correctly; after opening, the files are unreadable.
For comparison, open the same files in Adobe Reader and Sumatra PDF to see the difference.
Expected behaviour: LibreOffice displays Persian PDF documents correctly, as it does Word and HTML documents.
LibreOffice Writer is _not_ a PDF reader. In very simple documents containing text only it can open them in a readable way.
If TDF wants LibreOffice to accurately open PDF files in Writer then this can be considered a bug.
The LibreOffice developers included PDF among LibreOffice's supported formats.
You can open a file and select the PDF (Writer) document type.
LibreOffice supports English PDF documents and displays them correctly.
The only issue is with Persian and maybe Arabic, which are right-to-left languages.
(In reply to zahra from comment #2)
> the libreoffice developers included pdf documents in libreoffice supported
> you can open file and select type of file pdf writer documents.
Yes, I am aware of the option. But does TDF expect it to be a perfect import?
> libreoffice can support english pdf documents and shows them correctly.
Yes, because English is a simple language without any special characters.
> the only issue is persian and maybe arabic languages that are right to left
That is NOT true. There are also issues with Portuguese diacritics, and the same problems probably occur in many non-Arabic languages. You can NOT assume that it is exclusive to RTL languages.
** Please read this message in its entirety before responding **
To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.
There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.
If you have time, please do the following:
Test to see if the bug is still present on a currently supported version of LibreOffice
(5.2.5 or 5.3.0): https://www.libreoffice.org/download/
If the bug is present, please leave a comment that includes the version of LibreOffice and
your operating system, and any changes you see in the bug behavior
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave
a short comment that includes your version of LibreOffice and Operating System
Please DO NOT
Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not
appropriate in this case)
If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3)
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to "inherited from OOo";
4b. If the bug was not present in 3.3 - add "regression" to keyword
Feel free to come ask questions or to say hello in our QA chat: http://webchat.freenode.net/?channels=libreoffice-qa
Thank you for helping us make LibreOffice even better for everyone!
So the issue is that LO should be able to detect RTL characters and order them correctly, but presently it is putting them in reverse, and it should set RTL in the textboxes they are added in as well.
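As a rough illustration of the detection step, a first-strong-character heuristic (the same idea as rules P2 and P3 of the Unicode Bidirectional Algorithm) could be used to choose the direction of a text box. This is only a sketch in Python, not LibreOffice code; the function name `base_direction` is made up for the example.

```python
import unicodedata


def base_direction(text):
    """Guess the paragraph direction from the first strong character.

    Hypothetical helper for illustration only. Bidi classes "R" (e.g. Hebrew)
    and "AL" (Arabic and Persian letters) are strongly right-to-left; "L" is
    strongly left-to-right. Digits, spaces and punctuation are skipped.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    # No strong character found: fall back to left-to-right.
    return "ltr"


if __name__ == "__main__":
    print(base_direction("\u0633\u0644\u0627\u0645"))  # Persian/Arabic letters
    print(base_direction("hello"))
```

A text box whose content yields "rtl" would then be a candidate for having its paragraph direction set to RTL on import.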
(In reply to zahra from comment #0)
> 4/ choose utf-8 for encoding.
Where did you find this option?
Created attachment 136974 [details]
sample odt with arabic, hebrew and persian
Created attachment 136975 [details]
1. open pdf with 'pdf (writer)' filter in file open dialog
2. notice that text is in reverse order for each of the 3 languages
3. enter into any of the 3 textboxes and they are set to LTR
Build ID: 3672cdd35985201ea87463cf032fedd02c052f4d
CPU threads: 2; OS: Linux 4.4; UI render: default; VCL: gtk2;
Locale: en-US (en_US.UTF-8); Calc: group
Same issue also happens when importing the pdf in Draw.
I don't see this as a bug.
RTL text layout depends on control characters and meta attributes to establish the fragments hierarchy and individual direction for each span.
PDF images contain only the pictures of the letters, and there is no simple, automatic way to put them into the correct reading order.
Maybe there should be a new RTL-specific editing operation implemented, "Visual <-> Logical conversion", which would allow fixing this manually.
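To make the idea of a visual-to-logical conversion concrete, here is a deliberately naive sketch in Python. It assumes the characters of one line arrive in left-to-right visual order and that the line is already known to be RTL. It reverses the line and then restores the order of runs of digits and Latin letters. The function name `visual_to_logical` is hypothetical, and the sketch ignores mirrored brackets, multi-word Latin phrases and nested directions, so it is far from a full implementation of the Unicode Bidirectional Algorithm.

```python
import unicodedata

# Bidi classes treated as left-to-right runs inside an RTL line:
# L = Latin and other LTR letters, EN = European digits, AN = Arabic-Indic digits.
LTR_CLASSES = {"L", "EN", "AN"}


def visual_to_logical(visual_line):
    """Naively convert one RTL line from visual order to logical order.

    Assumption: the whole line has RTL base direction. Runs of LTR
    characters (numbers, single Latin words) keep their internal order.
    """
    result = []
    ltr_run = []
    for ch in reversed(visual_line):
        if unicodedata.bidirectional(ch) in LTR_CLASSES:
            # Collected in reversed order; flipped back when the run ends.
            ltr_run.append(ch)
        else:
            result.extend(reversed(ltr_run))
            ltr_run = []
            result.append(ch)
    result.extend(reversed(ltr_run))
    return "".join(result)


if __name__ == "__main__":
    # Visual order as drawn on the page: "123", a space, then a Hebrew word
    # with its letters stored last-to-first.
    visual = "123 \u05dd\u05d5\u05dc\u05e9"
    logical = visual_to_logical(visual)
    print([hex(ord(c)) for c in logical])
```

In the usage example the Hebrew letters come out first and in reading order, followed by the space and the digits 1, 2, 3 in their original order.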
This can be partially worked around but can’t be 100% fixed, since PDF (usually) does not store the actual text but just the end result of text layout, with much of the information critical to reproducing the original text completely lost.
Poppler has some support for this, the discussion and patches in https://bugs.freedesktop.org/show_bug.cgi?id=55977 might help someone trying to do the same in LibreOffice.
My 2¢, recreating text from PDF files is a lost cause, PDF is first and foremost a print file format, so it should be viewed as some glorified printed paper.
*** Bug 114189 has been marked as a duplicate of this bug. ***
I filed the recent dupe, so - something I noted there: This bug manifests in particular with PDFs created by LO itself (write something in Writer, export it to PDF, open it in Draw).
Replying to Comment 10:
> My 2¢, recreating text from PDF files is a lost cause, PDF is first and foremost a print file format, so it should be viewed as some glorified printed paper.
I disagree, with the exact opposite opinion:
* First, we have to distinguish between proper document recreation from PDF, which is more of a challenge, and recreation of text runs in frames which is what Draw does.
* There's no good reason that what Draw does for LTR text should not succeed for RTL text - especially when most PDF readers succeed in this already, and even let you copy-and-paste the raw RTL text correctly (in most cases).
* People very often get PDF documents and need to alter them despite not having the original. Example: A form you need to fill for some official agency like the government or a bank etc. That's an important use case that needs to be catered to.
* People very often have use for the raw text in a PDF document when penning a reply - to quote some of the text back. So this is another important use case.
* Even if you have a piece of paper, you should be able to OCR it into a PDF and then get the text back... :-)
If you want editable documents don’t use PDF; it is a final format for consumption by human readers. Treat PDF as a glorified printed paper and you will be happy.
Draw is not a PDF editor; if you need one, there are better options. If it were up to me, I’d drop the PDF importer altogether and not give people false hopes. Inserted PDFs should be treated as vector images, not documents to import text from.
Feel free to fix this bug though, and good luck with it (sincerely, I have worked on extracting RTL text from PDF documents elsewhere before and I know how messed up things are).
(In reply to Khaled Hosny from comment #13)
So, the basis for your argument is the position that Draw should not offer importing PDFs. I respectfully disagree, but, regardless - it does import PDFs. Also, I understand that LO Writer has a PDF import filter too...
Now, the thing is, I understand it may be difficult - but it can't be that difficult if most of the PDF readers get it right. Right?
(In reply to Eyal Rozenberg from comment #14)
> Now, the thing is, I understand it may be difficult - but it can't be that
> difficult if most of the PDF readers get it right. Right?
If they did, but they don’t.
*** This bug has been marked as a duplicate of bug 89471 ***