Bug 118405 - Export as PDF shows gaps in Greek words when using TexGyre fonts
Summary: Export as PDF shows gaps in Greek words when using TexGyre fonts
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): 6.0.3.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:pdf
Depends on:
Blocks: PDF-Export
 
Reported: 2018-06-27 06:45 UTC by JesseSteele
Modified: 2022-10-02 09:46 UTC
3 users

See Also:
Crash report or crash signature:


Attachments
PDF from LO 7.4 (69.81 KB, application/pdf)
2022-09-13 11:33 UTC, ⁨خالد حسني⁩

Description JesseSteele 2018-06-27 06:45:19 UTC
Description:
Some fonts don't render text in other languages correctly in PDF.

I created a repo with a fully-reproduced problem, example files, and in-depth description here:
https://github.com/JesseSteele/pdf-bug

Steps to Reproduce:
1. Open "Problem example.odt" in Writer
2. Click the icon "Export as PDF"
3. Look at the exported file: Greek words on pages 3-4 contain spurious spaces

Actual Results:
Greek words have unexpected spaces when using the TexGyre Pagella font, but not with the standard Roman font. The unexpected spaces even push letters OUTSIDE the margins, although the words should at least wrap. Seems like double trouble to me.

Expected Results:
Greek words should not have unexpected spaces.


Reproducible: Always


User Profile Reset: No



Additional Info:
Exporting this to PDF from Calligra Words (using 'custom size' paper) does not show this problem. Perhaps Calligra is on to something and is "doing it correctly". (This is how I worked around the problem and published via Amazon print on demand.)

But, as the repo explains, Calligra messes up pages when exporting from .doc. LibreOffice's Export as PDF produces identical results from both .odt and .doc; thanks for the consistency, guys, really!

I also had the same problem using lowriter in the terminal.
Comment 1 Julien Nabet 2018-06-27 09:30:50 UTC
Which environment are you using?

Could you give the latest stable LO version, 6.0.5, a try?

If it still doesn't work, it would be interesting, just as a test, to know whether the bug is still reproducible on a daily build from the master branch (see https://dev-builds.libreoffice.org/daily/master/Win-x86_64@42/current/).
Comment 2 JesseSteele 2018-06-27 16:26:23 UTC
Yo, Julien,

I'm cool, so I'm using Ubuntu 18.04.

I added your repo and tried 6.0.5. Same problem.

I hacked your W!nd@w$ link and the .deb package didn't install.

But just clone or download the repo and try it yourself with either the .doc or the .odt file, and see whether you get spaces after the Greek letters that look like "a" or "w"...

WRONG:
πολλῶ ν

ὑδά των
OR
ὑ δά των

RIGHT:
πολλῶν

ὑδάτων

git clone http://github.com/jessesteele/pdf-bug
Comment 3 JesseSteele 2018-06-27 16:27:56 UTC
You may need TexGyre installed to use real fonts for real publishing...

sudo apt install tex-gyre
Comment 4 Julien Nabet 2018-06-27 16:44:58 UTC
On a Debian x86-64 PC with master sources updated today, I git cloned your repo.

I noticed that the LO PDF showed pairs of words in columns; the Calligra PDF doesn't.
Then I opened the .odt and saw the same columns.
I exported the file to PDF: same columns too.

I didn't see:
πολλῶ ν
ὑδά των
OR
ὑ δά των

I suppose I missed something but don't know what.
Comment 5 JesseSteele 2018-06-27 17:27:48 UTC
The "columns" are a product of "justify text". In Pagella, each line ends with a different word.

Look at PDF page 9 (labeled page 1).

You will see the little "v" on its own. It's not supposed to be: it has been pushed out into the margin, where it shouldn't be able to go.

The "u da twv" (ὑ δά των) should all be one word.

Looking down at PDF pages 10-11 (labeled 2-3) the Greek letters are grouped as all one word as they should be.

You can see this in the file I put in the GitHub repo.
Comment 6 Julien Nabet 2018-06-27 17:58:16 UTC
In "Problem example DOC - via LibreOffice.pdf", on the 9th page of the PDF, I see:
hudatoen/ ὑ δάάτων

whereas in "Problem example DOC - via Calligra.pdf", I see:
hudatoen/ὑδάτων

OK, so Calligra seems fine, but LO doesn't.

Strangely, when I opened "Problem example.doc" with the LO 6.0.5.2 Debian (testing) package and exported it, I got:
hudatoen/ ὑ δάτων (from copy paste)
but on screen I see:
hudatoen/ὑδάτων
as if copy-paste added some spaces.

I noticed in Evince that when highlighting "ὑδάτων", "ά" was replaced by a square, but only in the LO PDF export.

In brief, I can't reproduce exactly what you describe, but I do see a problem too.

Miklos: I thought you might be interested in this one since it concerns PDF export. Unless it's more about font rendering, in which case perhaps Khaled could help here?
Comment 7 JesseSteele 2018-06-27 18:13:16 UTC
Yeah! Julien, you are seeing it, and the other problems.

I can explain. I studied Greek in college and I get the font...

Those accent marks above and below the letters are rendered by fonts similarly to how "ff" might be a single glyph. I don't know low-level font encoding, but Unicode text (e.g. UTF-8) might store ὑ or ά as two separate characters, a base letter plus an accent, like combined letters when creating fonts.

Dealing with that "correctly" is probably where the problem begins.
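The guess above about accented letters being "two separate characters" can be checked with Python's standard `unicodedata` module. This is an illustrative sketch, not something from the report: it spells ὑδάτων with precomposed (NFC) code points and then applies canonical decomposition (NFD), which turns ὑ into υ plus U+0314 and ά into α plus U+0301. Whether the test document stores the composed or the decomposed form, and whether that has anything to do with the gaps, is not established in this thread.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return "U+XXXX NAME" for every code point in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# "ὑδάτων" spelled with precomposed (NFC) code points, one per visible letter.
word_nfc = "\u1f51\u03b4\u03ac\u03c4\u03c9\u03bd"

# Canonical decomposition (NFD) splits each accented letter into a
# base letter followed by a combining mark.
word_nfd = unicodedata.normalize("NFD", word_nfc)

print(len(word_nfc), len(word_nfd))  # 6 8
for line in describe(word_nfd):
    print(line)
```

Both forms display as the same word, so a font (or a text-layout engine) has to handle either a single precomposed glyph or a base glyph with a positioned combining mark.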

Summary:

TexGyre might not build their fonts "correctly", but the default Roman font does.

Calligra got TexGyre to render correctly... BUT, Amazon's print on demand says the Calligra .pdf file is broken...

(Calligra rant: in Calligra, .doc > .pdf messed up the page numbers; .odt > .pdf in Calligra was great, but Amazon said it was broken or something. So not even Calligra is perfect in this.)

That's what I know.

Cheers and kudos all!
Comment 8 JesseSteele 2018-07-05 18:05:03 UTC
For what it's worth, I've had problems with ghostscript processing .pdf files to get CMYK working correctly. Scribus won't import files it could import before. Command-line gs produces a blank .pdf. These approaches don't work:

http://wiki.inkscape.org/wiki/index.php/ExportPDFCMYK
http://zeroset.mnim.org/2014/07/14/save-a-pdf-to-cmyk-with-inkscape/

...Just in case you're using ghostscript and it's giving everyone problems. :-)
Comment 9 Buovjaga 2018-07-15 16:25:59 UTC
Setting to NEW, as Julien confirmed the problem.
Comment 10 QA Administrators 2019-12-10 04:00:09 UTC Comment hidden (obsolete)
Comment 11 QA Administrators 2021-12-10 04:22:39 UTC Comment hidden (obsolete)
Comment 12 ⁨خالد حسني⁩ 2022-09-13 11:33:35 UTC
Created attachment 182405
PDF from LO 7.4

I can’t reproduce this. The accented Greek letters use a fallback font (which means the TeX Gyre fonts do not support them), but the spacing looks fine in the exported PDF.
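Khaled's diagnosis is that the accented letters come from a fallback font because the primary font has no glyphs for them. A minimal way to see which characters of a word a font lacks is to compare the text's code points against the font's cmap (code point to glyph name) table. The sketch below uses a hypothetical toy cmap covering only unaccented lowercase Greek; it is NOT the real coverage of TeX Gyre Pagella, and the function name `uncovered` is illustrative.

```python
def uncovered(text: str, cmap: dict[int, str]) -> list[str]:
    """Return the distinct non-space characters of text missing from cmap.

    cmap maps Unicode code points to glyph names, like a font's cmap table.
    Any character returned here would have to come from a fallback font.
    """
    missing: list[str] = []
    for ch in text:
        if ch.isspace():
            continue
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing


# Hypothetical toy cmap: unaccented lowercase Greek only (U+03B1..U+03C9).
# This is NOT the real coverage of TeX Gyre Pagella.
toy_cmap = {cp: f"uni{cp:04X}" for cp in range(0x03B1, 0x03CA)}

word = "\u1f51\u03b4\u03ac\u03c4\u03c9\u03bd"  # ὑδάτων, precomposed
print([f"U+{ord(ch):04X}" for ch in uncovered(word, toy_cmap)])
# ['U+1F51', 'U+03AC']
```

To run the same check against an installed font, a real cmap can be read with fontTools, whose `TTFont(path).getBestCmap()` returns a code point to glyph name dictionary. This only tells whether a font has a glyph for a code point; it says nothing about how LibreOffice chooses the fallback font or positions the mixed glyphs.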