Bug 83108 - Not enough randomness when converting to PDF
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 4.2.4.2 release (earliest affected)
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: needsDevAdvice
Depends on:
Blocks: PDF-Export
 
Reported: 2014-08-26 17:48 UTC by hyper_ch
Modified: 2018-07-31 09:42 UTC (History)

Description hyper_ch 2014-08-26 17:48:37 UTC
Problem description: 

LO seems to have a lack of randomness for glyphs when exporting to PDF.

I created a few shell scripts that make PDF management simpler. One of those scripts stamps documents with numbers:

I can select a bunch of PDFs in Dolphin. Then I get prompted to enter a starting number. The shell script then goes through each PDF and splits it into single pages. It then takes an .odt file and replaces the placeholders in it for document number and page number. The page number of course increases on each page, while the document number increases from PDF to PDF.

That .odt is then converted to a PDF page and put over each single page, creating a new PDF. In the end, all pages of one document are combined again.

This is repeated for all documents and, in the end, they are also merged again.

However, in the resulting PDF I noticed that the numbering for the documents is sometimes wrong. With each font that I tried, it always happens at the same page. E.g. when I use Arial, document 14 starts showing up as document 10 in the combined PDF, while as a single PDF it's all fine.

I asked in the #ghostscript channel for help for what the problem could be. Chrisl explained it to me like this:


So, this is a regular problem that comes up with the PDFs exported from Libre/OpenOffice. When you use a font to display, for example, a string like "Page 33", it would be wasteful to embed the entire Arial font in the PDF just for that, so applications "subset" fonts, so they only include the descriptions for those seven glyphs. Space is a glyph, defined for each font. The exact way that the subsets are stored varies, so I'll go with a trivial example.....

Take the first glyph "P" - the letter P is ascii code 80. So it would also be wasteful to include 79 glyph "slots" just to get to index 80 for P. Applications will therefore use a custom encoding so that (massively simplifying things!) the glyph "P" will actually be in glyph slot 1 (slot zero is special). The "a" will be slot 2 etc.....

So, in the next document, the first thing printed in Arial is actually "This page is intentionally left blank" - the application would put the glyph "T" in index 0. So, when you send the two PDFs to Ghostscript's pdfwrite device, it gets the Arial font, with the "P" in index 0 for the first PDF, then it gets Arial for the second document - but it already has an instance of Arial defined, so uses that. It gets a reference to index 0, which is already occupied, and thus does not need to be populated.

[...]

Bear with me, I'm getting to why it's a LibreOffice problem :-)

Now, Adobe document a mechanism to prevent that kind of clash, which is  that such font subsets should have a unique, random six letter prefix added to the name. The problem is that LibreOffice always use the same seed to create these prefixes for each document. So, for example, the prefixes are always of the pattern AAAAAA+Arial, AAAAAB+Arial..... so they are unique within the *current* document but not sufficiently unique to give protection from this kind of clash.
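
The clash described above can be sketched with a toy model. This is only an illustration under stated assumptions, not how Ghostscript's pdfwrite or LibreOffice are actually implemented: `make_subset` and `merge_and_render` are hypothetical names, slots are handed out in order of first use (slot 0 reserved), and the "merger" simply keeps the first font it sees under a given name.

```python
def make_subset(text):
    """Give each distinct character a slot in order of first use (slot 0 is reserved)."""
    slots = {}
    for ch in text:
        if ch not in slots:
            slots[ch] = len(slots) + 1
    glyphs = {slot: ch for ch, slot in slots.items()}  # the embedded subset font
    codes = [slots[ch] for ch in text]                 # what the page content refers to
    return glyphs, codes


def merge_and_render(docs):
    """Toy merger: docs is a list of (font_name, text); the first font seen per name wins."""
    font_cache = {}
    rendered = []
    for font_name, text in docs:
        glyphs, codes = make_subset(text)
        # Assumption for this sketch: a font name that is already known is reused as-is.
        cached = font_cache.setdefault(font_name, glyphs)
        rendered.append("".join(cached.get(code, "?") for code in codes))
    return rendered


# Same subset name in both documents: the second one is drawn with the first one's glyphs.
clash = merge_and_render([("AAAAAA+Arial", "Doc 10"), ("AAAAAA+Arial", "Doc 14")])

# Different (made-up) prefixes: each document keeps its own subset.
ok = merge_and_render([("AAAAAA+Arial", "Doc 10"), ("QWJXKZ+Arial", "Doc 14")])

print(clash)
print(ok)
```

In this model the second document's "4" shares a slot number with the first document's "0", which is the same kind of mix-up as document 14 showing up as document 10.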



So, from what I gather LO has too little randomness when creating PDFs and that can then lead to collisions.

Because of this, Chrisl suggested that I first convert to .gs and then to PDF. Ever since implementing this change ( https://github.com/sjau/pdfForts/commit/5dc4c86d741abe9fff21ff37c7539c0d749eadac#diff-02b2731ca50711cd69dc2155179b6710 ) it now works as it should.

So I assume chrisl is right and that's an LO PDF export problem.

Operating System: Ubuntu
Version: 4.2.4.2 release
Comment 1 tommy27 2014-10-04 13:01:53 UTC
try 4.2.6.2 or 4.3.2.2 and tell if issue persists
Comment 2 hyper_ch 2014-10-04 13:19:30 UTC
(In reply to tommy27 from comment #1)
> try 4.2.6.2 or 4.3.2.2 and tell if issue persists

Still same behaviour on

Version: 4.2.6.3
Build ID: 420m0(Build:3)

when removing that gs patch. Randomness is still lacking.
Comment 3 Robinson Tryon (qubit) 2014-10-26 21:28:57 UTC
(In reply to hyper_ch from comment #0)
> I did create a few shell scripts that make PDF management simpler. In one of
> those scripts, stamp documents with numbers:
> ... 
> However in the resulting PDF I noticed, that the numbering for the documents
> is sometimes wrong. With each font that I tried always at the same page.
> E.g. when I use Arial, then document 14 starts showing up as document 10 in
> the combined PDF, while as single PDF it's all fine.

Hiya,
Are all of the shell scripts you use up in your github repo?

Please list the steps to reproduce the problem here in a comment; that will make it easier for us to confirm the bug.

(Please change the status back to UNCONFIRMED when you're done)

Thanks,
--R
Comment 4 hyper_ch 2014-10-26 21:32:50 UTC
In the github repo there's a lot of different scripts to manipulate PDFs. Just use the stampPDF one.

In Dolphin select a bunch of PDFs, then right-click to get context menu, select there Actions -> Stamp documents and created PDF

That's all there is there.
Comment 5 AaronPeterson 2014-10-26 23:45:42 UTC
I'm new to the project and am spending a bit of time looking into this. Intriguing. It appears that we should use the glyph numbers that are actually used from the font to generate the hash for the internal font identifier...

I know that solidworks has major issues when opening documents with the same name...

If there was a continual counter that never reset, there could be collisions with documents made in different installations...


The glyph usage would give us beneficial collisions.
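
A minimal sketch of that idea, assuming the identifier is a six-letter uppercase tag derived from a hash of the font name and the set of glyph numbers in use. The function name `tag_from_glyphs` and the choice of SHA-256 are assumptions for illustration only; nothing here is taken from LibreOffice's code.

```python
import hashlib
import string


def tag_from_glyphs(font_name, glyph_ids):
    """Derive a six-letter tag from the font name and the set of glyph ids used."""
    # Sort and deduplicate so the tag depends on which glyphs are used, not on their order.
    payload = font_name + ":" + ",".join(str(g) for g in sorted(set(glyph_ids)))
    value = int.from_bytes(hashlib.sha256(payload.encode("utf-8")).digest()[:8], "big")
    letters = []
    for _ in range(6):
        value, rem = divmod(value, 26)
        letters.append(string.ascii_uppercase[rem])
    return "".join(letters)


tag_a = tag_from_glyphs("Arial", [51, 68, 74, 72, 3, 22])
tag_b = tag_from_glyphs("Arial", [22, 3, 72, 74, 68, 51, 51])  # same set, other order
print(tag_a + "+Arial", tag_b + "+Arial")
```

Two subsets containing the same glyphs get the same tag (the "beneficial" collision), while subsets with different glyphs would only share a tag by chance.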

It sounds like you (the submitter) are more capable than I am at looking up desired behavior, which might include copying behavior from Ghostscript?
Comment 6 AaronPeterson 2014-10-29 05:07:58 UTC
Have you tried setting some of the options for the PDF? There is an option for tagged.  (I still haven't found the source code related to the PDF creation)

I have every reason to believe that this bug is real, but that it is a very low priority, since you have found a workaround and it only shows up when interacting with other programs, which other people are not likely to do. Although, as a document wrangler myself, I think fixing it in LibreOffice makes the most sense.
Comment 7 hyper_ch 2014-10-29 05:35:08 UTC
Have you actually tried it out so that it could be set to confirmed?
Comment 8 João Paulo 2015-03-27 12:40:25 UTC
(In reply to hyper_ch from comment #7)
> Have you actually tried it out so that it could be set to confirmed?

Hi, I can confirm LibreOffice 4.4.1.2 has a randomness problem when generating font names: AAAAAA+Arial, AAAAAB+Arial etc.

Can't confirm the other steps because I am not as advanced a user as the bug reporter.
Comment 9 Robinson Tryon (qubit) 2015-04-17 21:02:45 UTC
(In reply to João Paulo from comment #8)
> Hi, I can confirm LibreOffice 4.4.1.2 has a randomness problem when
> generating font names: AAAAAA+Arial, AAAAAB+Arial etc.
> 

Okay, that "randomness" part has been confirmed, but...

(In reply to hyper_ch from comment #0)
> LO seems to have a lack of randomness for glyphs when exporting to PDF.
> ...
> ...So, when you send the two PDFs to Ghostscript's pdfwrite device,
> it gets the Arial font, with the "P" in index 0 for the first PDF, then it
> gets Arial for the second document - but it already has an instance of Arial
> defined, so uses that. It gets a reference to index 0, which is already
> occupied, and thus does not need to be populated.

It sounds like LibreOffice is generating legal PDF output, but Ghostscript can't handle two separate PDFs that reuse indexes/identifiers.

> Bear with me, I'm getting to why it's a LibreOffice problem :-)
> 
> Now, Adobe document a mechanism to prevent that kind of clash, which is 
> that such font subsets should have a unique, random six letter prefix 

I'm confused about "unique" AND "random". "unique" suggests reproducibility, and "random" suggests the opposite.

> added
> to the name. The problem is that LibreOffice always use the same seed to
> create these prefixes for each document. So, for example, the prefixes are
> always of the pattern AAAAAA+Arial, AAAAAB+Arial..... so they are unique
> within the *current* document but not sufficiently unique to give protection
> from this kind of clash.

A random 6-letter prefix only gives ~309 million possibilities. That's a pretty small space, and doesn't seem like it gives much "protection" for the millions of PDF documents that I assume are produced every day. What does Adobe/the specs say about resolving collisions?
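
The ~309 million figure is 26^6 = 308,915,776. As a rough sketch of how much protection that gives, the birthday approximation below estimates the chance that at least two of n tags coincide, assuming the tags are drawn uniformly at random and that only the fonts ending up in the same merge job matter (both are assumptions, not something the thread or a spec confirms). `collision_probability` is a hypothetical helper name.

```python
import math

PREFIX_SPACE = 26 ** 6  # six uppercase letters


def collision_probability(n, space=PREFIX_SPACE):
    """Birthday approximation: chance that at least two of n uniform random tags match."""
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))


print(PREFIX_SPACE)
for n in (100, 1000, 20000):
    print(n, collision_probability(n))
```

Under these assumptions a merge of a few hundred subset fonts would rarely hit a clash, while tens of thousands of fonts in one job would make a clash likely.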

Status -> NEEDINFO
Comment 10 QA Administrators 2015-12-27 20:31:23 UTC Comment hidden (obsolete)
Comment 11 hyper_ch 2015-12-28 00:17:27 UTC
Not really sure what additional information is needed.
Comment 12 tommy27 2015-12-28 07:05:41 UTC
(In reply to hyper_ch from comment #11)
> Not really sure what additional information is needed.

I think you should answer this one.

(In reply to Robinson Tryon (qubit) from comment #9)
> 
> > ..... The problem is that LibreOffice always use the same seed to
> > create these prefixes for each document. So, for example, the prefixes are
> > always of the pattern AAAAAA+Arial, AAAAAB+Arial..... so they are unique
> > within the *current* document but not sufficiently unique to give protection
> > from this kind of clash.
> 
> A random 6-letter prefix only gives ~309 million possibilities. That's a
> pretty small space, and doesn't seem like it gives much "protection" for the
> millions of PDF documents that I assume are produced every day. What does
> Adobe/the specs say about resolving collisions?
> 
> Status -> NEEDINFO
Comment 13 hyper_ch 2015-12-28 23:24:04 UTC
Well, as you can see, that was given by the ghostscript guys on irc. I have no real idea what that means.
Comment 14 QA Administrators 2016-05-09 20:08:03 UTC Comment hidden (obsolete)
Comment 15 hyper_ch 2016-05-09 20:19:05 UTC
It's interesting why it is still set to unconfirmed.

I even provided the tools to recreate that bug. Also two others have also confirmed it. Not sure what other info is needed.
Comment 16 Robinson Tryon (qubit) 2016-10-21 19:23:44 UTC
(In reply to hyper_ch from comment #15)
> Not sure what other info is needed.

(In reply to hyper_ch from comment #0)
> ...
> Now, Adobe document a mechanism to prevent that kind of clash,

Reference? URL?

As I said before:

>... What does
> Adobe/the specs say about resolving collisions?

Let's talk about randomness:

(In reply to hyper_ch from comment #0)
> ... The problem is that LibreOffice always use the same seed to
> create these prefixes for each document
> ...
> So, from what I gather LO has too little randomness when creating PDFs and
> that can then lead to collisions.

You mention a seed, but are you sure that LO is using a (pseudo)random number generator? Maybe it's just iterating through AAAAAA, AAAAAB, etc? In any case, we'll want to make sure that the resulting PDFs conform to the spec. Please provide link(s).
Thanks!
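
The two possibilities raised in this comment can be contrasted in a short sketch. Neither function is LibreOffice's actual code; `sequential_tag` and `random_tag` are hypothetical names, and the sequential one merely reproduces the AAAAAA, AAAAAB, ... pattern reported in this thread.

```python
import secrets
import string


def sequential_tag(index):
    """Plain counter written in base 26: 0 -> AAAAAA, 1 -> AAAAAB, 26 -> AAAABA."""
    letters = []
    for _ in range(6):
        index, rem = divmod(index, 26)
        letters.append(string.ascii_uppercase[rem])
    return "".join(reversed(letters))


def random_tag():
    """Six uppercase letters drawn independently from the OS random source."""
    return "".join(secrets.choice(string.ascii_uppercase) for _ in range(6))


# A counter that restarts for every export yields the same tags in every document.
print([sequential_tag(i) + "+Arial" for i in range(3)])

# A random tag differs between exports with high probability, but not with certainty.
print(random_tag() + "+Arial")
```

A counter keeps tags distinct within one document but repeats them across documents, whereas random tags make a cross-document repeat unlikely without ruling it out.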
Comment 17 Buovjaga 2016-11-15 08:07:57 UTC
No replies in almost a month, so back to needinfo it goes.
Comment 18 hyper_ch 2016-11-15 08:25:36 UTC
(In reply to Robinson Tryon (qubit) from comment #16)

As you can see from my initial bug report, I have no idea how LO does things. Most of what I wrote there was information given to me by Chrisl in the #ghostscript channel.

So I can't answer your question.

However, I have a question for you: Did you even try the script and reproduce the bug? I have given you all the information needed to reproduce that occurrence.
Comment 19 Buovjaga 2016-11-15 08:30:03 UTC
(In reply to hyper_ch from comment #18)
> So I can't answer your question.

You can't give a link to the Adobe thing?
Comment 20 hyper_ch 2017-04-26 14:29:42 UTC
I don't know what that Adobe thing is. If you read again, it wasn't me who referenced it but Chrisl from the #ghostscript channel in irc.
Comment 21 Buovjaga 2017-04-27 11:13:41 UTC
I tried installing and using stampPDF, but I failed. I have KDE Plasma 5.9.5 (Arch Linux).
I installed with sudo ./install.sh symlink stampPDF

stampPDF does not appear in the Dolphin right-click context menu.
Comment 22 hyper_ch 2017-04-27 11:25:56 UTC
KDE5? Tool is not updated for KDE5 context menu paths yet...
Comment 23 Buovjaga 2017-04-28 04:47:32 UTC
(In reply to hyper_ch from comment #22)
> KDE5? Tool is not updated for KDE5 context menu paths yet...

Ok, let me know when you update it or provide a command line repro path.
Comment 24 tommy27 2017-05-06 15:39:42 UTC
(In reply to Buovjaga from comment #23)
> (In reply to hyper_ch from comment #22)
> > KDE5? Tool is not updated for KDE5 context menu paths yet...
> 
> Ok, let me know when you update it or provide a command line repro path.

back to NEEDINFO
Comment 25 QA Administrators 2017-12-04 12:46:20 UTC Comment hidden (obsolete)
Comment 26 QA Administrators 2018-01-02 10:19:04 UTC Comment hidden (obsolete)
Comment 27 hyper_ch 2018-01-02 10:24:55 UTC
Wtf?

All the info is there.

Code provided... just try it out.
Comment 28 Buovjaga 2018-01-02 10:46:33 UTC
(In reply to Buovjaga from comment #23)
> (In reply to hyper_ch from comment #22)
> > KDE5? Tool is not updated for KDE5 context menu paths yet...
> 
> Ok, let me know when you update it or provide a command line repro path.

WTF, the info is not there...
Comment 29 hyper_ch 2018-01-02 10:47:53 UTC
What info?

See when this was filed?

All that's needed now is to go to commit, install it and put .desktop files into kde5 location for the action menu
Comment 30 Buovjaga 2018-01-02 13:01:11 UTC
(In reply to hyper_ch from comment #29)
> What info?
> 
> See when this was filed?
> 
> All that's needed now is to go to commit, install it and put .desktop files
> into kde5 location for the action menu

You should have announced you added KDE5 support in August last year.
You still did not give a command line method to test this.

I tried it now with the context menu, I see something runs and immediately closes. Kate does not open. I have all the deps specified in reqCmds="unzip zip sed pdftk gs libreoffice unoconv kate".

So please tell me how do I need to run it on the command line.
Comment 31 QA Administrators 2018-07-03 14:15:24 UTC Comment hidden (obsolete)
Comment 32 QA Administrators 2018-07-31 09:42:29 UTC
Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-20180731