When exporting ODF to PDF (in Writer or another LibreOffice component), the author can/should check several options in the export dialog to produce tagged and accessible PDF:
* the options "Tagged PDF" and "Export bookmarks" on the General tab,
* the option "Bookmarks and page" on the Initial View tab (because many users who can benefit from bookmarks don't know they exist).
It is a good thing (!) that these options remain checked after the first time, but it would be even better if they were already checked by default (i.e. right after installation): many documents are sufficiently accessible to benefit from tagged PDF export, but most authors don't know that tagged PDF exists or what it means.
Enabling tagged PDF by default won't turn inaccessible ODF files into accessible PDF, but it would make many PDF files much easier to navigate with a screen reader. In untagged PDF, screen reader shortcuts for navigating documents (h for heading, t for table, p for paragraph, etc.) won't work.
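For scripted exports, the same options can be requested without the dialog. A minimal sketch, assuming a recent LibreOffice release whose --convert-to accepts JSON filter options, and assuming the export properties are named UseTaggedPDF and ExportBookmarks (check both against the help for your version). build_convert_command is a hypothetical helper: it only builds the argument list and does not run soffice.

```python
import json


def build_convert_command(input_path, tagged=True, bookmarks=True):
    """Build an soffice argument list for a PDF export (hypothetical helper).

    Assumption: the installed LibreOffice accepts JSON filter options after
    the filter name, with properties UseTaggedPDF and ExportBookmarks.
    """
    options = {
        "UseTaggedPDF": {"type": "boolean", "value": str(tagged).lower()},
        "ExportBookmarks": {"type": "boolean", "value": str(bookmarks).lower()},
    }
    # Compact separators keep the filter specification free of spaces.
    filter_spec = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return ["soffice", "--headless", "--convert-to", filter_spec, input_path]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_convert_command("report.odt")))
```

Passing the list to subprocess.run would perform the export; the sketch stops short of that so it can be inspected first.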
http://blogs.adobe.com/acrolaw/2006/01/understanding_t_1/ explains what tagged PDF is and why it matters.
It's an accessibility issue, but I couldn't find how to flag it as such.
Microsoft Office 2010's built-in PDF export produces tagged PDF. I have not found a way to turn this off.
[This is an automated message.]
This bug was filed before the changes to Bugzilla on 2011-10-16. Thus it
started right out as NEW without ever being explicitly confirmed. The bug is
changed to state NEEDINFO for this reason. To move this bug from NEEDINFO back
to NEW please check if the bug still persists with the 3.5.0 beta1 or beta2 prereleases.
Details on how to test the 3.5.0 beta1 can be found at:
more detail on this bulk operation: http://nabble.documentfoundation.org/RFC-Operation-Spamzilla-tp3607474p3607474.html
I confirm that this functionality request is still relevant to LibreOffice 3.5.0 RC1 (i.e. bookmarks are exported by default, but tags are not) and change the status from UNCONFIRMED to NEW.
** Please read this message in its entirety before responding **
To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year.
There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.
If you have time, please do the following:
Test to see if the bug is still present on a currently supported version of LibreOffice (184.108.40.206 or later): https://www.libreoffice.org/download/
If the bug is present, please leave a comment that includes the version of LibreOffice and your operating system, and any changes you see in the bug behavior
If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a short comment that includes your version of LibreOffice and Operating System
Please DO NOT
Update the version field
Reply via email (please reply directly on the bug tracker)
Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)
Thank you for your help!
-- The LibreOffice QA Team
This NEW Message was generated on: 2015-02-19
Tagged PDF option still not checked by default under v220.127.116.11 when exporting to PDF.
Removing comma from Whiteboard (please use a space to delimit values in this field)
Migrating Whiteboard tags to Keywords: (a11y -> accessibility)
So "Export bookmarks" is already enabled, so 1 of 3 is already done. Enabling "Bookmarks and page" seems fine, as we have already enabled "Export bookmarks" and PDFs with no bookmarks aren't affected by this setting. "Tagged PDF" is the main issue to discuss, as the help states "Selects to write PDF tags. This can increase file size by huge amounts.", so I'm not too sure about enabling this by default.
@Heiko, @Stuart, @Cor, @Adolfo: What are your takes?
(In reply to Yousuf Philips (jay) from comment #9)
> So "export bookmarks" is already enabled, so 1 of 3 is already done.
> Enabling "bookmarks and page" seems fine as we have already enabled "export
> bookmarks" and pdfs with no bookmarks arent affected by this setting.
> "tagged pdf" is the main issue to discuss as the help states "Selects to
> write PDF tags. This can increase file size by huge amounts.", so i'm not to
> sure about enabling this by default.
> @Heiko, @Stuart, @Cor, @Adolfo: What are your takes?
Do some tests on file size.
But I agree with Christophe about the benefit. I think I would give that more weight.
This is simply a toggle control in the export dialog, so it should be simple to implement. I think producing tagged PDF by default is the correct setting.
More important but much more work would be to refactor the tagged PDF to produce PDF/UA compliant PDF (bug 45636).
So I tested a few docs I had from my compatibility testing days, and here is the change in size.
747.4 KiB -> 3.9 MiB (+433%)
979.5 KiB -> 1.5 MiB (+52%)
515.0 KiB -> 680.3 KiB (+32%)
160.6 KiB -> 198.6 KiB (+24%)
184.8 KiB -> 385.7 KiB (+109%)
405.7 KiB -> 484.6 KiB (+19%) (a Google doc I threw into the mix)
So in two of the six cases the size more than doubles, and in the other four it grows by less than 60%, so it all comes down to how much structure the file has and whether users will complain about their PDFs growing way too much.
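The percentages can be re-derived from the listed sizes as a quick check. A minimal sketch, assuming 1 MiB = 1024 KiB; because the two MiB figures are rounded to one decimal in the comment, the first two results can differ by a few points from the percentages quoted above. percent_increase is a name introduced here for illustration.

```python
# (before, after) sizes in KiB, taken from the list above.
# The two MiB values are converted with 1 MiB = 1024 KiB.
SIZES_KIB = [
    (747.4, 3.9 * 1024),
    (979.5, 1.5 * 1024),
    (515.0, 680.3),
    (160.6, 198.6),
    (184.8, 385.7),
    (405.7, 484.6),
]


def percent_increase(before, after):
    """Return the size increase as a percentage of the original size."""
    return (after - before) / before * 100


increases = [percent_increase(before, after) for before, after in SIZES_KIB]

# An increase above 100% means the file more than doubled.
more_than_doubled = sum(1 for inc in increases if inc > 100)

for (before, after), inc in zip(SIZES_KIB, increases):
    print(f"{before:8.1f} KiB -> {after:8.1f} KiB  ({inc:+.0f}%)")
print(f"{more_than_doubled} of {len(increases)} files more than doubled")
```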
 attachment 103815 [details]
How about checking "Archive PDF" by default? This option includes the tagged feature and produces more standardized results. Otherwise I think average documents are small enough to cope with a doubling in size. We are not talking about 100 MB files.
"Create PDF Form" is enabled by default, and enabling "Archive PDF" disables that option. Also, when I tested exporting a file (number 6 from comment 12), it gave some warnings about loss of features, and the file size jumped 50% compared to the tagged PDF version, so I wouldn't think it would be suitable as a default for the masses.
(In reply to Heiko Tietze from comment #13)
> How about checking "Archive PDF" by default? This option includes the tagged
> feature and produces more standardized results. Otherwise I think average
> documents are small enough to deal with double its size. We do not talk
> about 100MB files.
I'm also in favor of enabling archive PDF by default because it's less problematic to increase size of PDF than creating inaccessible PDFs.
Also, Microsoft Office exports to accessible PDF by default.
@Heiko: Do you know who knows how to change a default setting in LibreOffice? I assume it's trivial for someone familiar with this.
(In reply to Alex ARNAUD from comment #15)
> @Heiko: Do you know who is aware on how to change a default settings in
> LibreOffice? I assume it's trivial for someone aware of this.
The checkbox is in filter/uiconfig/ui/pdfgeneralpage.ui. It is set by const bool bIsPDFA = (pParent->mnPDFTypeSelection>=1) && (pParent->mnPDFTypeSelection <= 3); in filter/source/pdf/impdialog.cxx, which is defined in filter/source/pdf/pdfexport.hxx. The value is read in impdialog.cxx as mnPDFTypeSelection = maConfigItem.ReadInt32( "SelectPdfVersion", 0 ); and this configuration key is defined in https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/Office/Common.xcs?r=a927e096#5389 with a default of 0. Just set it to one of the other values.
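The effect of that default can be restated outside the code base. A minimal Python sketch (not LibreOffice code) of the same range check; is_pdfa and DEFAULT_SELECT_PDF_VERSION are hypothetical names, and the only facts taken from the comment above are the 1..3 test and the schema default of 0.

```python
# Default of "SelectPdfVersion" as stated in the comment above.
DEFAULT_SELECT_PDF_VERSION = 0


def is_pdfa(pdf_type_selection):
    """Mirror the quoted C++ expression: true only for selections 1 to 3."""
    return 1 <= pdf_type_selection <= 3


if __name__ == "__main__":
    # With the current default the archive checkbox starts unticked.
    print(DEFAULT_SELECT_PDF_VERSION, is_pdfa(DEFAULT_SELECT_PDF_VERSION))
    # Any default in the range 1..3 would make it start ticked.
    for selection in (1, 2, 3, 4):
        print(selection, is_pdfa(selection))
```

This only illustrates why changing the schema default flips the checkbox; what each of the values 1 to 3 selects is not stated in this thread.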
Samuel, Thorsten: What is a sane/safe default for this?
Hi, I am working on this. Can you please point me to where I should look for the code related to this issue?
(In reply to RISHAV from comment #17)
> Hi, I am working on this. Can you please point me where should I look for
> the code related to this issue?
The code pointer was in the last comment. Please always read some comments before asking.
@RISHAV, are you still working on this one?
(In reply to stragu from comment #19)
> @RISHAV, are you still working on this one?
As it's been 6 months, it's safe to assume the answer is "no".
I am interested in fixing this one.
Now, since this bug was reported, we have PDF/UA available as an option.
I understand that it would now be best to use PDF/UA as the default, as opposed to only tagged PDF (given that, as far as I know, PDF/UA is a more recent standard, and its features are a superset of tagged PDF's).
For this setting, the relevant line is this one:
I am currently testing file size changes on a sample of 213 files from the test files from core, all the files in:
Please let me know if you think there are more appropriate files to test PDF export on.
Created attachment 173642 [details]
comparison stats between current PDF export settings vs PDF/UA
Using the 213 sample files, I get the following stats on the change in PDF size (PDF/UA size as a percentage of the size with the current default settings):
1365.94% maximum change
104.88% minimum change
186.28% median change
258.87% mean change
The most important value here is the median change: most of the example files will result in a PDF/UA file that is less than twice the size of the current default.
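To make the comparison with "twice the size" explicit, the reported figures can be converted to size ratios. A minimal sketch, assuming the percentages express the PDF/UA size relative to the current default's size (so 186.28% means 1.86 times as large); to_ratio and to_increase are names introduced here for illustration.

```python
# Figures reported above, read as "new size as a percentage of old size".
# That reading is an assumption; the thread labels them simply as "change".
REPORTED = {
    "maximum": 1365.94,
    "minimum": 104.88,
    "median": 186.28,
    "mean": 258.87,
}


def to_ratio(percent_of_original):
    """Convert e.g. 186.28 (% of original size) to a ratio such as 1.8628."""
    return percent_of_original / 100


def to_increase(percent_of_original):
    """Convert e.g. 186.28 (% of original size) to an increase of 86.28%."""
    return percent_of_original - 100


for name, value in REPORTED.items():
    ratio = to_ratio(value)
    verdict = "below" if ratio < 2 else "at or above"
    print(f"{name:8s} ratio {ratio:6.2f}  ({verdict} a doubling)")
```

The mean sits above a doubling while the median stays below it, which is what a few very large increases would produce and why the median is the more representative figure here.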
The largest increases seem to be mostly related to tables and special fields.
Even if the increase is significant, I still think this is a wonderful improvement to make PDF accessible by default. It is also important for LO to be a credible tool in businesses and public institutions, especially since laws about accessibility are increasingly common. If users are concerned about PDF size, they still have the option to change settings to lower it.
Given that this bug report was originally about tagged PDF, I am wondering whether anyone has an opinion on which option is best:
- Default PDF/UA, but when unticked: "Tagged PDF" is unticked.
- Default PDF/UA, but when unticked: "Tagged PDF" is still ticked.
I would go with the second option.
(In reply to stragu from comment #22)
See the open enhancement of bug 117428 to implement a PDF /ActualText structure for each word as iterated by ICU word bounds.
That enhanced PDF content tagging would significantly alter Tagged PDF and PDF/UA size--but potentially greatly improve fidelity of assistive technology rendering of all document content.
So if that by-word ActualText was to be implemented, it would automatically be integrated in both UA and Tagged PDF, and would increase the size further? Did I understand it right?
One more question: if the commit makes PDF export tests fail (because the files generated by the tests are obviously different with the new default settings), should the tests be modified in the same commit before submitting to Gerrit?
(In reply to stragu from comment #24)
> Thanks, Stuart.
> So if that by-word ActualText was to be implemented, it would automatically
> be integrated in both UA and Tagged PDF, and would increase the size
> further? Did I understand it right?
Yes that is my understanding.
> One more question: if the commit makes PDF export tests fail (because the
> files generated by the tests are obviously different with the new default
> settings), should the tests be modified in the same commit before submitting
> to Gerrit?
Probably also needed.
Created attachment 173711 [details]
Result of bug 117428 OP STR as pasted to Notepad++ UTF-8
Created attachment 174124 [details]
comparison stats between current PDF export settings vs tagged PDF and PDF/UA
Updated stats on 213 sample files, using both tagged and PDF/UA options. Median rate of size change is 1.17 for tagged PDF, and 1.86 for PDF/UA.
With PDF/UA as a default, most PDFs (according to this sample) wouldn't reach a doubling in size.
Created attachment 174125 [details]
histogram of change rates in PDF sizes, tagged vs PDF/UA
Visualisation of how sizes change for 213 sample files, with median value highlighted.
PDF/UA results are more variable compared to tagged PDF, but median stays below a doubling in size.
Created attachment 174126 [details]
R script to process and visualise file sizes
Just in case it is useful / for transparency, the R script that processed the data and created the histogram.