Bug 39667 - Enable accessible/tagged PDF export options by default
Summary: Enable accessible/tagged PDF export options by default
Status: ASSIGNED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: 3.3.3 release (earliest affected)
Hardware: Other All
Importance: medium enhancement
Assignee: stragu
URL:
Whiteboard:
Keywords: accessibility, difficultyBeginner, easyHack, filter:pdf, skillDesign, topicUI
Depends on:
Blocks: a11y PDF-Export PDF-Accessibility
Reported: 2011-07-29 08:37 UTC by Christophe Strobbe
Modified: 2021-08-07 11:27 UTC (History)

See Also:
Crash report or crash signature:


Attachments
comparison stats between current PDF export settings vs PDF/UA (40.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-07-17 00:17 UTC, stragu
Result of bug 117428 OP STR as pasted to Notepad++ UTF-8 (31.96 KB, image/png)
2021-07-20 15:09 UTC, V Stuart Foote
comparison stats between current PDF export settings vs tagged PDF and PDF/UA (45.01 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-08-07 11:22 UTC, stragu
histogram of change rates in PDF sizes, tagged vs PDF/UA (24.73 KB, image/svg+xml)
2021-08-07 11:26 UTC, stragu
R script to process and visualise file sizes (2.35 KB, text/x-r-source)
2021-08-07 11:27 UTC, stragu

Description Christophe Strobbe 2011-07-29 08:37:30 UTC
When exporting ODF to PDF (in Writer or another LibreOffice component), the author can/should check several options in the export dialog to produce tagged and accessible PDF:
* the options "Tagged PDF" and "Export bookmarks" on the General tab,
* the option "Bookmarks and page" on the Initial View tab (because many users who can benefit from bookmarks don't know they exist).
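For anyone scripting exports rather than using the dialog, the three options above can be sketched as a set of filter options. This is only an illustration: the keys "UseTaggedPDF", "ExportBookmarks" and "InitialView" are my understanding of the PDF export filter's property names and should be checked against the filter documentation for your version, and the helper name and the "fresh install" state are hypothetical.

```python
# Assumed property names for the PDF export filter; verify before relying on them.
ACCESSIBLE_EXPORT_OPTIONS = {
    "UseTaggedPDF": True,     # "Tagged PDF" on the General tab
    "ExportBookmarks": True,  # "Export bookmarks" on the General tab
    "InitialView": 1,         # assumed: 1 opens the bookmarks pane next to the page
}


def missing_accessibility_options(current):
    """List the options in `current` that differ from the accessible set."""
    return sorted(
        key
        for key, wanted in ACCESSIBLE_EXPORT_OPTIONS.items()
        if current.get(key) != wanted
    )


# Hypothetical state: bookmarks are exported, nothing else is set.
print(missing_accessibility_options({"ExportBookmarks": True}))
```

The helper only compares dictionaries; it does not call LibreOffice, so it runs without an office installation.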

It is a good thing (!) that these options remain checked after the first time, but it would be even better if they were already checked by default (i.e. right after installation): many documents are sufficiently accessible to benefit from tagged PDF export, but most authors don't know that tagged PDF exists or what it means.
Enabling tagged PDF by default won't turn inaccessible ODF files into accessible PDF, but it would make many PDF files much easier to navigate with a screen reader. In untagged PDF, screen reader shortcuts for navigating documents (h for heading, t for table, p for paragraph, etc) won't work.

Background:
http://blogs.adobe.com/acrolaw/2006/01/understanding_t_1/ explains what tagged PDF is and why it matters.
Comment 1 Tom 2011-07-30 06:28:25 UTC
Hi :)
It's an Accessibility issue but I couldn't find how to flag that up.
Regards from
Tom :)
Comment 2 Christophe Strobbe 2011-08-18 08:35:48 UTC
Microsoft Office 2010's built-in PDF export produces tagged PDF. I have not found a way to turn this off.
Comment 3 Björn Michaelsen 2011-12-23 12:21:31 UTC Comment hidden (obsolete)
Comment 4 Christophe Strobbe 2012-01-27 03:23:50 UTC
I confirm that this functionality request is still relevant to LibreOffice 3.5.0 RC1 (i.e. bookmarks are exported by default, but tags are not) and change the status from UNCONFIRMED to NEW.
Comment 5 QA Administrators 2015-02-19 15:43:51 UTC Comment hidden (obsolete)
Comment 6 Owen Genat (retired) 2015-03-22 12:42:02 UTC
Tagged PDF option still not checked by default under v4.4.1.2 when exporting to PDF.
Comment 7 Robinson Tryon (qubit) 2015-03-31 13:55:19 UTC Comment hidden (obsolete)
Comment 8 Robinson Tryon (qubit) 2015-12-10 03:37:39 UTC Comment hidden (obsolete)
Comment 9 Yousuf Philips (jay) (retired) 2016-09-13 10:08:26 UTC
So "export bookmarks" is already enabled, so 1 of 3 is already done. Enabling "bookmarks and page" seems fine as we have already enabled "export bookmarks" and PDFs with no bookmarks aren't affected by this setting. "Tagged PDF" is the main issue to discuss, as the help states "Selects to write PDF tags. This can increase file size by huge amounts.", so I'm not too sure about enabling this by default.

@Heiko, @Stuart, @Cor, @Adolfo: What are your takes?
Comment 10 Cor Nouws 2016-09-13 19:40:23 UTC
(In reply to Yousuf Philips (jay) from comment #9)
> So "export bookmarks" is already enabled, so 1 of 3 is already done.
> Enabling "bookmarks and page" seems fine as we have already enabled "export
> bookmarks" and PDFs with no bookmarks aren't affected by this setting.
> "Tagged PDF" is the main issue to discuss, as the help states "Selects to
> write PDF tags. This can increase file size by huge amounts.", so I'm not too
> sure about enabling this by default.
> 
> @Heiko, @Stuart, @Cor, @Adolfo: What are your takes?

Do some tests on file size.
But I agree with Christophe about the benefit. I think I would give that more weight.
Comment 11 V Stuart Foote 2016-09-13 20:18:08 UTC
This is simply a toggle control in the export dialog, so it should be simple to implement. I think it is correct to produce tagged PDF by default.

More important but much more work would be to refactor the tagged PDF to produce PDF/UA compliant PDF (bug 45636).
Comment 12 Yousuf Philips (jay) (retired) 2016-09-14 00:40:09 UTC
So I tested a few docs I had from my compatibility testing days and here is the change in size.

747.4 KiB -> 3.9 MiB (+433%) [1]
979.5 KiB -> 1.5 MiB (+52%) [2]
515.0 KiB -> 680.3 KiB (+32%) [3]
160.6 KiB -> 198.6 KiB (+24%) [4]
184.8 KiB -> 385.7 KiB [5] (+109%)
405.7 KiB -> 484.6 KiB (+19%) [6] (a google doc i threw into the mix)

So in two of the six cases the size more than doubles, and in the other four it grows by less than that, so it all comes down to how much structure the file has and whether users will complain about their PDFs growing way too much.

[1] http://www.microsoft.com/investor/reports/ar13/docs/2013_Annual_Report.docx
[2] attachment 103815
[3] http://download.microsoft.com/documents/rus/microsoft4you/How_to_license_the_operating_system_Windows_8_new.docx
[4] http://download.microsoft.com/documents/customerevidence/Files/710000003670/Xiamen_Tungsten_Group_unifies_enterprise.docx
[5] http://download.microsoft.com/documents/uk/partner/publicsector/DraftMicrosoftResponsetoGovernment.docx
[6] https://docs.google.com/document/d/1GCsZ3a-ACHNA6bF-pr1bGqrlgX287ZIbpV6QJ3N3keU/edit#
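The percentages for the six documents above can be re-derived with a few lines of Python. This is a quick check rather than part of the original test: it assumes 1 MiB = 1024 KiB, and because the two MiB figures are rounded to one decimal, the results for [1] and [2] can differ by a few points from the numbers quoted.

```python
# Sizes from the list above, converted to KiB (1 MiB = 1024 KiB).
SIZES_KIB = {
    1: (747.4, 3.9 * 1024),
    2: (979.5, 1.5 * 1024),
    3: (515.0, 680.3),
    4: (160.6, 198.6),
    5: (184.8, 385.7),
    6: (405.7, 484.6),
}


def percent_increase(before, after):
    """Return the relative size increase in percent."""
    return (after - before) / before * 100.0


increases = {doc: percent_increase(b, a) for doc, (b, a) in SIZES_KIB.items()}

# An increase above 100% means the file more than doubled in size.
more_than_doubled = [doc for doc, pct in increases.items() if pct > 100.0]

for doc, pct in increases.items():
    print(f"[{doc}] +{pct:.0f}%")
print("more than doubled:", more_than_doubled)
```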
Comment 13 Heiko Tietze 2016-09-14 07:10:19 UTC
How about checking "Archive PDF" by default? This option includes the tagged feature and produces more standardized results. Otherwise I think average documents are small enough to cope with doubling in size. We are not talking about 100 MB files.
Comment 14 Yousuf Philips (jay) (retired) 2016-09-14 18:51:47 UTC
"Create PDF Form" is enabled by default and enabling "Archive PDF" disables that option. Also, when I tested exporting a file (number 6 from comment 12), it gave some warnings of loss of features and the file size jumped 50% compared to the tagged PDF version, so I wouldn't think it would be suitable as a default for the masses.
Comment 15 Alex ARNAUD 2018-04-23 13:31:06 UTC
(In reply to Heiko Tietze from comment #13)
> How about checking "Archive PDF" by default? This option includes the tagged
> feature and produces more standardized results. Otherwise I think average
> documents are small enough to cope with doubling in size. We are not talking
> about 100 MB files.

I'm also in favor of enabling archive PDF by default, because increasing the size of PDFs is less problematic than creating inaccessible PDFs.

Also, Microsoft Office exports to accessible PDF by default.

@Heiko: Do you know who knows how to change a default setting in LibreOffice? I assume it's trivial for someone familiar with this.

Best regards,
Alex.
Comment 16 Heiko Tietze 2020-10-14 18:07:26 UTC
(In reply to Alex ARNAUD from comment #15)
> @Heiko: Do you know who knows how to change a default setting in
> LibreOffice? I assume it's trivial for someone familiar with this.

The checkbox is in filter/uiconfig/ui/pdfgeneralpage.ui. It is set by

    const bool bIsPDFA = (pParent->mnPDFTypeSelection>=1) && (pParent->mnPDFTypeSelection <= 3);

in filter/source/pdf/impdialog.cxx (defined in filter/source/pdf/pdfexport.hxx). The value is read in impdialog.cxx as

    mnPDFTypeSelection = maConfigItem.ReadInt32( "SelectPdfVersion", 0 );

and the default for this configuration key is defined as 0 in https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/Office/Common.xcs?r=a927e096#5389. Just set it to one of the other values.
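To make the logic easier to follow, here is a minimal Python sketch of the check described above. It is not LibreOffice code: the function names and the dictionary standing in for the configuration are hypothetical, and only the two facts stated above are modelled (the stored "SelectPdfVersion" value defaults to 0, and values 1 to 3 count as PDF/A).

```python
DEFAULT_SELECT_PDF_VERSION = 0  # schema default mentioned above


def read_int32(config, key, default):
    """Hypothetical stand-in for maConfigItem.ReadInt32(key, default)."""
    return int(config.get(key, default))


def is_pdfa(pdf_type_selection):
    """Mirror of the quoted C++ check: selections 1 to 3 count as PDF/A."""
    return 1 <= pdf_type_selection <= 3


# Nothing stored yet: the default of 0 applies, so the PDF/A box is unticked.
stored = {}
selection = read_int32(stored, "SelectPdfVersion", DEFAULT_SELECT_PDF_VERSION)
print(selection, is_pdfa(selection))

# Changing the stored value to 1 flips the check.
stored["SelectPdfVersion"] = 1
selection = read_int32(stored, "SelectPdfVersion", DEFAULT_SELECT_PDF_VERSION)
print(selection, is_pdfa(selection))
```

Which of the non-zero values is the right default is exactly the open question here; the sketch only shows why changing the schema default changes the initial state of the checkbox.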

Samuel, Thorsten: What is a sane/safe default for this?
Comment 17 RISHAV 2021-01-13 04:32:40 UTC
Hi, I am working on this. Can you please point me to where I should look for the code related to this issue?
Comment 18 Buovjaga 2021-01-13 06:52:13 UTC
(In reply to RISHAV from comment #17)
> Hi, I am working on this. Can you please point me to where I should look for
> the code related to this issue?

The code pointer was in the last comment. Please always read some comments before asking.
Comment 19 stragu 2021-07-05 05:52:45 UTC
@RISHAV, are you still working on this one?
Comment 20 Buovjaga 2021-07-05 07:23:35 UTC
(In reply to stragu from comment #19)
> @RISHAV, are you still working on this one?

As it's been 6 months, it's safe to assume the answer is "no"
Comment 21 stragu 2021-07-15 00:54:44 UTC
I am interested in fixing this one.

Now, since this bug was reported, we have PDF/UA available as an option.

I understand that it would now be best to use PDF/UA as the default, as opposed to only tagged PDF (given that, as far as I know, PDF/UA is a more recent standard, and its features are a superset of tagged PDF).

For this setting, the relevant line is this one:

https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/Office/Common.xcs?r=a927e096#5397

I am currently testing file size changes on a sample of 213 files from the test files from core, all the files in:
- sw/qa/extras/odfimport/data
- sw/qa/extras/ooxmlimport/data
- sw/qa/extras/ww8import/data
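For reference, a comparison like this can be sketched in Python once the same documents have been exported twice, into two directories. The function name and the directory layout are assumptions for illustration (one directory per export setting, same file names in both); the export step itself is not shown.

```python
from pathlib import Path


def size_ratios(baseline_dir, candidate_dir):
    """Map each PDF name found in both directories to candidate size / baseline size."""
    baseline = {p.name: p.stat().st_size for p in Path(baseline_dir).glob("*.pdf")}
    ratios = {}
    for pdf in sorted(Path(candidate_dir).glob("*.pdf")):
        old_size = baseline.get(pdf.name)
        if old_size:  # skip files missing from the baseline, or empty ones
            ratios[pdf.name] = pdf.stat().st_size / old_size
    return ratios
```

A ratio of 2.0 means the file doubled in size under the new setting; files that only exist in one of the two directories are ignored.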

Please let me know if you think there are more appropriate files to test PDF export on.
Comment 22 stragu 2021-07-17 00:17:16 UTC
Created attachment 173642
comparison stats between current PDF export settings vs PDF/UA

Using the 213 sample files, I get the following stats on PDF size, with the PDF/UA size expressed as a percentage of the size under the current default settings:

1365.94% maximum
104.88% minimum
186.28% median
258.87% mean

The most important value here is the median change: most of the example files will result in a PDF/UA file that is less than twice the size of the current default.
The largest increases seem to be mostly related to tables and special fields.
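The reason to prefer the median over the mean here can be shown with a small sketch. It assumes the percentages are new size divided by old size, times 100, which matches "less than twice the size" for a median of 186.28%. The sample ratios below are made up for illustration and are not the attached data.

```python
from statistics import mean, median


def summarise(ratios):
    """Return max, min, median and mean of size ratios, as percentages."""
    pct = [r * 100.0 for r in ratios]
    return {
        "max": max(pct),
        "min": min(pct),
        "median": median(pct),
        "mean": mean(pct),
    }


# Hypothetical ratios (new size / old size), NOT the attached measurements.
sample = [1.05, 1.20, 1.86, 2.40, 13.66]
for name, value in summarise(sample).items():
    print(f"{name}: {value:.2f}%")
```

A single very large ratio pulls the mean up sharply while the median barely moves, which is why the median describes the typical document better when a few files grow much more than the rest.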

Even if the increase is significant, I still think making PDFs accessible by default is a wonderful improvement. It is also important for LO to be a credible tool in businesses and public institutions, especially since laws about accessibility are increasingly common. If users are concerned about PDF size, they still have the option to change settings to lower it.

Given that this bug report was originally about tagged PDF, I am wondering whether anyone has an opinion on which option is best:
- PDF/UA ticked by default; when the user unticks it, "Tagged PDF" is unticked too.
- PDF/UA ticked by default; when the user unticks it, "Tagged PDF" stays ticked.

I would go with the second option.
Comment 23 V Stuart Foote 2021-07-17 09:07:55 UTC
(In reply to stragu from comment #22)

See the open enhancement request in bug 117428 to implement a PDF /ActualText structure for each word, as iterated by ICU word bounds.

That enhanced PDF content tagging would significantly alter Tagged PDF and PDF/UA size--but potentially greatly improve fidelity of assistive technology rendering of all document content.
Comment 24 stragu 2021-07-20 13:49:40 UTC
Thanks, Stuart.

So if that by-word ActualText was to be implemented, it would automatically be integrated in both UA and Tagged PDF, and would increase the size further? Did I understand it right?

One more question: if the commit makes PDF export tests fail (because the files generated by the tests are obviously different with the new default settings), should the tests be modified in the same commit before submitting to Gerrit?
Comment 25 V Stuart Foote 2021-07-20 15:07:01 UTC
(In reply to stragu from comment #24)
> Thanks, Stuart.
> 
> So if that by-word ActualText was to be implemented, it would automatically
> be integrated in both UA and Tagged PDF, and would increase the size
> further? Did I understand it right?
> 

Yes that is my understanding.

> One more question: if the commit makes PDF export tests fail (because the
> files generated by the tests are obviously different with the new default
> settings), should the tests be modified in the same commit before submitting
> to Gerrit?

Probably also needed.
Comment 26 V Stuart Foote 2021-07-20 15:09:17 UTC Comment hidden (obsolete)
Comment 27 stragu 2021-08-07 11:22:42 UTC
Created attachment 174124
comparison stats between current PDF export settings vs tagged PDF and PDF/UA

Updated stats on 213 sample files, using both tagged and PDF/UA options. Median rate of size change is 1.17 for tagged PDF, and 1.86 for PDF/UA.

With PDF/UA as a default, most PDFs (according to this sample) wouldn't reach a doubling in size.
Comment 28 stragu 2021-08-07 11:26:35 UTC
Created attachment 174125
histogram of change rates in PDF sizes, tagged vs PDF/UA

Visualisation of how sizes change for 213 sample files, with median value highlighted.
PDF/UA results are more variable compared to tagged PDF, but median stays below a doubling in size.
Comment 29 stragu 2021-08-07 11:27:54 UTC
Created attachment 174126
R script to process and visualise file sizes

Just in case it is useful / for transparency, the R script that processed the data and created the histogram.