Bug 39667 - Enable accessible/tagged PDF export options by default
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version: Inherited From OOo (earliest affected)
Hardware: All All
Importance: medium enhancement
Assignee: Samuel Mehrbrodt (allotropia)
URL:
Whiteboard: target:7.6.0 inReleaseNotes:7.6 targe...
Keywords: accessibility, difficultyBeginner, easyHack, filter:pdf, skillDesign, topicUI
Depends on:
Blocks: a11y, Accessibility PDF-Export PDF-Accessibility
Reported: 2011-07-29 08:37 UTC by Christophe Strobbe
Modified: 2023-09-02 15:44 UTC (History)

See Also:
Crash report or crash signature:


Attachments
comparison stats between current PDF export settings vs PDF/UA (40.23 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-07-17 00:17 UTC, Stéphane Guillou (stragu)
Result of bug 117428 OP STR as pasted to Notepad++ UTF-8 (31.96 KB, image/png)
2021-07-20 15:09 UTC, V Stuart Foote
comparison stats between current PDF export settings vs tagged PDF and PDF/UA (45.01 KB, application/vnd.oasis.opendocument.spreadsheet)
2021-08-07 11:22 UTC, Stéphane Guillou (stragu)
histogram of change rates in PDF sizes, tagged vs PDF/UA (24.73 KB, image/svg+xml)
2021-08-07 11:26 UTC, Stéphane Guillou (stragu)
R script to process and visualise file sizes (2.35 KB, text/x-r-source)
2021-08-07 11:27 UTC, Stéphane Guillou (stragu)

Description Christophe Strobbe 2011-07-29 08:37:30 UTC
When exporting ODF to PDF (in Writer or another LibreOffice component), the author can/should check several options in the export dialog to produce tagged and accessible PDF:
* the options "Tagged PDF" and "Export bookmarks" on the General tab,
* the option "Bookmarks and page" on the Initial View tab (because many users who can benefit from bookmarks don't know they exist).

It is a good thing (!) that these options remain checked after the first time, but it would be even better if they were already checked by default (i.e. right after installation): many documents are sufficiently accessible to benefit from tagged PDF export, but most authors don't know that tagged PDF exists or what it means.
Enabling tagged PDF by default won't turn inaccessible ODF files into accessible PDF, but it would make many PDF files much easier to navigate with a screen reader. In untagged PDF, screen reader shortcuts for navigating documents (h for heading, t for table, p for paragraph, etc) won't work.

Background:
http://blogs.adobe.com/acrolaw/2006/01/understanding_t_1/ explains what tagged PDF is and why it matters.
Comment 1 Tom 2011-07-30 06:28:25 UTC
Hi :)
It's an accessibility issue, but I couldn't find how to flag it as such.
Regards from
Tom :)
Comment 2 Christophe Strobbe 2011-08-18 08:35:48 UTC
Microsoft Office 2010's built-in PDF export produces tagged PDF. I have not found a way to turn this off.
Comment 4 Christophe Strobbe 2012-01-27 03:23:50 UTC
I confirm that this functionality request is still relevant to LibreOffice 3.5.0 RC1 (i.e. bookmarks are exported by default, but tags are not) and change the status from UNCONFIRMED to NEW.
Comment 6 Owen Genat (retired) 2015-03-22 12:42:02 UTC
Tagged PDF option still not checked by default under v4.4.1.2 when exporting to PDF.
Comment 9 Yousuf Philips (jay) (retired) 2016-09-13 10:08:26 UTC
So "export bookmarks" is already enabled, so 1 of 3 is already done. Enabling "bookmarks and page" seems fine as we have already enabled "export bookmarks" and PDFs with no bookmarks aren't affected by this setting. "Tagged PDF" is the main issue to discuss as the help states "Selects to write PDF tags. This can increase file size by huge amounts.", so I'm not too sure about enabling this by default.

@Heiko, @Stuart, @Cor, @Adolfo: What are your takes?
Comment 10 Cor Nouws 2016-09-13 19:40:23 UTC
(In reply to Yousuf Philips (jay) from comment #9)
> So "export bookmarks" is already enabled, so 1 of 3 is already done.
> Enabling "bookmarks and page" seems fine as we have already enabled "export
> bookmarks" and PDFs with no bookmarks aren't affected by this setting.
> "Tagged PDF" is the main issue to discuss as the help states "Selects to
> write PDF tags. This can increase file size by huge amounts.", so I'm not too
> sure about enabling this by default.
> 
> @Heiko, @Stuart, @Cor, @Adolfo: What are your takes?

Do some tests on file size..
But I agree with Christophe about the benefit. Think I would give that more weight..
Comment 11 V Stuart Foote 2016-09-13 20:18:08 UTC
This is simply a toggle control in the export dialog, so it should be simple to implement. I think producing tagged PDF by default is the correct setting.

More important but much more work would be to refactor the tagged PDF to produce PDF/UA compliant PDF (bug 45636).
Comment 12 Yousuf Philips (jay) (retired) 2016-09-14 00:40:09 UTC
So I tested a few docs I had from my compatibility testing days and here is the change in size.

747.4 KiB -> 3.9 MiB (+433%) [1]
979.5 KiB -> 1.5 MiB (+52%) [2]
515.0 KiB -> 680.3 KiB (+32%) [3]
160.6 KiB -> 198.6 KiB (+24%) [4]
184.8 KiB -> 385.7 KiB (+109%) [5]
405.7 KiB -> 484.6 KiB (+19%) [6] (a Google doc I threw into the mix)

So in two of the six cases the size more than doubles, and in the other four it grows by roughly 20-50%, so it all comes down to how much structure the file has and whether users will complain about their PDFs growing way too much.

[1] http://www.microsoft.com/investor/reports/ar13/docs/2013_Annual_Report.docx
[2] attachment 103815 [details]
[3] http://download.microsoft.com/documents/rus/microsoft4you/How_to_license_the_operating_system_Windows_8_new.docx
[4] http://download.microsoft.com/documents/customerevidence/Files/710000003670/Xiamen_Tungsten_Group_unifies_enterprise.docx
[5] http://download.microsoft.com/documents/uk/partner/publicsector/DraftMicrosoftResponsetoGovernment.docx
[6] https://docs.google.com/document/d/1GCsZ3a-ACHNA6bF-pr1bGqrlgX287ZIbpV6QJ3N3keU/edit#
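
The percentages above can be recomputed from the listed sizes. The following is a minimal Python sketch, assuming 1 MiB = 1024 KiB and taking the rounded sizes quoted in this comment at face value; the names `sizes_kib` and `percent_increase` are illustrative only and do not come from the thread.

```python
# (untagged, tagged) sizes in KiB as quoted in comment 12.
# MiB values are converted with the assumption 1 MiB = 1024 KiB.
sizes_kib = {
    1: (747.4, 3.9 * 1024),
    2: (979.5, 1.5 * 1024),
    3: (515.0, 680.3),
    4: (160.6, 198.6),
    5: (184.8, 385.7),
    6: (405.7, 484.6),
}


def percent_increase(before, after):
    """Relative growth in percent: +100 means the file doubled in size."""
    return (after - before) / before * 100


increases = {n: percent_increase(b, a) for n, (b, a) in sizes_kib.items()}

# Files whose tagged export is more than twice the untagged size.
more_than_doubled = sorted(n for n, pct in increases.items() if pct > 100)

for n, pct in increases.items():
    print(f"[{n}] {pct:+.0f}%")
print("more than doubled:", more_than_doubled)
```

With these rounded inputs only files [1] and [5] more than double. File [2] comes out a few points higher than the figure listed, presumably because "1.5 MiB" is itself rounded.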
Comment 13 Heiko Tietze 2016-09-14 07:10:19 UTC
How about checking "Archive PDF" by default? This option includes the tagged feature and produces more standardized results. Otherwise, I think average documents are small enough to cope with a doubling in size. We are not talking about 100 MB files.
Comment 14 Yousuf Philips (jay) (retired) 2016-09-14 18:51:47 UTC
"Create PDF Form" is enabled by default and enabling "Archive PDF" disables that option. Also, when I tested exporting a file (number 6 from comment 12), it gave some warnings about loss of features and the file size jumped 50% compared to the tagged PDF version, so I don't think it would be suitable as a default for the masses.
Comment 15 Alex ARNAUD 2018-04-23 13:31:06 UTC
(In reply to Heiko Tietze from comment #13)
> How about checking "Archive PDF" by default? This option includes the tagged
> feature and produces more standardized results. Otherwise, I think average
> documents are small enough to cope with a doubling in size. We are not
> talking about 100 MB files.

I'm also in favor of enabling archive PDF by default because increasing the size of PDFs is less problematic than creating inaccessible PDFs.

Also, Microsoft Office exports to accessible PDF by default.

@Heiko: Do you know who knows how to change a default setting in LibreOffice? I assume it's trivial for someone familiar with this.

Best regards,
Alex.
Comment 16 Heiko Tietze 2020-10-14 18:07:26 UTC
(In reply to Alex ARNAUD from comment #15)
> @Heiko: Do you know who knows how to change a default setting in
> LibreOffice? I assume it's trivial for someone familiar with this.

The checkbox is on filter/uiconfig/ui/pdfgeneralpage.ui. It is set by const bool bIsPDFA = (pParent->mnPDFTypeSelection>=1) && (pParent->mnPDFTypeSelection <= 3); in filter/source/pdf/impdialog.cxx, which is defined in filter/source/pdf/pdfexport.hxx. The value is read in impdialog.cxx as mnPDFTypeSelection =  maConfigItem.ReadInt32( "SelectPdfVersion", 0 ); and this configuration is defined in https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/Office/Common.xcs?r=a927e096#5389 as 0. Just set it to one of the other values.

Samuel, Thorsten: What is a sane/safe default for this?
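
To make the pointer above concrete: the dialog reads an integer `SelectPdfVersion` from the configuration (default 0) and treats values 1 to 3 as PDF/A. Below is a rough Python model of that check. It is not LibreOffice code; the `read_int32` helper and the dictionary standing in for the configuration are hypothetical, and the thread does not say which PDF/A flavour each value denotes.

```python
def read_int32(config, key, default):
    """Hypothetical stand-in for maConfigItem.ReadInt32(key, default)."""
    return int(config.get(key, default))


def is_pdfa(pdf_type_selection):
    """Mirror of the quoted check: selections 1 through 3 count as PDF/A."""
    return 1 <= pdf_type_selection <= 3


# An empty configuration falls back to the schema default of 0.
default_selection = read_int32({}, "SelectPdfVersion", 0)
print(default_selection, is_pdfa(default_selection))

# Storing a value in the 1..3 range is what would switch the PDF/A state on.
changed_selection = read_int32({"SelectPdfVersion": 1}, "SelectPdfVersion", 0)
print(changed_selection, is_pdfa(changed_selection))
```

In this model, changing the default amounts to changing the fallback value in the schema, which is what "just set it to one of the other values" suggests.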
Comment 17 RISHAV 2021-01-13 04:32:40 UTC
Hi, I am working on this. Can you please point me to where I should look for the code related to this issue?
Comment 18 Buovjaga 2021-01-13 06:52:13 UTC
(In reply to RISHAV from comment #17)
> Hi, I am working on this. Can you please point me to where I should look
> for the code related to this issue?

The code pointer was in the last comment. Please always read some comments before asking.
Comment 19 Stéphane Guillou (stragu) 2021-07-05 05:52:45 UTC
@RISHAV, are you still working on this one?
Comment 20 Buovjaga 2021-07-05 07:23:35 UTC
(In reply to stragu from comment #19)
> @RISHAV, are you still working on this one?

As it's been 6 months, it's safe to assume the answer is "no"
Comment 21 Stéphane Guillou (stragu) 2021-07-15 00:54:44 UTC
I am interested in fixing this one.

Now, since this bug was reported, we have PDF/UA available as an option.

I understand that it would be best to now use PDF/UA as a default, as opposed to only a tagged PDF (given that, as far as I know, PDF/UA is a more recent standard, and its features are a superset of tagged PDF).

For this setting, the relevant line is this one:

https://opengrok.libreoffice.org/xref/core/officecfg/registry/schema/org/openoffice/Office/Common.xcs?r=a927e096#5397

I am currently testing file size changes on a sample of 213 files from the test files from core, all the files in:
- sw/qa/extras/odfimport/data
- sw/qa/extras/ooxmlimport/data
- sw/qa/extras/ww8import/data

Please let me know if you think there are more appropriate files to test PDF export on.
Comment 22 Stéphane Guillou (stragu) 2021-07-17 00:17:16 UTC
Created attachment 173642 [details]
comparison stats between current PDF export settings vs PDF/UA

Using the 213 sample files, I get the following stats for the PDF/UA file size as a percentage of the size with the current default settings (100% = unchanged):

1365.94% maximum change
104.88% minimum change
186.28% median change
258.87% mean change

The most important value here is the median change: most of the example files will result in a PDF/UA file that is less than twice the size of the current default.
The largest increases seem to be mostly related to tables and special fields.
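
The four figures read most naturally as the PDF/UA size expressed as a percentage of the current default size, not as a percentage increase. This is an interpretation, based on the "less than twice the size" remark and on the median of 1.86 quoted in comment 27. A small Python sketch of the conversion under that assumption (function names are illustrative):

```python
# Figures from comment 22, read as "PDF/UA size as a percentage of the
# size produced with the current default settings" (assumed reading).
relative_size_pct = {
    "maximum": 1365.94,
    "minimum": 104.88,
    "median": 186.28,
    "mean": 258.87,
}


def as_ratio(pct):
    """Convert a relative size in percent to a plain multiplier."""
    return pct / 100


def as_increase_pct(pct):
    """Convert a relative size in percent to a percentage increase."""
    return pct - 100


for name, pct in relative_size_pct.items():
    print(f"{name}: x{as_ratio(pct):.2f} ({as_increase_pct(pct):+.2f}%)")

median_below_doubling = as_ratio(relative_size_pct["median"]) < 2
mean_above_doubling = as_ratio(relative_size_pct["mean"]) > 2
```

Under this reading the median stays below a doubling while the mean sits above it, which is what one would expect when a few files with very large increases pull the mean up.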

Even if the increase is significant, I still think this is a wonderful improvement to make PDF accessible by default. It is also important for LO to be a credible tool in businesses and public institutions, especially since laws about accessibility are increasingly common. If users are concerned about PDF size, they still have the option to change settings to lower it.

Given that this bug report was originally about tagged PDF, wondering if anyone has an opinion on which option is best:
- Default PDF/UA, but when unticked: "Tagged PDF" is unticked.
- Default PDF/UA, but when unticked: "Tagged PDF" is still ticked.

I would go with the second option.
Comment 23 V Stuart Foote 2021-07-17 09:07:55 UTC
(In reply to stragu from comment #22)

See the open enhancement of bug 117428 to implement a PDF /ActualText structure for each word as iterated by ICU word bounds. 

That enhanced PDF content tagging would significantly alter Tagged PDF and PDF/UA size--but potentially greatly improve fidelity of assistive technology rendering of all document content.
Comment 24 Stéphane Guillou (stragu) 2021-07-20 13:49:40 UTC
Thanks, Stuart.

So if that by-word ActualText was to be implemented, it would automatically be integrated in both UA and Tagged PDF, and would increase the size further? Did I understand it right?

One more question: if the commit makes PDF export tests fail (because the files generated by the tests are obviously different with the new default settings), should the tests be modified in the same commit before submitting to Gerrit?
Comment 25 V Stuart Foote 2021-07-20 15:07:01 UTC
(In reply to stragu from comment #24)
> Thanks, Stuart.
> 
> So if that by-word ActualText was to be implemented, it would automatically
> be integrated in both UA and Tagged PDF, and would increase the size
> further? Did I understand it right?
> 

Yes that is my understanding.

> One more question: if the commit makes PDF export tests fail (because the
> files generated by the tests are obviously different with the new default
> settings), should the tests be modified in the same commit before submitting
> to Gerrit?

Probably also needed.
Comment 27 Stéphane Guillou (stragu) 2021-08-07 11:22:42 UTC
Created attachment 174124 [details]
comparison stats between current PDF export settings vs tagged PDF and PDF/UA

Updated stats on 213 sample files, using both tagged and PDF/UA options. Median rate of size change is 1.17 for tagged PDF, and 1.86 for PDF/UA.

With PDF/UA as a default, most PDFs (according to this sample) wouldn't reach a doubling in size.
Comment 28 Stéphane Guillou (stragu) 2021-08-07 11:26:35 UTC
Created attachment 174125 [details]
histogram of change rates in PDF sizes, tagged vs PDF/UA

Visualisation of how sizes change for 213 sample files, with median value highlighted.
PDF/UA results are more variable compared to tagged PDF, but median stays below a doubling in size.
Comment 29 Stéphane Guillou (stragu) 2021-08-07 11:27:54 UTC
Created attachment 174126 [details]
R script to process and visualise file sizes

Just in case it is useful / for transparency, the R script that processed the data and created the histogram.
Comment 31 Naman 2022-11-16 14:09:43 UTC
Is someone working on it? Can I work on it if nobody else is?
Comment 32 Stéphane Guillou (stragu) 2022-11-16 16:20:35 UTC
Hi Naman

The bug has no assignee anymore, so yes, please feel free to work on it.

When you switch the default settings, you will also have to fix the tests that fail as a consequence. (see comment 25)

Thank you!
Comment 33 V Stuart Foote 2022-11-16 19:11:31 UTC
Assume the UI tweak, as in comment 21, will be to set enabled PDF/UA (ISO 14289-1) while adjusting the qa tests of outputs accordingly.
Comment 34 Naman 2022-11-24 08:54:42 UTC
What should I do? I still don't get it.
Do I have to enable UA (and hence Tagged) by default?
I have successfully built LibreOffice on my system.
I have navigated to the files mentioned in comment 16 and comment 21.
Comment 35 Christophe Strobbe 2022-11-24 10:16:28 UTC
(In reply to Stéphane Guillou (stragu) from comment #21)
> Now, since this bug was reported, we have PDF/UA available as an option.
> 
> I understand that it would be best to now use PDF/UA as a default, as
> opposed to only a tagged PDF (given that, as far as I know, PDF/UA is a more
> recent standard, and its features are a superset of tagged PDF).

As the original author of this change request, I would suggest using PDF/UA as the default. In some countries, organisations in charge of monitoring the accessibility of websites often simply check whether a PDF complies with PDF/UA.

With regard to file sizes: I wonder whether this matters a lot in an age where people stream many megabytes of video over their network connections.
It is a concern in countries where internet connections are much slower. There is probably much overlap between these countries and countries that have no legislation related to digital accessibility.
Comment 41 Samuel Mehrbrodt (allotropia) 2023-01-19 07:08:52 UTC
While I agree that "Tagged" PDF should be enabled by default, I'm not sure doing this for PDF/UA makes sense.

Creating PDF/UA compliant documents requires the user to create the documents in a very specific way. I don't think this makes sense for the average user.
Thing is, when the PDF/UA checkbox is enabled, the user sees the a11y checker when trying to export a PDF. Even an empty Writer document already has a problem reported: "Document title not set".
This will confuse average users.

So my suggestion is to enable tagged PDF by default, but not PDF/UA.
Comment 42 Commit Notification 2023-03-13 09:26:25 UTC
Samuel Mehrbrodt committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/544d6d781b3c8aa108ced362d708693b5127f3d7

tdf#39667 Enable tagged PDF by default

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 43 Michael Stahl (allotropia) 2023-03-14 15:12:15 UTC
see also commit 4a96f25ac3ef9f2ed940d6e56eca87bba387d451
Comment 44 Commit Notification 2023-03-16 09:10:25 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/529afed0ba3ca5e659cea661816e9164846630e8

tdf#39667 filter,officecfg: PDF export dialog: set initial view to...

It will be available in 7.6.0.
Comment 45 Commit Notification 2023-03-18 20:07:13 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-5":

https://git.libreoffice.org/core/commit/b6bb2446c18d78ee7494b3abf6ff7329b5756f0a

tdf#39667 filter,officecfg: PDF export dialog: set initial view to...

It will be available in 7.5.3.
Comment 46 Gabor Kelemen (allotropia) 2023-04-12 22:50:06 UTC
Verified in 

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: fc6806c4be8585ce0d35a6b581bf8b3dbf858500
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: hu-HU (hu_HU); UI: hu-HU
Calc: threaded

New default:
- Tagged PDF is enabled
- Bookmarks and page on the Initial View tab, instead of Only page

Turning on PDF/UA turns on:
- Export Outline, also disables the box
- Use Reference XObjects is turned off, box is disabled
- Under Panels, the option Bookmarks and page on the Initial View tab is selected, the radio buttons are disabled.
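
The interlock described above can be pictured with a toy Python model. This is only an illustration of the reported behaviour, not LibreOffice code: the option keys, the `enable_pdf_ua` function and the starting values for the outline and XObject options are all hypothetical.

```python
def enable_pdf_ua(options):
    """Return (new option values, controls that become disabled) after ticking PDF/UA."""
    # Values that the verification comment reports as forced by PDF/UA.
    forced = {
        "export_outline": True,
        "use_reference_xobjects": False,
        "initial_view": "Bookmarks and page",
    }
    updated = {**options, "pdf_ua": True, **forced}
    return updated, set(forced)


# New defaults as verified in the comment above.
new_defaults = {
    "tagged_pdf": True,
    "initial_view": "Bookmarks and page",
    "pdf_ua": False,
}

# Hypothetical user-changed values, chosen so the effect of PDF/UA is visible.
start = {**new_defaults, "export_outline": False, "use_reference_xobjects": True}

after, disabled = enable_pdf_ua(start)
print(after)
print(sorted(disabled))
```

The function returns a new dictionary instead of mutating its input, so the state before and after ticking PDF/UA can be compared directly.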