Bug 149140 - Table header cells have scope set to None instead of Column after exporting Writer table to PDF/UA
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Printing and PDF export
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:7.5.0 target:7.4.4
Keywords: accessibility
Duplicates: 149067
Depends on:
Blocks: PDF-Export PDF-Accessibility
Reported: 2022-05-17 21:36 UTC by Christophe Strobbe
Modified: 2023-01-18 18:19 UTC (History)

See Also:
Crash report or crash signature:


Attachments
ZIP file containing an ODT test file and three PDF files derived from it (60.06 KB, application/zip)
2022-05-17 21:36 UTC, Christophe Strobbe
Example file created from scratch per instructions (13.99 KB, application/vnd.oasis.opendocument.text)
2022-10-24 15:33 UTC, Gabor Kelemen (allotropia)
Example file exported to PDF/UA (50.80 KB, application/pdf)
2022-10-24 15:34 UTC, Gabor Kelemen (allotropia)
ZIP file containing an ODT test file and three PDF files derived from it (219.75 KB, application/zip)
2022-10-25 21:05 UTC, Christophe Strobbe
The example file exported to PDF in PAC tool (97.17 KB, image/png)
2023-01-18 18:19 UTC, Gabor Kelemen (allotropia)

Description Christophe Strobbe 2022-05-17 21:36:39 UTC
Created attachment 180174 [details]
ZIP file containing an ODT test file and three PDF files derived from it

The table header cells in a table header row always have their scope set to "None" instead of "Column" after exporting a Writer document containing a table to PDF/UA.

Steps to reproduce the issue:

1. Create a new document and insert a table with at least two rows and two columns. In the first row of cells, enter some text that may serve to describe the content of the rows. Enter some other data into the other rows. Copy this table two times, so you eventually have three tables in the document.
2. Right-click on the first table, open the Table Properties dialog, go to the Text Flow tab and check the checkbox "Repeat heading" (and keep the value for "rows" on "1"). This turns the first row from a row of normal data cells into a header row.
3. In the second table, select the first row and apply the style "Table Heading" to it. (The table properties remain at their default values.)
4. In the third table, apply the style "Heading 2" to the cells in the first row.
5. Export the file to PDF, making sure that the checkbox "Universal Accessibility (PDF/UA)" is selected.
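
For regression testing, step 5 can probably be scripted as well. The sketch below only builds a command line and does not run it. It assumes a `soffice` binary on the PATH and the JSON-style filter option syntax with the `PDFUACompliance` export parameter, which as far as I know requires LibreOffice 7.4 or later; both should be verified against the build under test. The helper name `build_pdfua_export_command` is hypothetical.

```python
import json
import shlex


def build_pdfua_export_command(odt_path, out_dir="."):
    """Return an argv list asking soffice to export odt_path as PDF/UA."""
    # Assumption: PDFUACompliance is the export filter parameter behind the
    # "Universal Accessibility (PDF/UA)" checkbox in the export dialog.
    options = {"PDFUACompliance": {"type": "boolean", "value": "true"}}
    filter_spec = "pdf:writer_pdf_Export:" + json.dumps(
        options, separators=(",", ":")
    )
    return [
        "soffice",
        "--headless",
        "--convert-to",
        filter_spec,
        "--outdir",
        out_dir,
        odt_path,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line for inspection; nothing is executed.
    print(shlex.join(build_pdfua_export_command("tables.odt", "out")))
```

The file name `tables.odt` and the output directory `out` are placeholders.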

Observed result:

Open the PDF file in Adobe Acrobat Pro and inspect the tags for the first row in each table.
1. The first row in the first table is marked up as a TR with TH cells for the table header cells. Open the Accessibility tool, click "Reading Order", click on the first table, click on the "Table Editor" button in the Reading Order dialog and notice how the type of the header cells is displayed. (If these tags are not displayed, check the radio button "Structure types" in the Reading Order dialog.) Right-click on a TH cell and open the Table Cell Properties; notice that "Scope" is set to "None". This is the case for each TH cell.
2. The first row in the second table is also marked up as a TR with TH cells for the table header cells. Inspect the TH cells in the same way as in the previous step and observe the same result.
3. The first row in the third table is, as expected, not marked up as a table header row: it is a row of TD cells, each containing an H2 tag, since using Heading x styles is not the proper way to mark table header rows in Writer.
4. Open the PDF file in PAC 2021 (freely available at https://pdfua.foundation/en but only for Windows). Open the detailed report, drill down to Logical Structure > Structure Elements > Tables, and notice the error "Table header cell has no associated subcells" for each table header cell in the first two tables.

Expected result:
Each of the TH cells has its scope set to "Column" and the TH cells don't trigger an error in PAC 2021.
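
The expected result can be stated as a check over the tag tree. The sketch below is a simplified model, not a PDF parser: the nested dictionaries, the `type`, `kids` and `scope` keys, and the function name `th_cells_missing_scope` are all hypothetical, and reading a real PDF would need a PDF library. As far as I know, the PDF specification allows `Row`, `Column` or `Both` as values for `Scope`, so the "None" shown by Acrobat presumably means that the attribute is absent.

```python
VALID_SCOPES = {"Row", "Column", "Both"}


def th_cells_missing_scope(element, path=None):
    """Return the paths of TH elements whose scope is absent or invalid."""
    path = path or element["type"]
    problems = []
    if element["type"] == "TH" and element.get("scope") not in VALID_SCOPES:
        problems.append(path)
    # Recurse into child structure elements, recording each child's index.
    for index, kid in enumerate(element.get("kids", [])):
        kid_path = f"{path}/{kid['type']}[{index}]"
        problems.extend(th_cells_missing_scope(kid, kid_path))
    return problems


# Model of a table as exported before the fix: TH cells without a scope.
table_before_fix = {
    "type": "Table",
    "kids": [
        {"type": "TR", "kids": [{"type": "TH"}, {"type": "TH"}]},
        {"type": "TR", "kids": [{"type": "TD"}, {"type": "TD"}]},
    ],
}

# Model of the expected result: every TH cell has scope "Column".
table_expected = {
    "type": "Table",
    "kids": [
        {
            "type": "TR",
            "kids": [
                {"type": "TH", "scope": "Column"},
                {"type": "TH", "scope": "Column"},
            ],
        },
        {"type": "TR", "kids": [{"type": "TD"}, {"type": "TD"}]},
    ],
}

if __name__ == "__main__":
    print(th_cells_missing_scope(table_before_fix))
    print(th_cells_missing_scope(table_expected))
```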

Workaround for people who have Adobe Acrobat Pro:

Open the document in Adobe Acrobat again, open the "Reading Order" dialog as described in step 1 (above) and change the scope of the TH cells in the first two tables from "None" to "Column". Save the PDF file and open it again in PAC 2021. The TH cells no longer trigger an error.
It may be possible to fix the issue using the free online tool PAVE at https://pave-pdf.org/index.html but my PDF file triggered an internal server error when I tried to upload it.

Version: 7.3.3.2 (x64) / LibreOffice Community
Build ID: d1d0ea68f081ee2800a922cac8f79445e4603348
CPU threads: 8; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL

The attached ZIP file contains the following files:
- TableTH_test_LibreOfficeWriter7.3.3_HeaderRow-HeadersInTopRow.odt : the original test file containing three tables in which the header rows are marked up in three different ways (i.e. as described in the instructions).
- TableTH_test_LibreOfficeWriter7.3.3_HeaderRow-HeadersInTopRow.pdf : the PDF resulting from exporting the ODF file to PDF/UA using LibreOffice Writer 7.3.3.2.
- TableTH_test_LibreOfficeWriter7.3.3_HeaderRow-HeadersInTopRow_ScopeCorrected.pdf : a version of the previous file in which the TH scope was corrected to "Column" as described in the workaround.
- TableTH_test_LibreOfficeWriter7.3.3_HeaderRow-HeadersInTopRow_ExportedFromOOo3.3.0.pdf : the PDF resulting from exporting the ODF file to "tagged PDF" using OpenOffice.org 3.3.0 [OOO330m20 (Build:9567)], which has the same TH issue as current versions of LibreOffice. This file shows that the issue was inherited from OpenOffice.org.
Comment 1 Christophe Strobbe 2022-05-18 09:10:37 UTC
Bug 135192 is a somewhat related bug for Impress. However, Impress does not export table tags at all (Table, TR, TD, TH).
Comment 2 Dieter 2022-06-01 13:17:58 UTC
(In reply to Christophe Strobbe from comment #0)
> 5. Export the file to PDF, making sure that the checkbox "Universal
> Accessibility (PDF/UA)" is selected.

Actual result:
Accessibility check says: "Tables must not contain headings." And this is only related to the third table and paragraph style "Heading 1"

Version: 7.3.4.1 (x64) / LibreOffice Community
Build ID: 13668373362b52f6e3ebcaaecb031bd59a3ac66b
CPU threads: 4; OS: Windows 10.0 Build 19044; UI render: Skia/Raster; VCL: win
Locale: de-DE (de_DE); UI: en-GB
Calc: CL
Comment 3 Christophe Strobbe 2022-06-01 13:35:00 UTC
(In reply to Dieter from comment #2)
> (In reply to Christophe Strobbe from comment #0)
> > 5. Export the file to PDF, making sure that the checkbox "Universal
> > Accessibility (PDF/UA)" is selected.
> 
> Actual result:
> Accessibility check says: "Tables must not contain headings." And this is
> only related to the third table and paragraph style "Heading 1"

This bug report is not about LibreOffice's built-in accessibility checker; it is about PDF/UA conformance of the exported PDF. In the exported PDF file, table header cells have their scope attribute set to the value "None". This causes an error message in the PDF/UA conformance checker PAC 2021. (Adobe Acrobat Pro's accessibility checker does not complain about this; however, Adobe Acrobat has completely dropped the ball on PDF/UA conformance.)

The error you cite is triggered by a Heading x style inside a table header cell; that error message is justified but not directly relevant to this bug. (I reused a sample document submitted for a different bug.)
Comment 4 Gabor Kelemen (allotropia) 2022-10-24 13:08:01 UTC
(In reply to Christophe Strobbe from comment #0)
> Created attachment 180174 [details]
> ZIP file containing an ODT test file and three PDF files derived from it
> 

I was trying to confirm the reported behavior, but this attachment seems to belong to another bug:

$ unzip  -l /cygdrive/c/Users/Gabor/Downloads/Bug\ 39935.zip
Archive:  /cygdrive/c/Users/Gabor/Downloads/Bug 39935.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
    21614  05-17-2022 11:39   Bug_039935_LibreOffice7.3.3.2.odt
    20997  05-17-2022 11:39   Bug_039935_LibreOffice7.3.3.2_AsianAndCtlNone.odt
    13114  05-17-2022 11:39   Bug_039935_OpenOffice.org-3.3.0.odt
    12901  05-17-2022 11:39   Bug_039935_OpenOffice.org-3.3.0_AsianAndCtlNone.odt
---------                     -------
    68626                     4 files

Further, I think this is likely a duplicate of bug 149067 as I got the same "Table header cell has no associated subcells" error from the PAC tool with the example file attachment 180091 [details] there.
Comment 5 Gabor Kelemen (allotropia) 2022-10-24 15:33:45 UTC
Created attachment 183238 [details]
Example file created from scratch per instructions
Comment 6 Gabor Kelemen (allotropia) 2022-10-24 15:34:13 UTC
Created attachment 183239 [details]
Example file exported to PDF/UA
Comment 7 Michael Stahl (allotropia) 2022-10-25 18:46:13 UTC
the problem with the 2nd table has nothing to do with tables: veraPDF complains that the first header is H2 not H1, and indeed if i change the paragraphs to "Heading 1" veraPDF stops complaining.

so i guess it could be fixed by determining the highest level of heading used in the document, and then mapping that to H1... but it's not clear what the practical benefit of that would be; a validator complaining about this seems a bit silly when applications allow users to create such documents.

... no actually it would not help: if i create a document that first has "Heading 2" and then "Heading 1", then veraPDF complains about it.

this problem can only sensibly be fixed by the author of the document; there is already a warning dialog on PDF export that says: "Keep headings' levels ordered. Heading level 2 must not go after 0."
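
The reasoning in this comment can be checked with a small model. The sketch below assumes a simplified version of the validator rule (the first heading must be level 1, and a heading may not be more than one level deeper than the previous one); the actual veraPDF rule may differ in detail, and both function names are hypothetical.

```python
def heading_order_problems(levels):
    """Return messages for violations of the simplified ordering rule."""
    problems = []
    previous = 0
    for position, level in enumerate(levels):
        if position == 0 and level != 1:
            problems.append(f"first heading is H{level}, not H1")
        elif position > 0 and level > previous + 1:
            problems.append(f"H{level} follows H{previous}")
        previous = level
    return problems


def shift_to_h1(levels):
    """Map the highest heading level used (the smallest number) to H1."""
    if not levels:
        return []
    offset = min(levels) - 1
    return [level - offset for level in levels]


if __name__ == "__main__":
    # Only "Heading 2" is used: shifting removes the complaint.
    print(heading_order_problems([2, 2]))
    print(heading_order_problems(shift_to_h1([2, 2])))
    # "Heading 2" followed by "Heading 1": shifting changes nothing,
    # so the complaint about the first heading remains.
    print(shift_to_h1([2, 1]))
    print(heading_order_problems(shift_to_h1([2, 1])))
```

Under this model the mapping helps only when the document never uses the top heading level, which matches the conclusion that the author has to fix the ordering.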
Comment 8 Michael Stahl (allotropia) 2022-10-25 19:04:39 UTC
now i'm thinking this validation rule is even worse: consider that you can select an arbitrary part of the document, and then export that to PDF - what is the point of complaining about a H[2-N] as the first heading of a document?
Comment 9 Christophe Strobbe 2022-10-25 21:05:43 UTC
Created attachment 183274 [details]
ZIP file containing an ODT test file and three PDF files derived from it

This attachment replaces the one I uploaded in May, which was not for this bug.
The ZIP file contains four files:
1. An ODT file containing three tables, as described in the original bug report.
2. A PDF file exported using OpenOffice.org 3.3.0.
3. A PDF file exported from LibreOffice 7.3.3.
4. The same PDF as the previous one but with the scope corrected in Adobe Acrobat Pro.
Comment 10 Christophe Strobbe 2022-10-25 21:12:22 UTC
Michael Stahl, this bug report is not about the use of Heading 1, Heading 2, etc. but about table header cells in the table header row having their scope set to "None" instead of "Column". (See the error "Table header cell has no associated subcells" in PAC 2021.)

The presence of Heading 2 styles in the third table (not the second one) should not distract from this. It is something I only included as a test because some people erroneously think that this is a correct way of marking table header cells. The validator is entirely correct in flagging this as an error.
Comment 11 Commit Notification 2022-11-15 19:42:55 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/dfffe710d07f84f4152cf61ccd4a69279a26ff7c

tdf#149140 vcl,sw: PDF/UA export: add Scope attribute to table headers

It will be available in 7.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Michael Stahl (allotropia) 2022-11-15 19:45:28 UTC
fixed on master
Comment 13 Michael Stahl (allotropia) 2022-11-16 14:27:04 UTC
*** Bug 149067 has been marked as a duplicate of this bug. ***
Comment 14 Commit Notification 2022-11-17 16:23:41 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/83e622ad9d7ae86dd12823109cc4a2edad7ad842

tdf#149140 vcl,sw: PDF/UA export: add Scope attribute to table headers

It will be available in 7.4.4.

Comment 15 Commit Notification 2022-11-24 18:38:36 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/28b06bb7236f8c6e17423dc3df446306900355f1

tdf#149140 vcl: PDF/UA export: Scope attribute exists since PDF 1.5

It will be available in 7.5.0.
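
For readers unfamiliar with tagged PDF, the two commit titles can be illustrated roughly as follows. This is a Python sketch and not the LibreOffice code (the actual change is in the vcl and sw modules); the function name, the version tuple and the exact dictionary layout are assumptions. It relies only on `Scope` being a standard table attribute with owner `Table` that exists since PDF 1.5.

```python
def th_attribute_dict(scope, pdf_version=(1, 7)):
    """Serialize a minimal attribute dictionary for a TH structure element."""
    if scope not in ("Row", "Column", "Both"):
        raise ValueError(f"unsupported scope: {scope!r}")
    if pdf_version < (1, 5):
        # Scope does not exist before PDF 1.5, so nothing is emitted.
        return ""
    return f"<</O/Table/Scope/{scope}>>"


if __name__ == "__main__":
    print(th_attribute_dict("Column"))
    print(repr(th_attribute_dict("Column", pdf_version=(1, 4))))
```

In a real file such a dictionary would be the value of the `/A` entry of each TH structure element in the header row.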

Comment 16 Commit Notification 2022-11-25 09:08:15 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-7-4":

https://git.libreoffice.org/core/commit/5cc596a99643764f698bbc6c4bfb7ba561dee568

tdf#149140 vcl: PDF/UA export: Scope attribute exists since PDF 1.5

It will be available in 7.4.4.

Comment 17 Gabor Kelemen (allotropia) 2023-01-18 18:19:00 UTC
Created attachment 184763 [details]
The example file exported to PDF in PAC tool

Verified with files from attachment 183274 [details] in

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: f1830bff71847a9c17715cff52383956719847fe
CPU threads: 14; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: en-US (hu_HU); UI: en-US
Calc: threaded

There are no more "Table header cell has no associated subcells" error messages.