Bug 163240 - Accessibility issue: docx to pdf conversion creates duplicate <Link> tags when hyperlinks extend over multiple lines
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 24.8.2.1 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: accessibility
Depends on:
Blocks: PDF-Export PDF-Accessibility
Reported: 2024-10-01 20:45 UTC by ekressmiller
Modified: 2025-07-15 16:02 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Example file created per instructions (23.82 KB, application/vnd.oasis.opendocument.text)
2024-11-25 11:50 UTC, Gabor Kelemen (allotropia)
The example file exported as PDF/UA (23.55 KB, application/pdf)
2024-11-25 11:50 UTC, Gabor Kelemen (allotropia)
The PAC tool shows the issue in the document logical structure view (73.75 KB, image/png)
2024-11-25 11:52 UTC, Gabor Kelemen (allotropia)

Description ekressmiller 2024-10-01 20:45:34 UTC
Description:
When converting from a docx file to a tagged PDF, hyperlinked phrases that extend over multiple lines create duplicate <Link> tags in the resulting PDF. This is an accessibility issue because it causes repetition and makes navigation harder for screen reader users.

Steps to Reproduce:
1. Create a docx document with a multi-word hyperlink that starts on one line and continues onto the next line.
2. Convert to PDF using LibreOffice. In my case, I did this using Docassemble, which converts documents using LibreOffice through Unoconv (https://github.com/unoconv/unoconv).
3. View the PDF accessibility tags or use a screen reader to confirm the existence of duplicate links.
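
For anyone reproducing step 2 without Docassemble or Unoconv, the conversion can probably be approximated with a headless LibreOffice call. The sketch below only builds the command line; it does not run it. It assumes that the `soffice` binary is on the PATH and that the installed version accepts JSON filter options after the `writer_pdf_Export` filter name, with `UseTaggedPDF` as the option that enables tagging. The helper name `build_convert_command` is made up for this example.

```python
import json
from typing import List


def build_convert_command(input_path: str, outdir: str,
                          soffice: str = "soffice") -> List[str]:
    """Build a headless LibreOffice command that exports a tagged PDF.

    Assumption: the installed LibreOffice accepts JSON filter options
    appended to the export filter name in --convert-to.
    """
    # Ask the PDF export filter for a tagged PDF (assumed option name).
    options = {"UseTaggedPDF": {"type": "boolean", "value": "true"}}
    convert_to = "pdf:writer_pdf_Export:" + json.dumps(options, separators=(",", ":"))
    return [soffice, "--headless", "--convert-to", convert_to,
            "--outdir", outdir, input_path]
```

The returned list can be passed to `subprocess.run(..., check=True)`; the exported PDF should then appear in the chosen output directory.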

Actual Results:
The <Link> tag is duplicated for the hyperlink, with one tag for the words on each separate line. A screen reader reads the hyperlink twice. The hyperlink is also listed twice in the screen reader's list of links on the page.

Expected Results:
There should be only a single <Link> tag for a single hyperlink, even if it extends over multiple lines. A screen reader should only read the link once.
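
The difference between the actual and expected results can be illustrated with a simplified model of the tag tree. This is only a sketch: `StructElem` and `find_split_links` are hypothetical names, the model ignores everything except tag names and link targets, and treating adjacent sibling <Link> elements with the same target as one split hyperlink is a heuristic, since two genuinely separate links to the same URL would also match.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StructElem:
    """Simplified stand-in for one structure element in a tagged PDF."""
    tag: str
    uri: Optional[str] = None
    kids: List["StructElem"] = field(default_factory=list)


def find_split_links(parent: StructElem) -> List[str]:
    """Return targets for which adjacent sibling Link elements repeat."""
    split: List[str] = []
    previous: Optional[StructElem] = None
    for kid in parent.kids:
        repeated = (
            previous is not None
            and previous.tag == "Link"
            and kid.tag == "Link"
            and kid.uri == previous.uri
        )
        if repeated and kid.uri not in split:
            split.append(kid.uri)
        previous = kid
    return split


# Actual result: one <Link> per line of text, both pointing at the same target.
actual = StructElem("P", kids=[
    StructElem("Link", uri="https://example.org/"),
    StructElem("Link", uri="https://example.org/"),
])

# Expected result: a single <Link> for the whole hyperlink.
expected = StructElem("P", kids=[StructElem("Link", uri="https://example.org/")])
```

In this model, `find_split_links(actual)` reports the repeated target, while `find_split_links(expected)` reports nothing.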


Reproducible: Always


User Profile Reset: No

Additional Info:
N/A. Using LibreOffice via Docassemble and Unoconv.
Comment 1 Chika 2024-11-01 18:37:07 UTC
Hello,

Thank you for reporting the bug. I can confirm that the bug is present in both the master and dev builds.

Master Version: 24.8.2.1 (X86_64) / LibreOffice Community
Build ID: 0f794b6e29741098670a3b95d60478a65d05ef13
CPU threads: 8; OS: macOS 13.6.3; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Dev Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 2d65d52bd208acde60e77ec49b995958985babe7
CPU threads: 8; OS: macOS 13.6.3; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

Moving the status to NEW.

Sincerely,

Chika (from CSUMB tester)
Comment 2 Gabor Kelemen (allotropia) 2024-11-25 11:50:00 UTC
Created attachment 197770
Example file created per instructions
Comment 3 Gabor Kelemen (allotropia) 2024-11-25 11:50:28 UTC
Created attachment 197771
The example file exported as PDF/UA
Comment 4 Gabor Kelemen (allotropia) 2024-11-25 11:52:01 UTC
Created attachment 197772
The PAC tool shows the issue in the document logical structure view
Comment 5 Michael Stahl 2025-07-15 16:02:30 UTC
agreed.

the Link structure element was actually merged across line breaks already but then we had to revert it to the current situation with one Link per line of text, because of this PAC error https://bugs.documentfoundation.org/show_bug.cgi?id=156565#c8

... apparently the problem was that only the first Annotation had the Link SE set as its structure parent, this needs to be fixed ...
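
The condition described above can be sketched as a check on a toy model. It assumes the usual tagged-PDF wiring, in which a Link structure element references each of its link annotations through an object reference, and each annotation carries a /StructParent key that the ParentTree maps back to that structure element. The classes and the `orphaned_annotations` helper are hypothetical and are not taken from the LibreOffice code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(eq=False)
class Annot:
    """Toy link annotation; struct_parent models the /StructParent key."""
    name: str
    struct_parent: Optional[int] = None


@dataclass(eq=False)
class LinkElem:
    """Toy Link structure element holding the annotations it references."""
    objr_annots: List[Annot] = field(default_factory=list)


def orphaned_annotations(link: LinkElem,
                         parent_tree: Dict[int, LinkElem]) -> List[str]:
    """Return names of annotations whose /StructParent does not resolve to link."""
    orphans = []
    for annot in link.objr_annots:
        key = annot.struct_parent
        if key is None or parent_tree.get(key) is not link:
            orphans.append(annot.name)
    return orphans


# Merged Link over two lines, but only the first annotation is wired up.
first = Annot("line1", struct_parent=0)
second = Annot("line2")
link = LinkElem([first, second])
broken_tree = {0: link}

# Same Link with the second annotation also pointing back at it.
second_fixed = Annot("line2", struct_parent=1)
link_fixed = LinkElem([first, second_fixed])
fixed_tree = {0: link_fixed, 1: link_fixed}
```

Here `orphaned_annotations(link, broken_tree)` flags the second annotation, which mirrors the situation where only the first annotation has the Link SE as its structure parent; with `link_fixed` and `fixed_tree` nothing is flagged.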