Bug 149471 - DOCX: Word header content with convert-to PDF on Linux has fidelity issues
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.2.1.2 release
Hardware: All Linux (All)
Importance: low normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-06-06 21:58 UTC by viewer
Modified: 2022-06-14 07:24 UTC

See Also:
Crash report or crash signature:


Attachments
Source document that is converted to PDF (229.53 KB, application/octet-stream)
2022-06-06 21:59 UTC, viewer
Comparison screenshot showing the problem (183.71 KB, image/jpeg)
2022-06-06 22:00 UTC, viewer
PDF created from Word document viewed in LibreOffice on Linux, then exported to PDF (146.00 KB, application/pdf)
2022-06-10 21:00 UTC, viewer
Screenshot of Word docx viewed in Libre Office on Linux (151.36 KB, image/jpeg)
2022-06-10 21:04 UTC, viewer

Description viewer 2022-06-06 21:58:01 UTC
Description:
When the attached Word document is converted to PDF on Linux, the header content in the PDF file has the following issues (see the attached comparison screenshot, WordHeaderIssues.jpg):
1. The text WEO99999 wraps when it should not.
2. The date value 24-02-2022 wraps when it should not.
3. The page number is not resolved as it is in MS Word.
The first two items do not occur when converting on Windows; the third one occurs on Windows as well.

Unrelated to the header content: in the Table of Contents on page 2 of the document, the page number 3 is missing for
1. Major Title 1.....
2. Major Title 2.....
This is a problem on Linux only.

Steps to Reproduce:
1. On Linux, use convert-to to export the provided docx file to PDF.
2. View the resulting PDF in Adobe Reader.
3. Navigate to page 2.
4. Compare the header field values in the top right area with the file opened in LibreOffice Writer or MS Word.
5. Compare the first two lines of the Table of Contents for the page number at the right end.
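
Step 1 can be scripted roughly as follows. This is a minimal sketch, assuming the `soffice` binary is on the PATH; the helper names and the `input.docx` / `out` paths are placeholders for illustration and are not taken from the report. `--headless`, `--convert-to` and `--outdir` are regular LibreOffice command-line options.

```python
import subprocess
from pathlib import Path


def build_convert_command(docx_path, out_dir, soffice="soffice"):
    """Return the argument list for a headless DOCX-to-PDF conversion.

    `soffice` is assumed to be resolvable on the PATH; pass a full path otherwise.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(docx_path),
    ]


def convert_to_pdf(docx_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    subprocess.run(build_convert_command(docx_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(docx_path).stem + ".pdf")


# Hypothetical usage (file names are placeholders):
# pdf_path = convert_to_pdf("input.docx", "out")
```

Keeping the command construction separate from the call makes it easy to log the exact command line alongside a report like this one.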


Actual Results:
In the header, text is wrapped and page numbers are not resolved (Page 2 of 8)
In the Table of Contents, page number 3 is missing on two lines.
See attached WordHeaderIssues.jpg

Expected Results:
In the header, the document title and date field should each fit on one line, and the page number should be resolved (2 of 8)
In the Table of Contents, page number 3 should be shown at the end of the first two lines.


Reproducible: Always


User Profile Reset: No



Additional Info:
LibreOffice is run headless in the Linux environment, so there is no About dialog to copy version information from.
Comment 1 viewer 2022-06-06 21:59:59 UTC
Created attachment 180608
Source document that is converted to PDF
Comment 2 viewer 2022-06-06 22:00:37 UTC
Created attachment 180609
Comparison screenshot showing the problem
Comment 3 Timur 2022-06-07 15:37:07 UTC
Detailed, but no.
It's common for users to report a convert-to problem, but the first step is to open the DOCX in the GUI and see whether FILEOPEN is OK.
Another mistake is to report a "document problem" instead of a "single issue per report", after checking existing bugs for that issue.
Yet another mistake, so far the most probable one, is not verifying that the specific font from the DOCX is read on Linux (its name is shown regularly if it is, or in italic if it is not), and adding a screenshot from Windows for a conversion on Linux.

I can't reproduce this. You should attach page 2 as opened on Linux.
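
To see which fonts a DOCX asks for before checking them on Linux, the font table inside the file can be listed. This is a minimal sketch, assuming the document contains the `word/fontTable.xml` part (Word normally writes it, but it is optional); the function name `declared_fonts` is made up for illustration. It lists declared fonts only and does not show which one the header actually uses.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by the w: prefix in DOCX parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def declared_fonts(docx):
    """Return the font names declared in word/fontTable.xml of a DOCX.

    `docx` may be a file path or a binary file-like object.
    Raises KeyError if the archive has no font table part.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/fontTable.xml"))
    # Each <w:font w:name="..."> element declares one font.
    return [font.get(f"{{{W_NS}}}name") for font in root.iter(f"{{{W_NS}}}font")]


# Hypothetical usage (file name is a placeholder):
# print(declared_fonts("input.docx"))
```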
Comment 4 viewer 2022-06-10 21:00:26 UTC
Created attachment 180671
PDF created from Word document viewed in LibreOffice on Linux, then exported to PDF
Comment 5 viewer 2022-06-10 21:04:29 UTC
Created attachment 180672
Screenshot of Word docx viewed in Libre Office on Linux

My apologies for not providing all the details. 

I've attached a screenshot of the file as viewed in LibreOffice on Linux, and the text wrapping is noticeable there. (Earlier we were using headless LibreOffice for convert-to.) So I believe that makes it a FILEOPEN issue? The font used is Trebuchet. This sounds similar to Bug 62422.

Would you like me to create separate issues for the page number not resolving and for the missing page number in the Table of Contents?
Comment 6 QA Administrators 2022-06-11 03:30:21 UTC Comment hidden (obsolete)
Comment 7 Timur 2022-06-11 07:15:49 UTC
I can't see it well in the screenshot, but the key question is whether Trebuchet MS is shown in italic in the font field, meaning it is not read by LO.
If so, as I suspect, this is NotABug. You can set a replacement font in Options-Fonts.

(Actually, it may be about an embedded font not being read, if the font is embedded, but there's already a bug for that.)
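
On a headless machine, where the font field cannot be inspected, a rough equivalent of the italic check is to ask fontconfig what it would return for the family. This is a minimal sketch, assuming the `fc-match` tool from fontconfig is installed; the helper names are made up for illustration. fontconfig's answer is not necessarily the substitute LibreOffice ends up using, so treat a mismatch only as a sign that the requested font is not installed.

```python
import subprocess


def matched_family(requested, fc_match="fc-match"):
    """Ask fontconfig which family it would use for `requested`.

    Uses fc-match's --format option to print only the family name(s).
    """
    result = subprocess.run(
        [fc_match, "--format", "%{family}", requested],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def is_substituted(requested, matched):
    """True if none of the matched family names equals the requested one.

    fontconfig may report several comma-separated names for a single font.
    """
    names = [name.strip().lower() for name in matched.split(",")]
    return requested.strip().lower() not in names


# Hypothetical usage:
# family = matched_family("Trebuchet MS")
# print(family, is_substituted("Trebuchet MS", family))
```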
Comment 8 Timur 2022-06-11 07:30:32 UTC
In general, attachment 167165 shows a personal LO replacement table.
But in this case it is probably easier to install the Microsoft fonts on Linux.
Comment 9 viewer 2022-06-13 17:53:44 UTC
Thank you for the feedback. Trebuchet MS is shown in italic in LO on Linux. We will research your suggestion about setting the font replacement in LO. Unfortunately, installing MS fonts is not an option for us.
Comment 10 Timur 2022-06-14 07:06:02 UTC
I'm marking this NotABug.
At present we can't even tell what the substitute font is; see bug 61134.
With font replacement, beware of bug 43185.
I see a number of proposed alternatives, such as https://graphicdesign.stackexchange.com/questions/21969/trebuchet-ms-web-font-alternatives
Comment 11 Timur 2022-06-14 07:24:48 UTC
You can also try embedding the fonts in the file. Embedding has its own bugs.