Bug 115944 - Render differences of DOCX between Word and LibreOffice with Palatino and Tekton fonts
Status: RESOLVED INVALID
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version: (earliest affected) 5.4.3.2 release
Hardware: x86-64 (AMD64) Mac OS X (All)
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: Font-Substitution
Reported: 2018-02-22 20:10 UTC by Jens Troeger
Modified: 2018-08-13 00:57 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Attachments
Shows the render differences of a test document between Word and LO. (311.51 KB, image/jpeg)
2018-02-22 20:12 UTC, Jens Troeger
Test document to reproduce the problem. (3.59 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-22 20:13 UTC, Jens Troeger
Shows the render differences of the test document between Word/DOCX, LO/DOCX, LO/ODT. (656.77 KB, image/jpeg)
2018-02-23 22:08 UTC, Jens Troeger
DOCX test document to reproduce the problem. (3.59 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-02-23 22:09 UTC, Jens Troeger
ODT test document (saved from Word) to reproduce the problem. (3.57 MB, application/vnd.oasis.opendocument.text)
2018-02-23 22:10 UTC, Jens Troeger
OO seems to remove spaces from text when rendering a DOCX. (394.76 KB, image/jpeg)
2018-07-30 11:56 UTC, Jens Troeger
Shape objects (grouped text and gfx objects) don’t render at all. (116.75 KB, image/jpeg)
2018-07-30 12:01 UTC, Jens Troeger
Comparison of heading Tekton Pro Bold with DOCX and ODT in LO (27.86 KB, image/png)
2018-08-06 09:23 UTC, Alex Thurgood
Comparison of image spacing and organisation between DOCX and ODT (217.61 KB, image/png)
2018-08-06 09:25 UTC, Alex Thurgood
Comparison of footnote and page reference render in DOCX and ODT (103.43 KB, image/png)
2018-08-06 09:27 UTC, Alex Thurgood

Description Jens Troeger 2018-02-22 20:10:38 UTC
Description:
It looks to me as if word spacing and character kerning are slightly different for the same document rendered by Word vs LO, resulting in a cumulative “error” that causes different numbers of words per line. This shows in the footnote, which stretches the footnote’s box and pushes the next paragraph to the following page in Word but not in LibreOffice.
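To make the cumulative effect concrete, here is a toy sketch. It is not Word's or LibreOffice's actual layout algorithm; it assumes a simple greedy (first-fit) line breaker and hypothetical uniform word widths. With these made-up numbers, making every advance just 1% wider is enough to drop one word from each line.

```python
def count_lines(word_widths, space_width, line_width):
    """Toy greedy (first-fit) line breaking: put each word on the current
    line if it fits after a space, otherwise start a new line.
    Returns the number of lines used."""
    if not word_widths:
        return 0
    lines = 1
    used = word_widths[0]  # width occupied on the current line
    for w in word_widths[1:]:
        if used + space_width + w <= line_width:
            used += space_width + w  # word fits after a space
        else:
            lines += 1               # wrap: word starts a new line
            used = w
    return lines


# Hypothetical numbers: 80 words, each 10 units wide, 2-unit spaces,
# and a 94-unit line that fits exactly 8 words (8*10 + 7*2 = 94).
words = [10.0] * 80
print(count_lines(words, 2.0, 94.0))  # 8 words per line -> 10

# Same text with every advance 1% wider (e.g. different kerning/shaping).
scale = 1.01
print(count_lines([w * scale for w in words], 2.0 * scale, 94.0))  # 7 words per line -> 12
```

In this toy model, two extra lines per 80 words quickly add up over a long document, which is the kind of drift in page breaks and page numbers described in the next paragraph.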

The net effect ripples through the entire document, producing different page counts and placing paragraphs on different pages. For example, in a document I work with, chapter 41 is on page 195 in Word (effective page 232 of 427), whereas the same chapter 41 in LibreOffice is on page 183 (effective page 212 of 435).

This, in turn, becomes a problem if the author types out page numbers instead of using internal references: the page numbers in the text then no longer match the actual page numbers of the document.

Steps to Reproduce:
Load the attached test document in Word and LO.

Actual Results:  
Paragraphs rendered on different pages.

Expected Results:
Paragraphs rendered on same pages.


Reproducible: Always


User Profile Reset: No



Additional Info:
See also: http://nabble.documentfoundation.org/Render-differences-of-DOCX-between-Word-and-LO-AOO-td4231236.html


User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36
Comment 1 Jens Troeger 2018-02-22 20:12:04 UTC
Created attachment 140065 [details]
Shows the render differences of a test document between Word and LO.
Comment 2 Jens Troeger 2018-02-22 20:13:22 UTC
Created attachment 140066 [details]
Test document to reproduce the problem.
Comment 3 Alex Thurgood 2018-02-23 08:32:08 UTC
Confirming with

Version: 6.0.1.1
Build ID: 60bfb1526849283ce2491346ed2aa51c465abfe6
CPU threads: 4; OS: Mac OS X 10.13.3; UI render: default; 
Locale: fr-FR (fr_FR.UTF-8); Calc: group

Differences compared to Word:

- Title is broken: partly subscripted, or shifted down half a line, compared to Word
- Spacing around the image is incorrect compared to Word
- Text wrap around the bottom of the image is incorrect compared to Word
- Footnote numbering is not superscripted, unlike in Word
- The final paragraph starts at the bottom of page 1, whereas in Word it starts on page 2
- An error is displayed instead of the source reference to page 353 in the final paragraph
Comment 4 Timur 2018-02-23 12:55:17 UTC
This shouldn't have been confirmed so fast.
Bugzilla is not "document based", as in "this document doesn't display nicely".
Bugzilla is "issue based", so a single issue must be pointed out, after searching to make sure it is not a duplicate.
It's highly unlikely that bugs of the type "multiple problems with this file/bad rendering/this should look like" will be fixed.
Each issue (section break, paragraph break, text box size, picture position...) should be analyzed and checked against already reported bugs.
If no such bugs exist, they should be reported separately, even if they occur with the same file.
The example file should be reduced to a minimal test case for a specific problem, with a clear file history or steps to reproduce from scratch.
Comment 5 Timur 2018-02-23 13:25:24 UTC
If we choose a single issue, this bug should be about the footnote.
But the footnote font in MSO is "Palatino", and it's substituted in LO. From what I see, it is supplied on Mac, but I don't have it on Windows. So it's not clear what the bug would be here. A better replacement? See Bug 64509.
The title is "Tekton Pro Bold", which is not substituted nicely in LO. See Bug 61134.
Comment 6 V Stuart Foote 2018-02-23 14:23:58 UTC
(In reply to Timur from comment #5)
> If we choose a single issue, this bug should be about the footnote.
> But the footnote font in MSO is "Palatino", and it's substituted in LO. From
> what I see, it is supplied on Mac, but I don't have it on Windows. So it's
> not clear what the bug would be here. A better replacement? See Bug 64509.
> The title is "Tekton Pro Bold", which is not substituted nicely in LO. See
> Bug 61134.

It's not clear here whether the issue occurs on macOS alone, or when crossing between macOS and MS Windows systems. Assuming it is on the same macOS system, the same fonts are available on that system, so either our ww8 import filter is incorrectly doing a font replacement (we read the font assignment incorrectly in the ww8 import filter on macOS), or there is no font substitution and our handling of font metrics differs from MS Word (possible with the move to HarfBuzz).

I'd suspect the second, but we'd need to know whether a font replacement is occurring (i.e. bug 61134). Unfortunately, the only indication we'd see is that the font name for the style (text body/default or footnote) is shown in italics, with a tooltip saying the font is not available.

Q1: Is this only on the same macOS system?

Q2: Is the font being replaced with a fallback incorrectly or not?
Comment 7 Jens Troeger 2018-02-23 20:32:08 UTC
Thank you all for the discussion so far.

To clarify: this happens on the same system for the same user, where all required fonts are available to that user. MS Word loads and renders the Tekton font, whereas LO does not.

But as pointed out in comment 3 above, that is only one of several things that go wrong with this document.
Comment 8 V Stuart Foote 2018-02-23 20:58:15 UTC
Well, here is one more NEEDINFO question.

Word 2016 OOXML is not the same format as ODF, and we do a filter conversion between the formats.

Since your macOS Word 2016 can save to ODF (and so handles the footer and header conversion and font assignment itself), how does the page formatting across the document compare when working with the ODF generated by MS Word?

From Word you would Save As -> "OpenDocument Text" (.odt), and then compare the layout of that document between the original in Word 2016 and a current LibreOffice Writer release (5.4.5.1 or 6.0.2.1).
Comment 9 Jens Troeger 2018-02-23 22:08:00 UTC
Created attachment 140101 [details]
Shows the render differences of the test document between Word/DOCX, LO/DOCX, LO/ODT.
Comment 10 Jens Troeger 2018-02-23 22:09:21 UTC
Created attachment 140102 [details]
DOCX test document to reproduce the problem.

The original Word DOCX file.
Comment 11 Jens Troeger 2018-02-23 22:10:24 UTC
Created attachment 140103 [details]
ODT test document (saved from Word) to reproduce the problem.
Comment 12 Jens Troeger 2018-02-23 22:13:34 UTC
I have updated the test documents and added the ODT saved with Word. I've also updated the screen shots to compare the different ways the original Word DOCX file is being rendered.

I noticed, though, that the test document did not use the Tekton Pro font in Word either, although it is set to use it. I've fixed that, as you can see in the screenshots.

Still, plenty of differences around the footnote container and paragraphs...
Comment 13 Jens Troeger 2018-07-30 11:55:10 UTC
Has there been any progress on this issue? I’ve just tried LO 6.0.5.2, and it still shows significant differences in object rendering.

Worse, spaces seem to be removed from the text; spaces which _do_ exist in the original DOCX file. See the attached screenshot.

Shape rendering ranges from unfaithful to broken as well; see the second attached screenshot.

My question is: what’s the priority for investigating and fixing these issues?
Comment 14 Jens Troeger 2018-07-30 11:56:46 UTC
Created attachment 143824 [details]
OO seems to remove spaces from text when rendering a DOCX.

The spaces are actually missing when I look at the string representation of the object in the DOM (using MRI inspector).
Comment 15 Jens Troeger 2018-07-30 12:01:54 UTC
Created attachment 143826 [details]
Shape objects (grouped text and gfx objects) don’t render at all.
Comment 16 Alex Thurgood 2018-08-06 09:18:22 UTC
I tested Jens' initial Word DOCX document on MacOS 10.13.6 as follows:

1) Opened the DOCX in Word (16.15 - 180709).
2) Saved as ODT.

3) Opened both the DOCX file and the newly saved ODT file from Word in LO.


Version: 6.2.0.0.alpha0+
Build ID: 36e1f6ebf0c74b4b90bbf1aab8d9ab69b8746f3a
CPU threads: 4; OS: Mac OS X 10.13.6; UI render: default; 
Locale: fr-FR (fr_FR.UTF-8); Calc: group threaded

4) The footnote font for both documents is given as Palatino Regular 10.
5) The display and positioning of the footnote between the two documents appear to be identical. A screenshot is enclosed showing the comparison.

6) Note, however, that word spacing/kerning does indeed appear to differ in the ODT document produced by Word, whereas the DOCX document opened in LO appears almost identical to the native document opened in Word.

7) WW8 Import in LO

Currently, the only differences I see in the Word DOCX document opened in LO are:

(a) the title, which is broken along the line; the font Tekton Pro Bold appears in italics in the Font menu, indicating that it has been substituted, and the tooltip confirms this;

(b) the missing page reference on page 2 (the error message "Error: Reference source not found" is shown instead).

8) ODT Export from Word 
Here, the spacing around the image object and the organisation of the image object have not been respected on export from Word. In general, the paragraph spacing in the ODT document is also incorrect, as the third paragraph starts on the first page instead of on page 2. If it is due to a Word export problem, then this part is NOTOURBUG.


@Jens: please choose only one issue from point (7) above for this report, and open a second bug report for the other issue.

If you choose the Tekton Pro font substitution issue, then this will probably be closed as a DUPLICATE of bug 61134.

If you choose the page reference error issue, then please correct the title of your report to reflect that.
Comment 17 Alex Thurgood 2018-08-06 09:23:35 UTC
Created attachment 143988 [details]
Comparison of heading Tekton Pro Bold with DOCX and ODT in LO

Comparison of the rendering of the header which uses Tekton Pro Bold in both the DOCX and the MSWord-exported ODT when opened in LO.
Comment 18 Alex Thurgood 2018-08-06 09:25:53 UTC
Created attachment 143990 [details]
Comparison of image spacing and organisation between DOCX and ODT

Comparison of image placement and spacing in DOCX and MSWord-exported ODT when opened in LO
Comment 19 Alex Thurgood 2018-08-06 09:27:52 UTC
Created attachment 143991 [details]
Comparison of footnote and page reference render in DOCX and ODT

Comparison of footnote and page reference render in DOCX and MSWord-exported ODT when opened in LO
Comment 20 Alex Thurgood 2018-08-06 09:28:40 UTC
In the screenshots I have provided, the lefthand side is the DOCX file, the righthand side is the MSWord-exported ODT file.
Comment 21 Jens Troeger 2018-08-07 17:05:32 UTC
Thanks Alex for your comment (https://bugs.documentfoundation.org/show_bug.cgi?id=115944#c16)

A few remarks and questions though.

I see a lot of emphasis on your points 1) and 2), which puzzles me. Such an approach is not really an option in my case, because customers give me a DOCX file that enters an automated process; there is no way in that automation to use MS Word for ODT conversion. Moreover, Word’s ODT conversion sometimes fails just as badly as LO fails to read DOCX, so there doesn’t seem to be a win.

4) and 5) look like they have improved since I last checked with LO 5.xx in February. However, my Word version 16.16 still renders the footnote in three lines (see the initial screenshot).

6) Do I understand correctly that LO reads a DOCX file by running it through an internal DOCX → ODT filter, and then renders that ODT?

7) Comparing the DOCX in Word 16.16 with the same DOCX in LO 6.0.5.2 still shows a different text flow around the image, which causes a different paragraph layout. For a document with several hundred pages, this cumulative error can be quite significant! Are you able to reproduce that difference with the DOCX attached to this bug report?

8) I would agree, and I have already filed a bug report with Word.

@Alex, what are your thoughts on the images I uploaded on 2018-07-30?

Thanks!
Comment 22 Alex Thurgood 2018-08-08 07:58:40 UTC
@Jens: I'm not a developer, but my understanding is that the DOCX filter converts the contents of the file into data structures held in memory, which can then be drawn on screen within the LO application (at least, that is how it used to work, unless something has changed and I haven't followed it). Wherever something isn't mapped correctly, or to a suitable equivalent, is where the filter, and the corresponding display in LibreOffice, tend to fall down and discrepancies occur.

The filter isn't perfect (it was, after all, reverse-engineered), but improvements are made to it on a regular basis, and of course, code changes sometimes unintentionally introduce bugs.

Within the LO project, Bugzilla works by identifying and breaking down individual, reproducible problems. For example, with regard to your particular document:

- font substitution for Tekton Pro Bold

- text flow around image object

- kerning and spacing differences between Word and LibreOffice on MacOS: LibreOffice fairly recently adopted the HarfBuzz library for OpenType text layout, which has caused a number of already reported kerning and spacing issues (and corresponding display incompatibilities) with certain fonts, not just on MacOS but also between OSes.

To my knowledge, MS Word on MacOS doesn't use HarfBuzz. This alone is likely to cause visible differences in rendering when Word and LibreOffice are compared side by side, even without any bugs in the DOCX import filter.

I can sympathize with the issues you envisage, with multi-page documents containing images not displaying in the same way between Word and LibreOffice for a given DOCX file, as I encounter this myself with regard to my own clients, but improving that situation is an ongoing process which relies on identification of a specific, reproducible problem, rather than a general "my document doesn't look right" approach.
Comment 23 Alex Thurgood 2018-08-08 08:01:08 UTC
@Jens: to summarize, my suggestion would be to close this issue, as it has become unclear what the specific problem is, and to open new issues for each clearly identified, reproducible problem.

This also makes it easier for QA volunteers like myself to confirm, and look for duplicates.
Comment 24 Jens Troeger 2018-08-12 21:15:30 UTC
Thanks Alex.

I am closing this because I think it cannot be fixed, given that LO now uses HarfBuzz as its text layout engine. However, I filed the following follow-up bug: https://bugs.documentfoundation.org/show_bug.cgi?id=119234
Comment 25 Jens Troeger 2018-08-12 21:16:48 UTC
Actually, how do I close this as “unresolved” or “won’t fix”? ;-)
Comment 26 V Stuart Foote 2018-08-13 00:57:57 UTC
CoreText (Word) vs. HarfBuzz (LO): they will never render the same.