Bug 88752 - DOC DOCX import: text grid is (wrongly?) applied to table thus the page content flow is not the same as in MS Word
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:doc, filter:docx
Depends on:
Blocks: CJK DOC Text-Grid
Reported: 2015-01-23 16:03 UTC by Andras Timar
Modified: 2022-11-17 12:25 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
bugdoc (DOC) (45.00 KB, application/msword)
2015-01-23 16:03 UTC, Andras Timar
amended bugdoc in docx format (17.71 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2022-11-11 02:13 UTC, Kevin Suo
amended bugdoc exported to pdf in MS Word (135.28 KB, application/pdf)
2022-11-11 02:13 UTC, Kevin Suo
amended bugdoc exported to pdf in Writer (24.96 KB, application/pdf)
2022-11-11 02:14 UTC, Kevin Suo

Description Andras Timar 2015-01-23 16:03:48 UTC
Created attachment 112730
bugdoc (DOC)

The attached document has a table that should fit in the first column of page 1, as it does in Word. It does not fit because the document has the text grid switched on, and the text grid's line height is larger in Writer than it is in Word.

I found the relevant code location, but I do not know how to interpret the values from the DOC file any better. nLinePitch*=0.9 worked well, but...

http://opengrok.libreoffice.org/xref/core/sw/source/filter/ww8/ww8par6.cxx#210
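
To make the effect of the line pitch concrete, here is a minimal Python sketch (not LibreOffice code). It assumes a simplified model in which every grid line occupies exactly one line pitch. The function name and all numbers are hypothetical and are not taken from the bugdoc.

```python
def lines_that_fit(available_height_pt: float, line_pitch_pt: float) -> int:
    """Number of whole grid lines that fit into the available height."""
    if line_pitch_pt <= 0:
        raise ValueError("line pitch must be positive")
    return int(available_height_pt // line_pitch_pt)


# Hypothetical values for illustration only (not measured from the bugdoc).
column_height_pt = 500.0
imported_pitch_pt = 20.0

# Line count with the pitch as imported, and with the 0.9 scaling applied.
lines_unscaled = lines_that_fit(column_height_pt, imported_pitch_pt)
lines_scaled = lines_that_fit(column_height_pt, imported_pitch_pt * 0.9)
print(lines_unscaled, lines_scaled)
```

With these made-up numbers, a table that needs 26 lines would overflow at the unscaled pitch (25 lines fit) but fit after scaling (27 lines fit). This only shows why a small change in pitch can move a table across a column boundary; it does not show that 0.9 is the right factor.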
Comment 1 Andras Timar 2015-01-23 16:05:17 UTC
@Caolán: could you please advise?
Comment 2 Caolán McNamara 2015-01-26 13:50:49 UTC
Back in the day, when we first implemented the text grid, I suggested that we should have the same text grid rules as Word (whatever they are) for ease of interoperability, but the powers that be said they wanted a text grid that was "better" than Word's, and so we have different rules for it, which is a frustrating nightmare.

The table confuses things, but if you remove it and compare Word against Writer for 15 lines per page (Writer: Format > Page > Text Grid with CJK features enabled; Word: Page Setup > Document Grid) and put 15 lines into this doc in both, then Word fits 15 lines on the page and Writer fits 14.

I remember just giving up and setting things so that the numbers are the same in Word and Writer with respect to lines per page and base text size. I don't think it's as simple as just scaling the font size by 0.9 (but maybe it is). https://wiki.openoffice.org/w/images/1/1c/Text_Grid_Enhancement_for_CJK.odt has some details on what's wrong with our grid that *might* be useful.

You probably need to find the code in Writer that calculates the grid and see how it calculates it, in order to feed it numbers that will make it behave like Word. It might be that, to get the same layout as Word, you have to add/subtract page margins and/or header/footer heights or something like that.
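
As a rough illustration of the "add/subtract margins or header/footer heights" idea, the following Python sketch (not LibreOffice code) shows how the same line pitch can give 15 or 14 lines depending on which height the layout treats as available. All function names and numbers are hypothetical, and the sketch does not claim that this is the actual cause of the 15 versus 14 difference described above.

```python
def body_height_twips(page_height: int, top_margin: int,
                      bottom_margin: int, reserved: int = 0) -> int:
    """Height left for text after margins and any extra reserved space."""
    return page_height - top_margin - bottom_margin - reserved


def pitch_for_lines(body_height: int, lines_per_page: int) -> int:
    """Line pitch that makes exactly lines_per_page lines fill the body."""
    return body_height // lines_per_page


def lines_on_page(body_height: int, line_pitch: int) -> int:
    """Whole lines of the given pitch that fit into the body height."""
    return body_height // line_pitch


# Hypothetical page geometry in twips (1/20 pt), for illustration only.
PAGE_HEIGHT = 16840
MARGIN = 1520
HEADER = 400
LINES_PER_PAGE = 15

full_body = body_height_twips(PAGE_HEIGHT, MARGIN, MARGIN)
pitch = pitch_for_lines(full_body, LINES_PER_PAGE)

# Same pitch, but one layout also reserves space for a header.
reduced_body = body_height_twips(PAGE_HEIGHT, MARGIN, MARGIN, reserved=HEADER)

lines_full = lines_on_page(full_body, pitch)
lines_reduced = lines_on_page(reduced_body, pitch)
print(pitch, lines_full, lines_reduced)
```

In this model, the pitch derived from the full body height yields 15 lines, while reserving a header of 400 twips with the same pitch leaves room for only 14. Which heights Word and Writer actually use would have to be checked against the real code.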
Comment 3 QA Administrators 2016-09-20 09:37:14 UTC Comment hidden (obsolete)
Comment 4 Gerald Pfeifer 2017-11-25 16:32:01 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2018-11-26 03:36:15 UTC Comment hidden (obsolete)
Comment 6 Gerald Pfeifer 2018-11-26 07:19:11 UTC Comment hidden (obsolete)
Comment 7 QA Administrators 2019-11-27 03:45:15 UTC Comment hidden (obsolete)
Comment 8 Gerald Pfeifer 2019-11-27 10:50:53 UTC
Yes, this is still there.

Version: 6.5.0.0.alpha0+
Build ID: 122468bf97f1ea456274991103a13489b8d5df58
CPU threads: 4; OS: Linux 5.3; UI render: default; VCL: gtk3; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2019-11-22_13:38:19
Comment 9 QA Administrators 2021-11-27 04:31:07 UTC Comment hidden (obsolete)
Comment 10 Collabora Productivity Ltd 2021-11-27 07:36:09 UTC
Yes, the bug is there.
Version: 7.2.3.1 / LibreOffice Community
Build ID: 20(Build:1)
CPU threads: 12; OS: Linux 5.3; UI render: default; VCL: kf5 (cairo+xcb)
Locale: hu-HU (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 11 Kevin Suo 2022-11-10 06:36:49 UTC
(In reply to Caolán McNamara from comment #2)
> You probably need to find the code in Writer that calculates the grid and see how it calculates it, in order to feed it numbers that will make it behave like Word.

The relevant code should be in:
sw/source/ui/misc/pggrid.cxx
https://opengrok.libreoffice.org/xref/core/sw/source/ui/misc/pggrid.cxx?r=875c27dc

Anyone interested in this is welcome to give it a try.

> It might be that, to get the same layout as Word, you have to add/subtract page margins and/or header/footer heights or something like that.

SwTextGridPage::UpdatePageSize tries to do this kind of thing, but I am not sure whether it does it correctly.

When I open this doc in MS Word, the text grid is turned on with grid type Lines Only, 29 lines per page, and 18pt max base text size, which is the same as what Writer shows in its Format > Page > Text Grid dialog.
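
For the DOCX side, the grid settings live in the section properties as a `w:docGrid` element, whose `w:linePitch` attribute is given in twentieths of a point. The Python sketch below reads those values from a sample fragment. The sample XML is hypothetical (it was not extracted from the bugdoc), and the assumption that Word's "Lines Only" grid corresponds to `w:type="lines"` should be verified against a real file.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical section properties, written by hand for illustration.
SAMPLE_SECT_PR = f"""
<w:sectPr xmlns:w="{W_NS}">
  <w:pgSz w:w="11906" w:h="16838"/>
  <w:docGrid w:type="lines" w:linePitch="360"/>
</w:sectPr>
"""


def read_doc_grid(sect_pr_xml: str):
    """Return the grid type and line pitch (in points) of a sectPr fragment."""
    root = ET.fromstring(sect_pr_xml.strip())
    grid = root.find(f"{{{W_NS}}}docGrid")
    if grid is None:
        return None
    grid_type = grid.get(f"{{{W_NS}}}type", "default")
    pitch_twips = int(grid.get(f"{{{W_NS}}}linePitch", "0"))
    # linePitch is stored in twentieths of a point.
    return {"type": grid_type, "line_pitch_pt": pitch_twips / 20}


grid_info = read_doc_grid(SAMPLE_SECT_PR)
print(grid_info)
```

Running the same function on the `word/document.xml` of the amended DOCX bugdoc (after unzipping it) would show which raw values the importer has to interpret.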
Comment 12 Kevin Suo 2022-11-11 02:13:02 UTC
Created attachment 183529
amended bugdoc in docx format
Comment 13 Kevin Suo 2022-11-11 02:13:43 UTC
Created attachment 183530
amended bugdoc exported to pdf in MS Word
Comment 14 Kevin Suo 2022-11-11 02:14:07 UTC
Created attachment 183531
amended bugdoc exported to pdf in Writer
Comment 15 Kevin Suo 2022-11-11 02:19:14 UTC
The problem seems to be that the text grid is (wrongly) applied to text in tables.

From the PDF exported from MS Word, we can see that the text grid is not used for the table (i.e., the line height is single line, without any adjustments). However, in the PDF exported from Writer, the text in the table is adjusted in the same way as normal text paragraphs, applying the "lines per page" setting defined in the Format > Page > Text Grid dialog.

I am not sure whether the OOXML standard states that the text grid should not be applied to tables. And how about text frames?
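
One thing that can be checked directly in a DOCX is whether the paragraphs inside table cells carry an explicit `w:snapToGrid` paragraph property, which controls whether a paragraph uses the document grid. The Python sketch below lists this per paragraph for a hand-written sample body. The sample XML and helper names are hypothetical, and the sketch only looks at direct paragraph properties; the setting may also come from a style, which is not resolved here.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w(name: str) -> str:
    """Qualified tag or attribute name in the WordprocessingML namespace."""
    return f"{{{W_NS}}}{name}"


# Hypothetical document body, written by hand for illustration.
SAMPLE_BODY = f"""
<w:body xmlns:w="{W_NS}">
  <w:p><w:r><w:t>Body text</w:t></w:r></w:p>
  <w:tbl><w:tr><w:tc>
    <w:p>
      <w:pPr><w:snapToGrid w:val="0"/></w:pPr>
      <w:r><w:t>Cell text</w:t></w:r>
    </w:p>
  </w:tc></w:tr></w:tbl>
</w:body>
"""


def snap_to_grid(paragraph):
    """True/False if set directly on the paragraph, None if not specified."""
    node = paragraph.find(f"{w('pPr')}/{w('snapToGrid')}")
    if node is None:
        return None
    # A missing w:val means "on" for on/off properties.
    return node.get(w("val"), "true") not in ("0", "false", "off")


def paragraph_grid_flags(body_xml: str):
    """List (text, is_in_table, snap_to_grid) for every paragraph."""
    root = ET.fromstring(body_xml.strip())
    in_table = {id(p) for tbl in root.iter(w("tbl")) for p in tbl.iter(w("p"))}
    result = []
    for p in root.iter(w("p")):
        text = "".join(t.text or "" for t in p.iter(w("t")))
        result.append((text, id(p) in in_table, snap_to_grid(p)))
    return result


flags = paragraph_grid_flags(SAMPLE_BODY)
print(flags)
```

If the table paragraphs in the amended bugdoc show no such property, directly or via their styles, that would suggest Word skips the grid for tables by some other rule.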
Comment 16 Kevin Suo 2022-11-11 02:26:50 UTC
Revised the summary field accordingly.
Comment 17 Kevin Suo 2022-11-17 12:25:25 UTC
Not sure whether the following spec is a clue:

http://officeopenxml.com/WPstyles.php

Hierarchy

Styles are applied in the following order.

    * Document defaults are applied first. Document defaults are defined with the <w:docDefaults> element, which is a child of <w:styles>. That is, it is at the same level as style definitions.
    * Next, table styles are applied.
    * Next, numbering styles are applied.
    * Next, paragraph and run styles are applied as defined in paragraph styles. (A <w:pPr> can contain an <w:rPr>)
    * Next, run properties are applied.
    * Finally, direct formatting is applied.

I.e., in OOXML table styles are applied right after the document defaults, before numbering, paragraph, and run styles.
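
The layering in the list above can be sketched as a simple override chain, where each later layer overwrites properties set by earlier ones. This is a minimal Python illustration, not how Word or Writer resolve styles internally; the layer keys and property names (`snap_to_grid`, `font_size_pt`) are hypothetical.

```python
# Order taken from the hierarchy quoted above; later layers win.
LAYER_ORDER = [
    "doc_defaults",
    "table_style",
    "numbering_style",
    "paragraph_and_run_styles",
    "run_properties",
    "direct_formatting",
]


def resolve(layers: dict) -> dict:
    """Merge property dictionaries so that later layers override earlier ones."""
    effective = {}
    for name in LAYER_ORDER:
        effective.update(layers.get(name, {}))
    return effective


# Hypothetical properties for a paragraph inside a table cell.
cell_paragraph = resolve({
    "doc_defaults": {"snap_to_grid": True, "font_size_pt": 11},
    "table_style": {"snap_to_grid": False},
    "paragraph_and_run_styles": {"font_size_pt": 12},
})

# Same paragraph, but its paragraph style sets the grid flag explicitly.
overridden = resolve({
    "doc_defaults": {"snap_to_grid": True},
    "table_style": {"snap_to_grid": False},
    "paragraph_and_run_styles": {"snap_to_grid": True},
})
print(cell_paragraph, overridden)
```

Under this model, a grid-related setting coming from a table style survives only if no later layer (paragraph style, run properties, direct formatting) sets the same property again.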