Bug 153042 - FILEOPEN: DOCX list numbering: Incorrect indent of first line (comment 19)
Summary: FILEOPEN: DOCX list numbering: Incorrect indent of first line (comment 19)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.4.1.2 release (earliest affected)
Hardware: All All
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:doc, filter:docx, regression
Depends on:
Blocks: Paragraph-Indent
Reported: 2023-01-16 11:49 UTC by Xisco Faulí
Modified: 2023-05-23 13:25 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
Comparison MSO vs LibreOffice 7.6 master (451.50 KB, image/png)
2023-01-16 11:49 UTC, Xisco Faulí
tdf153042_1.doc: clean-room, minimal example (20.50 KB, application/msword)
2023-01-16 15:45 UTC, Justin L
tdf153042_2.doc: negative indent. Interesting - LO can't do that at all. (24.00 KB, application/msword)
2023-01-16 15:59 UTC, Justin L
tdf153042_2-MSWord 2003.pdf (14.86 KB, application/pdf)
2023-01-16 16:00 UTC, Justin L
tdf153042_3.doc: a negative indent, with bullet remnants and tabstop at zero. (29.50 KB, application/msword)
2023-01-16 16:15 UTC, Justin L
tdf153042_4.docx: negative indent, but functional tab to zero position (7.19 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-01-16 16:46 UTC, Justin L
tdf153042_1.docx: DOCX version - when all aligned at zero, no tabstop used in MS Word. (12.16 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-01-16 17:00 UTC, Justin L
tdf153042_5.docx: not just the zero case - anytime align matches tabstop (16.63 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-01-20 16:08 UTC, Justin L
tdf153042_7.docx: Word 2019 shows big indent (with tab), Word 2003/2010 show small indent. (16.72 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-02-06 15:57 UTC, Justin L
tdf153042_7b.docx: paragraph settings viewed, then OK'ed (14.21 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2023-02-09 19:37 UTC, Justin L

Description Xisco Faulí 2023-01-16 11:49:02 UTC
Created attachment 184682 [details]
Comparison MSO vs LibreOffice 7.6 master

Steps to reproduce:
1. Open attachment 47487 [details] from bug 37888
2. Go to page 2

-> Text in red has incorrect indent

Reproduced in

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b9411e587586750f36ba9009b5f1e29fe461d8b5
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: de-DE (es_ES.UTF-8); UI: en-US
Calc: threaded

[Bug found by office-interoperability-tools]
Comment 1 Xisco Faulí 2023-01-16 11:49:57 UTC
Regression introduced by:

author	Justin Luth <jluth@mail.com>	2022-08-11 09:29:58 -0400
committer	Justin Luth <jluth@mail.com>	2022-08-11 19:38:29 +0200
commit 2405a36f3bcd43f80371ccaed47f7523ff0d8757 (patch)
tree 77e6b5cf0cfffbc511daf743ff60bf15f8c22063
parent eca3ce35fe9a346965a32f42d02cb6d3f5a3982f (diff)
tdf#148360 doc import: add NO_NUMBERING_SHOW_FOLLOWBY(true)

Bisected with: bibisect-linux64-7.5

Adding Cc: to Justin Luth
Comment 2 Xisco Faulí 2023-01-16 12:16:04 UTC
Also reproduced in attachment 67101 [details] from bug 54862 with lines

-   Αίτηση ενδιαφερόμενου
-   Ατομική επαγγελματική άδεια αλιείας και άδεια αλιείας σκάφους
Comment 3 Justin L 2023-01-16 15:38:04 UTC
The change in general is correct. The problem is that these are defined with a "tabstop at 0", and LO is not handling that special case. It likely becomes a lot more complicated if there is a hanging indent, a negative margin, etc.
Comment 4 Justin L 2023-01-16 15:45:43 UTC
Created attachment 184689 [details]
tdf153042_1.doc: clean-room, minimal example

-assigned bullets to heading 2.
-applied heading2 to paragraph
-removed bullets from heading 2, but numbering properties are still in place.
Comment 5 Justin L 2023-01-16 15:59:13 UTC
Created attachment 184690 [details]
tdf153042_2.doc: negative indent. Interesting - LO can't do that at all.
Comment 6 Justin L 2023-01-16 16:00:58 UTC
Created attachment 184691 [details]
tdf153042_2-MSWord 2003.pdf
Comment 7 Justin L 2023-01-16 16:15:00 UTC
Created attachment 184692 [details]
tdf153042_3.doc: a negative indent, with bullet remnants and tabstop at zero.

It is difficult to create a NONE numbering with a non-zero tabstop in MS Word 2003 - which is what the other patch was fixing. This tabstop-at-zero is more common. However, MS Word doesn't actually seem to listen to that instruction in this case where there is a negative indent. (That seems a bit inconsistent to me - it does honour it if it is non-zero.)

So it seems like this is a special-case situation.
Comment 8 Justin L 2023-01-16 16:46:37 UTC
Created attachment 184694 [details]
tdf153042_4.docx: negative indent, but functional tab to zero position

I hand-modified a docx to make this example. The UI doesn't easily allow it. (Afterwards, I noticed that "outline" in MS Word probably allows you to create this via the UI.) A DOC version of this works the same way.

This loads OK in LO, but the default suffix is changed to <w:suff w:val="nothing"/> on a round-trip (for both DOCX and DOC).
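
As an illustration only, the suffix and numbering tabstop of each list level can be read out of a DOCX numbering part with a few lines of Python, which makes a before/after round-trip comparison easier. This is a minimal sketch: the SAMPLE markup is hand-written for the example (it is not taken from any attachment), the helper name level_suffixes is hypothetical, and it assumes the numbering tabstop is stored as a w:tab with w:val="num" under the level.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def level_suffixes(numbering_xml):
    """Return (ilvl, suffix, numbering tab position or None) for each w:lvl."""
    root = ET.fromstring(numbering_xml)
    rows = []
    for lvl in root.iter(f"{{{W}}}lvl"):
        ilvl = lvl.get(f"{{{W}}}ilvl")
        suff = lvl.find(f"{{{W}}}suff")
        # A missing w:suff element means the default suffix, which is a tab.
        suffix = "tab" if suff is None else suff.get(f"{{{W}}}val")
        num_tab = None
        for tab in lvl.iter(f"{{{W}}}tab"):
            # Assumption: the numbering tabstop is the tab with w:val="num".
            if tab.get(f"{{{W}}}val") == "num":
                num_tab = int(tab.get(f"{{{W}}}pos"))
        rows.append((ilvl, suffix, num_tab))
    return rows


# Hand-written sample: level 0 has a tabstop at zero and the default suffix,
# level 1 has the suffix that the round-trip was seen to write.
SAMPLE = f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:numFmt w:val="none"/>
      <w:pPr><w:tabs><w:tab w:val="num" w:pos="0"/></w:tabs></w:pPr>
    </w:lvl>
    <w:lvl w:ilvl="1">
      <w:numFmt w:val="none"/>
      <w:suff w:val="nothing"/>
    </w:lvl>
  </w:abstractNum>
</w:numbering>"""

print(level_suffixes(SAMPLE))
```

For a real file, the same function could be fed the contents of word/numbering.xml read through the standard zipfile module.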
Comment 9 Justin L 2023-01-16 16:49:22 UTC
In comment 8 I noticed that this zero-tab is not exported. So the problem (in these particular documents anyway) is "fixed" by a simple round-trip. So I am lowering the importance.

As these various minimal documents are created, it clearly shows that this wasn't so much a regression as lots of edge cases that were not already covered.
Comment 10 Justin L 2023-01-16 17:00:28 UTC
Created attachment 184696 [details]
tdf153042_1.docx: DOCX version - when all aligned at zero, no tabstop used in MS Word.

This is indeed a trivial, but fundamental difference in MS Word. A numbered list (using numbering of type NONE) uses a zero-width tab to jump to position 0 from a starting point of zero in MS Word.

In LO, the tabstop jumps from position zero to the next tabstop position - which is also consistent with how normal paragraphs work. I like LO's implementation better.
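
The difference between the two behaviours can be shown with a toy model. This is only a sketch of the description in this comment, not the layout code of either application: the function name tab_advance, the zero_width_on_match flag, and the 720-twip default interval are all assumptions made for the example.

```python
def tab_advance(current, tabstops, default_interval=720, zero_width_on_match=False):
    """Position (in twips) reached by a numbering tab that starts at `current`.

    zero_width_on_match=True models the MS Word behaviour described above;
    False models jumping to the next tabstop, as described for LO.
    """
    if zero_width_on_match and current in tabstops:
        # Already sitting on an explicit tabstop: the tab takes no space.
        return current
    # Otherwise advance to the first explicit tabstop strictly to the right.
    for stop in sorted(tabstops):
        if stop > current:
            return stop
    # No explicit stop left: use the next multiple of the default interval.
    return (current // default_interval + 1) * default_interval


print(tab_advance(0, [0], zero_width_on_match=True))  # Word-like: 0
print(tab_advance(0, [0]))                            # LO-like: 720
```

With the text starting at 0 and a tabstop defined at 0, the Word-like model leaves the text at 0, while the LO-like model pushes it to the next default stop, which is the extra first-line indent reported here.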
Comment 11 Justin L 2023-01-16 20:20:33 UTC
Likely this should be solved by a layout exception - where a numbering tab at position zero with indent/margin/whatever all at zero should be zero width.
Comment 12 Justin L 2023-01-20 16:08:11 UTC
Created attachment 184793 [details]
tdf153042_5.docx: not just the zero case - anytime align matches tabstop
Comment 13 Justin L 2023-02-02 18:55:33 UTC
To fix this bug, I have patch https://gerrit.libreoffice.org/c/core/+/145915.
However, I want to make sure that we don't have any compat15 implications here, so these documents need testing with a modern version of MS Office.
Comment 14 Justin L 2023-02-06 15:49:55 UTC
(In reply to Justin L from comment #8)
> Created attachment 184694 [details]
> tdf153042_4.docx: negative indent, but functional tab to zero position

This is an interesting case (as is tdf106953.docx), where Word 2003/2010 display it with a tab, but Word 2019 does not (in either the compat15 or non-compat15 mode). So there are situations here where Word simply can be incompatible with itself. Lovely.
Comment 15 Justin L 2023-02-06 15:57:45 UTC
Created attachment 185157 [details]
tdf153042_7.docx: Word 2019 shows big indent (with tab), Word 2003/2010 show small indent.

This document seems to be the opposite case, where Word 2019 shows a tab/big indent, but older versions don't show a tab/small indent.
Comment 16 Justin L 2023-02-06 20:38:38 UTC
OK - I think the main problem here is that we do not distinguish between documents that do not have a tabStop defined (i.e. default of zero), and those that do have it defined as zero.

tdf148360.docx is probably the poster child for this aspect of the bug, although it seems to be slightly corrupt, since things change when hitting OK after viewing the settings. It has no tabstop defined, and thus a default tabstop should be activated.


If the tabstop is defined, and it is the same as the first line indent position, then the tabstop should just be swallowed up. The tdf153042_X documents are examples of this.
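
The distinction drawn above can be sketched by keeping "not defined" (None) separate from "defined as 0". This is a minimal illustration under stated assumptions, not the import filter's actual logic: the function name, the returned labels, and the handling of a defined tabstop that differs from the first-line position are hypothetical.

```python
from typing import Optional


def numbering_tab_rule(tabstop: Optional[int], first_line_pos: int) -> str:
    """Classify a numbering tab; positions are in twips."""
    if tabstop is None:
        # No tabstop defined in the document: a default tabstop applies.
        return "default-tabstop"
    if tabstop == first_line_pos:
        # Defined and equal to the first-line indent position: swallowed.
        return "swallow"
    # Assumption: any other defined value is treated as a normal tabstop.
    return "explicit-tabstop"


print(numbering_tab_rule(None, 0))  # tabstop not defined
print(numbering_tab_rule(0, 0))     # tabstop defined as zero
```

If the missing value were collapsed to 0 before this check, both calls would take the same branch, which is the loss of information described above.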


sampl1.doc gets numbering via the Heading 3 style, but has direct paragraph formatting (Heading 3 + Left: 0") that overrides the numbering, I guess. (Changing the paragraph format to just "Heading 3" indents similarly to LO.)

tdf106953.docx seems to be slightly corrupt, since viewing the settings and hitting OK changes things. There is a tabstop defined at .25 inch, but regular paragraph indent of 0.5. In Word it looks like the 0.5 is winning out, but when pressing OK it looks like it changes to the .25 indent.
Comment 17 Justin L 2023-02-08 19:56:26 UTC
I am reverting the change indicated by comment 1, since the compat option is missing critical features. DOC support can be added back in once it is stable for DOCX/RTF. Until then, let's not break so many existing situations - since they seem to abound in DOC format.

The same problem indicated in comment 0 applies to DOCX and RTF files. Examples have already been attached, and existing unit tests that are affected are documented in https://gerrit.libreoffice.org/c/core/+/145915.
Comment 18 Justin L 2023-02-09 19:37:59 UTC
Created attachment 185270 [details]
tdf153042_7b.docx: paragraph settings viewed, then OK'ed

(In reply to Justin L from comment #15)
Wow - this is amazing. At the moment we import these two documents almost perfectly (according to MS Word 2019 anyway).
The UI paragraph and outline settings between 7 and 7b look identical,
and yet one imports with a tabstop and 7b doesn't.

I created 7b in MS Word 2019. I noticed that the layout changed if I viewed the paragraph settings and then hit OK.  (That is what was saved as 7b.)
By going to the outline numbering settings and hitting OK, you can get back to 7(a)'s layout.
Comment 19 Commit Notification 2023-02-09 23:15:51 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ff3440535e786c73237176670372c565ca3421b4

tdf#153042 doc/x/rtf: pre-emptive unit tests for numbering tabstop

It will be available in 7.6.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 20 Justin L 2023-02-10 16:35:25 UTC
At this point, this bug report is referring to DOCX and RTF only - since the DOC aspect has been reverted.

So this is a 7.4.1 regression based on Vasily's 7.5 master commit a7d9837a8aa6d1233f4c21e4db5d32428a3ffc58.

In general, his patch is correct, but there are many obscure combinations of unspecified values and direct formatting that affect this.

At this point the export is dropping the tab anyway, so perhaps it would be sane to revert Vasily's work altogether - since it only lasts for a single import anyway.

In any case, I think I am done here.