Created attachment 184682 [details]
Comparison MSO vs LibreOffice 7.6 master
Steps to reproduce:
1. Open attachment 47487 [details] from bug 37888
2. Go to page 2
-> Text in red has incorrect indent
Version: 22.214.171.124.alpha0+ (X86_64) / LibreOffice Community
Build ID: b9411e587586750f36ba9009b5f1e29fe461d8b5
CPU threads: 8; OS: Linux 5.10; UI render: default; VCL: gtk3
Locale: de-DE (es_ES.UTF-8); UI: en-US
[Bug found by office-interoperability-tools]
Regression introduced by:
author Justin Luth <email@example.com> 2022-08-11 09:29:58 -0400
committer Justin Luth <firstname.lastname@example.org> 2022-08-11 19:38:29 +0200
commit 2405a36f3bcd43f80371ccaed47f7523ff0d8757 (patch)
parent eca3ce35fe9a346965a32f42d02cb6d3f5a3982f (diff)
tdf#148360 doc import: add NO_NUMBERING_SHOW_FOLLOWBY(true)
Bisected with: bibisect-linux64-7.5
Adding Cc: to Justin Luth
Also reproduced in attachment 67101 [details] from bug 54862 with lines
- Αίτηση ενδιαφερόμενου
- Ατομική επαγγελματική άδεια αλιείας και άδεια αλιείας σκάφους
The change in general is correct. The problem is that these are defined with a "tabstop at 0", and LO is not handling that special case. It likely becomes a lot more complicated if there is a hanging indent, a negative margin, etc.
Created attachment 184689 [details]
tdf153042_1.doc: clean-room, minimal example
-assigned bullets to heading 2.
-applied heading2 to paragraph
-removed bullets from heading 2, but numbering properties are still in place.
Created attachment 184690 [details]
tdf153042_2.doc: negative indent. Interesting - LO can't do that at all.
Created attachment 184691 [details]
Created attachment 184692 [details]
tdf153042_3.doc: a negative indent, with bullet remnants and tabstop at zero.
It is difficult to create a NONE numbering with a non-zero tabstop in MS Word 2003 - which is what the other patch was fixing. This tabstop-at-zero is more common. However, MS Word doesn't actually seem to listen to that instruction in this case where there is a negative indent. (That seems a bit inconsistent to me - it does honour it if it is non-zero.)
So it seems like this is a special-case situation.
Created attachment 184694 [details]
tdf153042_4.docx: negative indent, but functional tab to zero position
I hand-modified a docx to make this example. The UI doesn't easily allow it. (Afterwards, I noticed that "outline" in MS Word probably allows you to create this via the UI.) A DOC version of this works the same way.
This loads OK in LO, but the default suffix is changed to <w:suff w:val="nothing"/> on a round-trip (for both DOCX and DOC).
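To check what a round-trip did to the suffix, something like the following can be run against the numbering part of the DOCX. This is only a sketch: the helper name and the sample fragment are made up for illustration and are not taken from any of the attachments. It relies on the WordprocessingML rule that a level without a w:suff element defaults to "tab".

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def level_suffixes(numbering_xml: str) -> dict:
    """Map each w:lvl's ilvl to its suffix ("tab", "space" or "nothing").

    Hypothetical helper for inspection only. A level with no w:suff
    element is reported as "tab", the default when the element is omitted.
    """
    root = ET.fromstring(numbering_xml)
    result = {}
    for lvl in root.iter(f"{{{W}}}lvl"):
        ilvl = lvl.get(f"{{{W}}}ilvl")
        suff = lvl.find(f"{{{W}}}suff")
        result[ilvl] = suff.get(f"{{{W}}}val") if suff is not None else "tab"
    return result


# Made-up fragment: level 0 has no w:suff, level 1 has suffix "nothing".
SAMPLE = (
    f'<w:numbering xmlns:w="{W}">'
    '<w:abstractNum w:abstractNumId="0">'
    '<w:lvl w:ilvl="0"><w:numFmt w:val="none"/></w:lvl>'
    '<w:lvl w:ilvl="1"><w:numFmt w:val="none"/><w:suff w:val="nothing"/></w:lvl>'
    '</w:abstractNum>'
    '</w:numbering>'
)

print(level_suffixes(SAMPLE))
```

Running this on word/numbering.xml before and after the round-trip would show which levels changed from the implicit "tab" to "nothing".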
In comment 8 I noticed that this zero-tab is not exported. So the problem (in these particular documents anyway) is "fixed" by a simple round-trip. So I am lowering the importance.
Creating these various minimal documents clearly shows that this wasn't so much a regression as lots of edge cases that were not already covered.
Created attachment 184696 [details]
tdf153042_1.docx: DOCX version - when all aligned at zero, no tabstop used in MS Word.
This is indeed a trivial, but fundamental difference in MS Word. A numbered list (using numbering of type NONE) uses a zero-width tab to jump to position 0 from a starting point of zero in MS Word.
In LO, the tabstop jumps from position zero to the next tabstop position - which is also consistent with how normal paragraphs work. I like LO's implementation better.
Likely this should be solved by a layout exception - where a numbering tab at position zero with indent/margin/whatever all at zero should be zero width.
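A toy model of the two behaviours, to make the proposed exception concrete. This is not the code in the Gerrit patch. All names are hypothetical, positions are in twips, and the 720-twip default tab interval is an assumption for the example.

```python
def next_default_tab(pos: int, interval: int = 720) -> int:
    """Next default tab position strictly to the right of pos (twips)."""
    return (pos // interval + 1) * interval


def text_start_after_numbering_tab(indent: int, list_tab: int,
                                   zero_width_exception: bool,
                                   interval: int = 720) -> int:
    """Where the paragraph text starts after the numbering's follow-by tab.

    indent:   position where the (empty, type NONE) numbering label ends
    list_tab: tab stop defined for the numbering level
    """
    if list_tab > indent:
        # Ordinary case: the tab jumps forward to the list tab stop.
        return list_tab
    if zero_width_exception and list_tab == indent:
        # Proposed exception: the tab is already at its stop, so zero width.
        return indent
    # Without the exception: jump to the next default tab stop.
    return next_default_tab(indent, interval)


# Everything at zero: without the exception the text is pushed to 720,
# with the exception it stays at 0.
print(text_start_after_numbering_tab(0, 0, zero_width_exception=False))
print(text_start_after_numbering_tab(0, 0, zero_width_exception=True))
```

The same check covers the non-zero case where the alignment matches the tabstop, e.g. indent and list_tab both at 720.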
Created attachment 184793 [details]
tdf153042_5.docx: not just the zero case - anytime align matches tabstop
To fix this bug, I have patch https://gerrit.libreoffice.org/c/core/+/145915.
However, I want to make sure that we don't have any compat15 implications here, so these documents need testing with a modern version of MS Office.
(In reply to Justin L from comment #8)
> Created attachment 184694 [details]
> tdf153042_4.docx: negative indent, but functional tab to zero position
This is an interesting case (as is tdf106953.docx), where Word 2003/2010 display it with a tab, but Word 2019 does not (in either the compat15 or non-compat15 mode). So there are situations here where Word simply can be incompatible with itself. Lovely.
Created attachment 185157 [details]
tdf153042_7.docx: Word 2019 shows big indent (with tab), Word 2003/2010 show small indent.
This document seems to be the opposite case, where Word 2019 shows a tab (big indent), but older versions show no tab (small indent).
OK - I think the main problem here is that we do not distinguish between documents that do not have a tabStop defined (i.e. default of zero) and those that do have it defined as zero.
tdf148360.docx is probably the poster child for this aspect of the bug, although it seems to be slightly corrupt, since things change when hitting OK after viewing the settings. It has no tabstop defined, and thus a default tabstop should be activated.
If the tabstop is defined, and it is the same as the first line indent position, then the tabstop should just be swallowed up. The tdf153042_X documents are examples of this.
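The distinction described above could be modelled roughly like this. It is only a sketch: the function name and return labels are hypothetical, and None stands in for "no tabstop defined in the document", which is the piece of information that currently gets lost.

```python
from typing import Optional


def resolve_numbering_tab(defined_tab: Optional[int], first_line_pos: int) -> str:
    """Classify how the numbering tab should behave (hypothetical labels).

    defined_tab:    tab stop from the document, or None if none was defined
    first_line_pos: first line indent position, in the same unit
    """
    if defined_tab is None:
        # No tabstop defined at all: a default tabstop should be activated.
        return "default-tab"
    if defined_tab == first_line_pos:
        # Defined and equal to the first line position: swallow the tab.
        return "swallowed"
    return "explicit-tab"


print(resolve_numbering_tab(None, 0))  # undefined tabstop
print(resolve_numbering_tab(0, 0))     # explicitly defined as zero
```

The point of using Optional instead of a plain integer is that an undefined tabstop and a tabstop defined as zero no longer collapse into the same value.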
sampl1.doc gets numbering via the Heading 3 style, but has direct paragraph formatting (Heading 3 + Left: 0") that overrides the numbering, I guess. (Changing the paragraph format to just "Heading 3" indents similar to LO.)
tdf106953.docx seems to be slightly corrupt, since viewing the settings and hitting OK changes things. There is a tabstop defined at .25 inch, but regular paragraph indent of 0.5. In Word it looks like the 0.5 is winning out, but when pressing OK it looks like it changes to the .25 indent.
I am reverting the change indicated by comment 1, since the compat option is missing critical features. DOC support can be added back in once it is stable for DOCX/RTF. Until then, let's not break so many existing situations - since they seem to abound in DOC format.
The same problem indicated in comment 0 applies to DOCX and RTF files. Examples have already been attached, and existing unit tests that are affected are documented in https://gerrit.libreoffice.org/c/core/+/145915.
Created attachment 185270 [details]
tdf153042_7b.docx: paragraph settings viewed, then OK'ed
(In reply to Justin L from comment #15)
Wow - this is amazing. At the moment we import these two documents almost perfectly (according to MS Word 2019 anyway).
The UI paragraph and outline settings between 7 and 7b look identical,
and yet one imports with a tabstop and 7b doesn't.
I created 7b in MS Word 2019. I noticed that the layout changed if I viewed the paragraph settings and then hit OK. (That is what was saved as 7b.)
By going to the outline numbering settings and hitting OK, you can get back to 7(a)'s layout.
Justin Luth committed a patch related to this issue.
It has been pushed to "master":
tdf#153042 doc/x/rtf: pre-emptive unit tests for numbering tabstop
It will be available in 7.6.0.
The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
At this point, this bug report is referring to DOCX and RTF only - since the DOC aspect has been reverted.
So this is a 7.4.1 regression based on Vasily's 7.5 master commit a7d9837a8aa6d1233f4c21e4db5d32428a3ffc58.
In general, his patch is correct, but there are many obscure combinations of unspecified values and direct formatting that affect this.
At this point the export is dropping the tab anyway, so perhaps it would be sane to revert Vasily's work altogether - since it only lasts for a single import anyway.
In any case, I think I am done here.