Bug 95576 - Mishandling Heading (1-10) styles imported from MS Word .DOC -- result in negative indents and a moving baseline for multiline headings
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Luke Deller
URL:
Whiteboard: target:5.2.0
Keywords: filter:doc
Duplicates: 65865
Depends on:
Blocks:
 
Reported: 2015-11-04 17:33 UTC by Luke
Modified: 2018-08-14 16:14 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
sample with multiline headers from Word 2007 in MS binary (28.00 KB, application/msword)
2016-05-05 14:16 UTC, V Stuart Foote
same header test .doc save-as exported to .odt from Word 2007 (4.91 KB, application/vnd.oasis.opendocument.text)
2016-05-05 14:22 UTC, V Stuart Foote
same header test .doc save-as exported to .odt from Word 2007 (4.91 KB, application/vnd.oasis.opendocument.text)
2016-05-05 14:26 UTC, V Stuart Foote
clips of the test document (368.96 KB, application/vnd.oasis.opendocument.graphics)
2016-05-05 14:53 UTC, V Stuart Foote

Description Luke 2015-11-04 17:33:56 UTC
This is a follow-up to Bug 93970. OpenOffice has a history of adding a negative indent to Heading styles. This broke .docx imports and was fixed by:
http://cgit.freedesktop.org/libreoffice/core/commit/?id=b95d203bc17c83ec0fe5139f519d53ed1d842d3a

The style was fixed for new ODT documents in Bug 93970, but the .DOC importer still uses the strange, old style, resulting in incorrectly formatted files. View attachment 118988 [details] to see how this looks.

Steps to reproduce:
1. In Word create a multi-line document with Style = Heading 1-10.
2. Save as .doc
3. Open in LibreOffice. 

See attachment 118938 [details] for an example of this negative indent. 

Note to developers: Rather than apply a fix to new ODT files, imported DOCX, and now imported DOC files, shouldn't we fix the styles themselves and then add a fix for any legacy ODT files that rely on this strange behavior?
Comment 1 A (Andy) 2015-11-06 20:48:16 UTC
Reproducible with LO 5.0.3.2, Win 8.1

Word and LO behave differently
Comment 2 V Stuart Foote 2016-05-05 14:16:28 UTC
Created attachment 124862 [details]
sample with multiline headers from Word 2007 in MS binary

Verified that errant heading indents are gone in LibreOffice 5.1.2 for a new Writer .odt document.

However, the attached sample MS Word binary .doc document on filter import to Writer 5.1.2 is mishandled with the odd negative 1st line indent and increasing base line as in refs [1][2]

Interestingly, if the same document is exported from Word 2007 via save-as to an ODF .odt document, it opens in Writer 5.1.2 without the heading indents seen with the .doc format.

Also attaching a screen clip.

=-refs-=
[1] https://bugs.documentfoundation.org/show_bug.cgi?id=93970#c12
[2] https://bugs.documentfoundation.org/show_bug.cgi?id=93970#c27
Comment 3 V Stuart Foote 2016-05-05 14:22:39 UTC
Created attachment 124863 [details]
same header test .doc save-as exported to .odt from Word 2007
Comment 4 V Stuart Foote 2016-05-05 14:26:08 UTC
Created attachment 124864 [details]
same header test .doc save-as exported to .odt from Word 2007
Comment 5 V Stuart Foote 2016-05-05 14:53:08 UTC
Created attachment 124865 [details]
clips of the test document

An assemblage of clips showing how the ww8 import filter mishandles multi-line headings.

1. is the original document in Word 2007
2. is the original document imported into Writer 5.1.2.2
3. is a save-as from Word 2007 to ODF, opened in Writer 5.1.2.2
4. is a new document in Writer 5.1.2.2 with text-only paste from the Word document, with LO Headings applied.

Still some work to be done with the ww8 import filter. However, round trip Writer -> .DOC -> Word 2007 -> new .DOC -> Writer retains heading styles, but clears the odd shifting indents.
Comment 6 Luke 2016-05-05 19:24:23 UTC
Anyone working on this should review:
The native odt fix in Bug 93970
http://cgit.freedesktop.org/libreoffice/core/commit/?id=05fd8cb848ecba425124d61cd76e2f9418d5378c


The docx fix in Bug 53175
https://cgit.freedesktop.org/libreoffice/core/commit/?id=b95d203bc17c83ec0fe5139f519d53ed1d842d3a

I have a feeling that either these fixes are hacks that address the issue at the wrong level, or every format needs this fix.
Comment 7 Cor Nouws 2016-05-06 13:48:41 UTC
When working on this, it might be good to keep this one in mind too:
  bug 98639 "Spacing to contents (distance to paragraph border) is broken after saving as docx"

And (I've written this before somewhere): from what I heard in 2007/8 at an OOoCon presentation, in the ODF specs, the indent (margins) of the list styles prevail over those from the paragraphs.
That has been the case in Writer for a long time. No longer in recent versions.
Might be that that change is intentional, but I missed it.
(Hmm, me prolly should make a good test case and separate issue ;\ )
Comment 8 Luke Deller 2016-05-08 13:00:10 UTC
Trying to trace what is going on here:

- The document outline style created for a new document has the indents set appropriately for a numbered list.  As I commented against bug 93970, it looks like outline numbering used to be enabled by default but was disabled in 1996.  The outline indents probably should have been adjusted accordingly at that time but they were not.

- The native odt fix in bug 93970 does not touch the outline style, rather it changes the default paragraph styles named "Heading 1", "Heading 2" etc.  It explicitly sets an indent of zero on these paragraph styles.

- The docx fix in bug 53175 sidesteps all this by disabling creation of all the default paragraph styles when loading a docx.

I think we should adjust the default document outline style to have zero indent, and then revert the change for bug 93970.  This fixes the doc issue too, I guess because a default outline indent of zero matches Word's default.  I would like to do some more testing on this before submitting a patch.

One downside with this approach would be if somebody really wants a numbered outline, then heaps of clicking is involved to configure the outline style back to how it was before.  I think it would be a great UI improvement if the Tools -> Outline Numbering dialog allowed the user to select from the available list styles to initialise the outline style, rather than having to set up the outline style from first principles.
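
To illustrate the first point above, here is a minimal Python sketch of why an indent that suits a numbered heading looks wrong once numbering is off. It is only a geometric model, not Writer's layout code: the `Indent` class, the `line_starts` helper and the 0.76 cm value are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indent:
    """Paragraph indents in centimetres (the values used below are hypothetical)."""
    left: float        # indent applied to every line of the paragraph
    first_line: float  # extra offset applied to the first line only


def line_starts(indent: Indent, n_lines: int) -> list[float]:
    """Return the x position at which each line of a paragraph starts."""
    if n_lines <= 0:
        return []
    starts = [indent.left + indent.first_line]  # first line
    starts += [indent.left] * (n_lines - 1)     # continuation lines
    return starts


# Hypothetical hanging indent that leaves room for a number label.
hanging = Indent(left=0.76, first_line=-0.76)
# Zero indent, as proposed for the default outline style.
flat = Indent(left=0.0, first_line=0.0)

print(line_starts(hanging, 3))  # first line sits left of the continuation lines
print(line_starts(flat, 3))     # all lines start at the same position
```

With a number label in front, the hanging indent lines the heading text up; without one, the first line of a multi-line heading simply starts to the left of the rest, which is the symptom reported here.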
Comment 9 Luke Deller 2016-05-08 13:13:26 UTC
(In reply to Cor Nouws from comment #7)
> And (I've written this before somewhere): from what I heard in 2007/8 at a
> OOoCon presentation, in the ODF specs, the indent (margins) of the list
> styles prevail over those from the paragraphs.
> That has been the case in Writer for a long time. No longer in recent
> versions.
> Might be that that change is intentional, but I missed it.
> (Hmm, me prolly should make a good test case and separate issue ;\ )

Yes that sounds very interesting, probably worth following up.  I started looking at the ODT spec but could not find this interaction spelled out clearly.
Comment 10 Luke Deller 2016-05-09 11:49:04 UTC
(In reply to Cor Nouws from comment #7)
...
> And (I've written this before somewhere): from what I heard in 2007/8 at a
> OOoCon presentation, in the ODF specs, the indent (margins) of the list
> styles prevail over those from the paragraphs.
> That has been the case in Writer for a long time. No longer in recent
> versions.

Actually, when I test this now in LO 4.0.1 and in master (5.2alpha), I see the same behaviour in both of these versions: the indent set on the list style overrides the indent set on the paragraph style.  However, either of these will override the indent set on the document's outline style (Tools menu -> Outline Numbering).
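
The precedence observed above can be written down as a small Python sketch. This is a simplified model of the behaviour seen in testing, not Writer's actual resolution code; the function name `effective_indent` and the use of `None` to mean "this style does not set an indent" are assumptions made for illustration.

```python
from typing import Optional


def effective_indent(
    list_style: Optional[float],
    paragraph_style: Optional[float],
    outline_style: Optional[float],
) -> float:
    """Pick an indent using the observed precedence:
    list style, then paragraph style, then the document's outline style.

    None means the style does not set an indent (a modelling assumption).
    """
    for value in (list_style, paragraph_style, outline_style):
        if value is not None:
            return value
    return 0.0  # nothing sets an indent


# An explicit zero on the paragraph style hides the outline style's indent.
print(effective_indent(None, 0.0, 0.76))
# With no list or paragraph indent, the outline style's indent shows through.
print(effective_indent(None, None, 0.76))
```

Under this model, the bug 93970 change works because an explicit zero on the "Heading N" paragraph styles takes priority over the outline style, while a heading style that sets no indent of its own still picks up whatever the outline style defines.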
Comment 12 Luke 2016-05-11 02:50:05 UTC
Luke Deller, 
Brilliant job tracking down and fixing this 20 year old regression. I have verified that this fix works on:

• Old .odt files with legacy behavior open/roundtrip correctly
• New .odt files do not indent
• .doc files are now imported correctly and round-trip without indent
• .docx files continue to import correctly

Marking Bug 65865 as a dupe of this, as that is centering correctly now too. Beautiful work! Thank you!
Comment 13 Luke 2016-05-11 02:50:54 UTC
*** Bug 65865 has been marked as a duplicate of this bug. ***