Bug 49785 - Saving as .docx removes style of last number in numbered list when last paragraph in document
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All (OS: All)
Importance: medium normal
Assignee: Jan-Marek Glogowski
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: DOCX-Bullet-Number-Outline-Lists
Reported: 2012-05-11 02:56 UTC by some.bananas1234
Modified: 2023-06-23 04:48 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
ODT document with a numbered list (9.05 KB, application/vnd.oasis.opendocument.text)
2012-05-11 02:56 UTC, some.bananas1234
Same document saved as DOCX, style on the number in list is gone (4.19 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-05-11 02:56 UTC, some.bananas1234
lastEmptyParagraph.odt: save as .docx loses the 96 pt paragraph size (9.52 KB, application/vnd.oasis.opendocument.text)
2018-01-13 13:49 UTC, Justin L
2019-01-15 Numbered list example, ODT version (10.79 KB, application/vnd.oasis.opendocument.text)
2019-01-15 18:00 UTC, Ahiijny
2019-01-15 Numbered list example, DOCX version (5.18 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-01-15 18:01 UTC, Ahiijny
2019-01-15 Numbered list example, before and after (133.06 KB, image/png)
2019-01-15 18:25 UTC, Ahiijny
Document nodes diff between odt and docx. (3.71 KB, patch)
2019-09-05 16:12 UTC, Jan-Marek Glogowski

Description some.bananas1234 2012-05-11 02:56:03 UTC
Created attachment 61436
ODT document with a numbered list

When a document containing a numbered list whose text has character formatting applied (larger font, bold, etc.) is saved as DOCX, the formatting disappears from the number of the list item.

Please see attached documents for an example. The number "1" in the DOCX document has the default style, not the style of the list item.
Comment 1 some.bananas1234 2012-05-11 02:56:44 UTC
Created attachment 61437
Same document saved as DOCX, style on the number in list is gone
Comment 2 s-joyemusequna 2012-05-11 06:02:51 UTC
Confirmed with LibO 3.4.5 and LOdev version 3.6.0alpha0+ (Build ID: 34513fe) under Windows XP and Vista 64.
Comment 3 some.bananas1234 2012-05-14 07:05:23 UTC
The XML in word/document.xml for the paragraph in the example document is as follows:

    <w:p>
      <w:pPr>
        <w:pStyle w:val="style0"/>
        <w:numPr>
          <w:ilvl w:val="0"/>
          <w:numId w:val="1"/>
        </w:numPr>
      </w:pPr>
      <w:r>
        <w:rPr>
          <w:b/>
          <w:bCs/>
          <w:sz w:val="36"/>
          <w:szCs w:val="36"/>
        </w:rPr>
        <w:t>Foo</w:t>
      </w:r>
    </w:p>

If I put a copy of the w:rPr block inside the w:pPr block, the number gets the right style!

I'm afraid I don't know the LibO code, though.
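The manual workaround from comment 3 can be scripted. What follows is a minimal Python sketch, not part of LibreOffice, assuming the input is the text of word/document.xml and that the number should carry the formatting of the first run that has a w:rPr. The function name copy_run_props_to_paragraph_mark is hypothetical. It only touches paragraphs that have a w:numPr and no paragraph-mark w:rPr yet, and it places the copied w:rPr before any w:sectPr/w:pPrChange, as the OOXML CT_PPr element order requires.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
ET.register_namespace("w", W_NS)  # keep the usual "w:" prefix on output


def copy_run_props_to_paragraph_mark(document_xml: str) -> str:
    """Hypothetical helper: for each numbered paragraph whose paragraph mark
    has no <w:rPr>, copy the <w:rPr> of the first run that has one into
    <w:pPr>, as comment 3 did by hand."""
    root = ET.fromstring(document_xml)
    # Materialize the list first: the tree is modified inside the loop.
    for p in list(root.iter(W + "p")):
        ppr = p.find(W + "pPr")
        if ppr is None or ppr.find(W + "numPr") is None:
            continue  # not a numbered paragraph
        if ppr.find(W + "rPr") is not None:
            continue  # paragraph mark already carries run properties
        run_rpr = p.find(W + "r/" + W + "rPr")
        if run_rpr is None:
            continue  # no run formatting to copy
        # CT_PPr order: ... rPr, sectPr, pPrChange, so insert before those.
        index = len(ppr)
        for i, child in enumerate(ppr):
            if child.tag in (W + "sectPr", W + "pPrChange"):
                index = i
                break
        ppr.insert(index, copy.deepcopy(run_rpr))
    return ET.tostring(root, encoding="unicode")
```

Caveat: ElementTree renames namespace prefixes it has not been told about and drops unused declarations, which can break attributes such as mc:Ignorable in Word-generated files. For real documents, a prefix-preserving parser would be safer. This only sidesteps the import bug for affected files; it does not fix it.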
Comment 4 Owen Genat (retired) 2014-07-05 14:30:03 UTC
Under v4.3.0.2 (Build ID: 14ed55896fdfcb93ff437b85c4f3e1923d2b1409), the problem persists. It affects the numbers of ordered lists created both via direct formatting (toolbar) and via paragraph/list styles.
Comment 5 Gordo 2015-04-25 15:08:19 UTC
I opened attachment 61436, placed the cursor at the end of the paragraph, pressed Enter twice, then saved as DOCX. When the document is reopened, the number in the list has kept its formatting. However, the empty paragraph below the numbered list has now lost its formatting.

1. New Text Document.
2. Change to Font Size 18 and Bold.
3. Turn on Numbering.
4. Type "test1" and Enter.
5. Type "test2" but don't Enter.
6. Save as docx.
7. Close and reopen document.
Result:
Number 2 in the list has lost its formatting.

1. New Text Document.
2. Change to Font Size 18 and Bold.
3. Type "test" and don't Enter.
4. Save as docx.
5. Close and reopen document.
Result:
"test" still has its formatting.

1. New Text Document.
2. Change to Font Size 18 and Bold.
3. Type "test" and Enter.
4. Save as docx.
5. Close and reopen document.
Result:
Empty paragraph below "test" has lost its formatting.

I think this bug has to do with the last paragraph in the document losing its formatting when it is empty; somehow this also affects the last item of a numbered list when that item is the last paragraph in the document.
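One way to test this hypothesis against an actual exported file is to check whether the last paragraph's formatting is written to word/document.xml at all. For an empty paragraph, direct character formatting can only live on the paragraph mark (w:pPr/w:rPr). Below is a minimal Python sketch, assuming the input is the text of word/document.xml; the function name last_paragraph_mark_props is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def last_paragraph_mark_props(document_xml: str) -> list[str]:
    """Hypothetical helper: return the local names of the run properties
    stored on the paragraph mark (<w:pPr><w:rPr>) of the last body-level
    paragraph, or [] if there is none."""
    body = ET.fromstring(document_xml).find(W + "body")
    paragraphs = body.findall(W + "p")  # direct children only, not tables
    if not paragraphs:
        return []
    rpr = paragraphs[-1].find(W + "pPr/" + W + "rPr")
    if rpr is None:
        return []
    # Strip the "{namespace}" part, e.g. "{...}sz" -> "sz".
    return [child.tag[len(W):] for child in rpr]
```

If properties such as b or sz show up here but are gone after reopening the file in Writer, the formatting is lost on import rather than on export.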

Version: 4.4.2.2
Build ID: c4c7d32d0d49397cad38d62472b0bc8acff48dd6
Comment 6 QA Administrators 2016-09-20 10:14:29 UTC Comment hidden (obsolete, spam)
Comment 7 Justin L 2018-01-13 13:47:13 UTC
Confirmed that this still exists in 6.1alpha. Also confirmed comment 5: this is not specific to numbered lists; the last empty paragraph loses its formatting.

This looks like an import problem, since MS Word 2003 reads the .docx just fine. It is caused by ~DomainMapper_Impl calling RemoveLastParagraph; calling "dispose" instead of "setString" works. See bug 58327.
Comment 8 Justin L 2018-01-13 13:49:44 UTC
Created attachment 139078
lastEmptyParagraph.odt: save as .docx loses the 96 pt paragraph size
Comment 9 Justin L 2018-01-15 06:26:52 UTC
This ought to be super easy for someone who understands unotext well. Unfortunately, it completely baffles me.

I don't understand why merely replacing the string would remove all the PropertyValues in the first place. Even so, I would think it should be trivial to copy the PropertyValues and re-apply them afterwards (maybe with finishParagraph?). But even these trivial tasks were too much for me. Very frustrating.
Comment 10 Justin L 2019-01-15 17:37:34 UTC
Tried again; still no luck. Note for next time: don't go down the rabbit trail of checking xTextAppend->getString() == SAL_NEWLINE_STRING, because that doesn't solve the problem with the OP's document.
Comment 11 Ahiijny 2019-01-15 17:57:11 UTC
Version: 6.1.4.2
Build ID: libreoffice-6.1.4.2-snap1
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: en-CA (en_CA.UTF-8); Calc: group threaded

When working with .docx files, I see formatting being removed from the numbers in numbered lists even when it's not the last paragraph. It seems to happen sometimes when typing.

Steps:
1. New document
2. Change the font (e.g. to Arial 20pt)
3. Start a numbered list using F12 or the "Toggle Numbered List" button from the Formatting toolbar
4. Type some rows:
    1. one
    2. two
    3. three
    4. four
    5. five
5. Save as .docx
6. Close and reopen file

Observations:
1. The "5." is now Liberation Serif 12pt, unlike the other list numbers.
2. If you click at the end of row 2 and press <Enter>, the "2." also reverts to Liberation Serif 12pt. Same with any of the other rows.
3. Paste Special > Unformatted text at the end of a row will also cause the list number to revert to Liberation Serif 12pt.

This <Enter> behaviour happens even if the numbered list was not the last paragraph in the document.

Also, if there is enough text in the rows to wrap onto multiple lines, then after reopening the .docx, sometimes even just typing some text is enough to trigger the formatting change. This is very inconsistent, though, and doesn't happen every time.

This particular case happens every time for me, though:

Steps:
1. New document
2. Set font to Arial 10pt
3. Start new numbered list
4. Type "Please prepare the mortgage referencing Standard Charge Terms No. 200033 and with our additional provisions (attached)." and press <Enter>.
5. Fill out the next several lines with some placeholder text, just to have more rows to play with.
6. Save as .docx
7. Close and reopen file
8. Replace "200033" with "${terms_number}"
9. Click elsewhere

Observations:
1. The "1." reverts to Liberation Serif 12pt.
2. Interestingly, unlike the formatting reversions caused by <Enter>, this typing-triggered formatting change cannot be reverted with Undo.
Comment 12 Ahiijny 2019-01-15 18:00:19 UTC
Created attachment 148346
2019-01-15 Numbered list example, ODT version
Comment 13 Ahiijny 2019-01-15 18:01:25 UTC
Created attachment 148347
2019-01-15 Numbered list example, DOCX version
Comment 14 Ahiijny 2019-01-15 18:25:16 UTC
Created attachment 148349
2019-01-15 Numbered list example, before and after
Comment 15 Jan-Marek Glogowski 2019-09-04 09:02:26 UTC
This definitely looks like an import problem. Opening the test document in Word works correctly. Creating a test document in Word and opening it in Writer shows the same problem. From my POV, the XML of the Writer export and of the Word export look the same.

So I can also confirm the two problems:
1. On opening, the last numbered item always has the wrong (default) formatting.
2. In a multi-item document, adding a new item by pressing Enter anywhere in an existing, correctly formatted item changes the old item's number to the broken default formatting, while, interestingly, the new item is created with the original, correct formatting.
Comment 16 Jan-Marek Glogowski 2019-09-05 16:12:36 UTC
Created attachment 153927
Document nodes diff between odt and docx.

(In reply to Justin L from comment #7)
> Confirmed still exists in 6.1alpha. Confirmed comment 5 - this is not
> related particularly to numbered lists - the last empty paragraph looses
> formatting.
> 
> This looks like an import problem, since MSWord 2003 reads the .docx just
> fine. Caused by ~DomainMapper_Impl calling RemoveLastParagraph. Calling
> "dispose" instead of "setString" works.  See bug 58327.

As far as I can currently tell, RemoveLastParagraph is really not the problem; the formatting of the list item is.

You can see this if you make a full round-trip: open a broken DOCX, save it as ODT, and reopen it. Writer handles the resulting document correctly again, just like the original. You can even add breaks to multiple items with different formatting in the DOCX, save it as ODT, and the ODT document will look the way the DOCX was expected to look in the editor.

My current conclusion: some meta-information is missing somewhere between the imported DOCX document structure and the Writer layout. Unfortunately, this makes the fix even harder, at least for me.

One probably good thing: the diff of layout.xml between the ODT and the DOCX is empty (apart from the obviously different memory pointer values), so the layout seems to be identical.

As for the attached nodes.xml diff (run SW_DEBUG=1 soffice and press Shift+F12 if you want to reproduce it): I don't yet know what it means that SwpHints are used instead of SwAttrSet, but the two seem partly interchangeable, and the formatting seems to survive the full round-trip correctly.
Comment 17 Jan-Marek Glogowski 2019-09-05 17:31:27 UTC
There was a patch to fix bug 64222. It somewhat mutated this bug: the import is now correct, but, interestingly, removing the direct formatting of a numbered item no longer restores the original number format. This happens only for DOCX documents and only if the item is not the last one. Saving that "broken", edited DOCX as ODT and opening that document fixes the number again; saving it as a second DOCX doesn't restore the number format.

This more or less confirms my suspicion that, for the DOCX import, some connection between the direct formatting and the number format is missing while the DOCX is displayed or edited in Writer.
Comment 18 Jan-Marek Glogowski 2019-09-17 16:08:00 UTC
There was a second patch for bug 64222 to fix a regression introduced by the first patch. As far as I can tell, this fixes this bug. But there is still a bug when editing the DOCX numbered item: changing the item's font (type, size, underline, italic, etc.) won't be reflected in the number if the value was overridden in the ODT. For all non-overridden values, changing the item is also reflected in the number. I think this is a consequence of the different representation in the internal nodes, already visible in the attached diff.

But I consider this a completely different bug, as both bugs described in comment 1 and comment 11 (wrong import and wrong item breaks) are actually fixed now.