Bug 120546 - FILESAVE LO round-tripped DOCX List created with Word gets larger bullets in Writer
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.4.0.3 release (earliest affected)
Hardware: All All
Importance: medium minor
Assignee: Jan-Marek Glogowski
URL:
Whiteboard: target:6.5.0 target:6.4.0.1
Keywords: bibisectRequest, filter:docx
Duplicates: 122032
Depends on:
Blocks: DOCX-Bullet-Number-Outline-Lists
 
Reported: 2018-10-12 11:30 UTC by NISZ LibreOffice Team
Modified: 2020-03-30 10:01 UTC (History)

See Also:
Crash report or crash signature:


Attachments
A screenshot showcasing the issue. (445.06 KB, image/png)
2018-10-12 11:30 UTC, NISZ LibreOffice Team
A minimal version of the document. (19.98 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-10-12 11:45 UTC, NISZ LibreOffice Team
After the export (18.07 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-10-12 11:45 UTC, NISZ LibreOffice Team
Comparison LibreOffice 6.2 Master and MSO 2010 (125.84 KB, image/png)
2018-10-15 08:44 UTC, Xisco Faulí
Document compared MSO LO (140.66 KB, image/png)
2019-09-09 14:43 UTC, Timur

Description NISZ LibreOffice Team 2018-10-12 11:30:08 UTC
Description:
Lists in DOCX documents created with Microsoft Word 2010 get larger bullets when the document is opened in LibreOffice Writer.

Steps to Reproduce:
1. Create a new document in Microsoft Word.
2. Type “=lorem(3)” and press Enter
3. Select All
4. Choose Bullets
5. Select the first paragraph, and change the font size to 16
6. Save the file as DOCX
7. Open the same file in LibreOffice Writer and compare the two versions.

Actual Results:
The bullets on the 2nd and 3rd paragraphs become larger when the file is opened in Writer.

Expected Results:
The bullets should have the same size as when the file is opened in Microsoft Word.


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 NISZ LibreOffice Team 2018-10-12 11:30:41 UTC
Created attachment 145637
A screenshot showcasing the issue.
Comment 2 Xisco Faulí 2018-10-12 11:40:15 UTC Comment hidden (obsolete)
Comment 3 NISZ LibreOffice Team 2018-10-12 11:45:06 UTC
Created attachment 145642
A minimal version of the document.
Comment 4 NISZ LibreOffice Team 2018-10-12 11:45:32 UTC
Created attachment 145643
After the export
Comment 5 Xisco Faulí 2018-10-15 08:44:16 UTC Comment hidden (obsolete)
Comment 6 Timur 2019-09-09 14:43:06 UTC
Created attachment 154057
Document compared MSO LO

I guess the bug description is not correct: a simple fileopen is fine, but a roundtrip fileopen after save is not. The same LO-saved DOCX opens fine in MSO.
I'll change and confirm. Test LO 6.4+.
Comment 7 Timur 2019-09-10 09:03:24 UTC
Funny that LO up to 4.3 didn't read the size of the first 2 bullets correctly, but saved them right, as can be seen in later LO versions.
Since 4.4, LO reads them right but saves them wrong.
I set bibisectRequest.
Comment 8 Jan-Marek Glogowski 2019-10-30 10:51:50 UTC
A comment for the bug reporter: please create smaller bug documents in the future, and if you attach images, please make them from the attached document. At first I thought I had downloaded the wrong document.

Now for the bug: in the original Word document, the last list entry only has a character style assigned to the paragraph (w:pPr), not to the run (w:rPr). The document saved by Writer has an empty character style assigned to both the paragraph and the run. Word imports this (correctly) as no character style assigned to the run, but Writer doesn't.

Just to make this clear: there is no bullet size saved anywhere. The bullets follow the font size of the whole paragraph by some percentage. You can see that in Word or with a Writer ODT file: the bullets change their size if you change the font size of the whole list entry. This exposes a second bug: the bullets don't follow the font size change when a DOCX is imported in Writer. Neither the original Word document nor the saved Writer export does.
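
The relationship described above can be sketched as follows. This is only an illustration of "bullet size = paragraph font size times some percentage"; the function name, the default of 100% and the 11 pt body size are assumptions made for the example, not values taken from Word or LO.

```python
def bullet_size_pt(paragraph_font_pt: float, relative_percent: float = 100.0) -> float:
    """Derive the bullet size from the paragraph font size.

    No absolute bullet size is stored; only a percentage relative to the
    font size of the list entry (hypothetical model, not LO code).
    """
    return paragraph_font_pt * relative_percent / 100.0


# First list entry set to 16 pt as in the steps to reproduce;
# the 11 pt for the other two entries is an assumed default.
sizes = [bullet_size_pt(pt) for pt in (16.0, 11.0, 11.0)]
```

In this model, changing the font size of one list entry only changes that entry's bullet, which is the behaviour expected from Word.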

The real bug is on the import side, which differs from Word, but Writer also shouldn't export and assign empty character runs.
Comment 9 Jan-Marek Glogowski 2019-11-11 12:45:22 UTC
FYI: in LO bullets are just a special type of enumeration, therefore this is all about numbering styles / rules.

So the run formatting I initially suspected is not the real problem. After a lot of debugging, analyzing multiple simpler DOCX documents and following their round trip, it turned out that LO adds a w:cs="Symbol" attribute to the font tag of the abstract numbering rule, to set this font as the complex font too. On import, this triggers a different code path in LO, which stores the character format attributes in an automatic character format instead of using the abstract numbering format itself, as the current structure has no way to store the complex font setting. You can see the list of directly stored properties in IgnoreForCharStyle.
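
To check a document for this attribute, something like the following sketch can be used. It only relies on Python's standard xml.etree module; the numbering fragment is a hand-written, hypothetical example of what such a rule might look like, not the content of the attached file, and the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w:" prefix in DOCX parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical numbering fragment with a complex-script font on the bullet.
NUMBERING_FRAGMENT = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0">
      <w:numFmt w:val="bullet"/>
      <w:rPr>
        <w:rFonts w:ascii="Symbol" w:hAnsi="Symbol" w:cs="Symbol"/>
      </w:rPr>
    </w:lvl>
  </w:abstractNum>
</w:numbering>
"""


def levels_with_complex_font(numbering_xml: str):
    """Return (abstractNumId, ilvl, font) for every level whose rFonts has w:cs."""
    root = ET.fromstring(numbering_xml)
    hits = []
    for abstract in root.iter(f"{{{W}}}abstractNum"):
        abstract_id = abstract.get(f"{{{W}}}abstractNumId")
        for lvl in abstract.iter(f"{{{W}}}lvl"):
            for fonts in lvl.iter(f"{{{W}}}rFonts"):
                cs = fonts.get(f"{{{W}}}cs")
                if cs is not None:
                    hits.append((abstract_id, lvl.get(f"{{{W}}}ilvl"), cs))
    return hits
```

Running this on the numbering part of the original and of the round-tripped document should show whether the export added the attribute.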

So after the LO export has added the w:cs attribute to the numbering rule, LO generates an automatic character style on import (see ListDef::CreateNumberingRules), which is shared by the whole level of this rule. This happens before the actual document is parsed, as abstract and non-abstract rules are stored in an additional XML file and referenced later by the actual document.

Now the real parsing of the document happens. If the document then overrides the character settings of the numbering rule entry, DomainMapper_Impl::GetCurrentNumberingCharStyle doesn't return an individual style for the current numbering entry, but the shared character formatting of the numbering rule. This works correctly for RTF, because RTF doesn't have any real styles for numbering rules and has some other mapping to represent it correctly inside LO, but overriding the shared style for DOCX is "not a good idea". Any overridden style setting now gets applied to all the numbering entries using the numbering rule's global character style, which in this case results in bigger bullets.

The correct solution is to ignore the shared numbering rule style and store the value as an override instead. Luckily such an override was introduced to fix bug 64222: the "ListAutoFormat", a new paragraph property that overrides character settings per paragraph. It's automatically created by filtering all the character properties in DomainMapper_Impl::finishParagraph, so for DOCX it's now actually sufficient to ignore the rule's setting and just set the paragraph's character overrides, just like in DOCX.
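
The difference between the two approaches can be modelled in a few lines. This is a simplified sketch and not the writerfilter code: the classes and functions are hypothetical stand-ins for the shared numbering rule character style and the per-paragraph "ListAutoFormat", and the 11 pt base size is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class NumRuleLevel:
    """Character properties shared by every list entry of one level (model only)."""
    char_props: Dict[str, float] = field(default_factory=dict)


@dataclass
class Paragraph:
    level: NumRuleLevel
    # Stand-in for the per-paragraph "ListAutoFormat" override.
    list_auto_format: Dict[str, float] = field(default_factory=dict)


def apply_override_shared(para: Paragraph, props: Dict[str, float]) -> None:
    """Buggy variant: writes the override into the style shared by the level."""
    para.level.char_props.update(props)


def apply_override_per_paragraph(para: Paragraph, props: Dict[str, float]) -> None:
    """Fixed variant: leaves the shared style alone, stores the override locally."""
    para.list_auto_format.update(props)


def bullet_props(para: Paragraph) -> Dict[str, float]:
    """Effective bullet properties: the local override wins over the shared style."""
    return {**para.level.char_props, **para.list_auto_format}


def demo(apply) -> list:
    level = NumRuleLevel({"size": 11.0})
    paras = [Paragraph(level) for _ in range(3)]
    apply(paras[0], {"size": 16.0})  # only the first entry is overridden
    return [bullet_props(p)["size"] for p in paras]
```

With the shared variant, the override on the first entry leaks into all three bullets; with the per-paragraph variant, only the first bullet changes.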

But the patch to fix bug 64222 does exactly that and nothing more. It lacks several additional changes needed to completely fix this problem without breaking other unit tests and to be usable in the LO GUI:
* an "UNO get property" implementation, so you would at least be able to fix some now-failing unit tests and check the correct value
* a correct representation of the overridden value in the GUI
* a way to store the parent "CharStyleName" as part of this overriding property

Currently these overrides are only applied by checkApplyParagraphMarkFormatToNumbering, which overrides the values correctly in the "view", but this is not reflected in the UNO API returning the styles of the numbering entry, and consequently not in the GUI either. It also can't be used to correctly represent a parent character style, as SwAutoFormat doesn't set the parent of the SfxItemSet, as SwCharFormat normally does to represent this relationship. Interestingly, IStyleAccess::getAutomaticStyle, which creates an automatic style, actually has an optional pParentName, but that is just stored in a map of the StylePoolImpl and appears to be used only for some optimization. I didn't dig deeper into this problem yet, as I don't yet understand why the auto format is handled this way in LO. So maybe implementing "ListAutoFormat" as a SwFormatAutoFormat is not correct anyway?

A second consequence is that the character properties are not removed from the paragraph context. Removing them would be logical (again, see the DomainMapper_Impl::finishParagraph changes in the fix for bug 64222), but currently breaks things left and right.

And then there is also this non-RTF handling, but I'm not sure this is a general problem.

Just to summarize: even after spending two days on this, I feel I still don't understand a lot. There is https://gerrit.libreoffice.org/#/c/81878/, which has some side effects on the GUI, but seems to be the right approach from my definitely incomplete POV.
Comment 10 Jan-Marek Glogowski 2019-12-05 15:57:09 UTC
Some additional information on top of my last comment: for DOCX import DocumentSettingId::APPLY_PARAGRAPH_MARK_FORMAT_TO_NUMBERING is set and therefore checkApplyParagraphMarkFormatToNumbering applies the mark settings.

What is missing is the reflection in the GUI, so that the bullet point's overridden settings are actually shown if you position the cursor in front of a bullet. The same problem happens if you query the formatting of the bullet point, which just returns the underlying NumRule formatting, not the overridden formatting.

The fix for bug 64222 is also incomplete, because it doesn't remove the filtered character settings in DomainMapper_Impl::finishParagraph, as otherwise unit tests fail. This is the result of different handling of character format priorities, which just happen to work currently.

The priority implemented in LO is
0. Run Properties for the Paragraph Mark / overridden NumRule format (DOCX document view only)
1. NumRule format
2. Paragraph format

Only the 1st and 2nd have an effect on how the bullet or numbering formatting is reflected in the GUI. If you place the cursor in front of the NumRule and change a setting, like the size, it is applied to all bullets of the rule. At this point the size no longer follows the paragraph's size setting.
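
The priority order above can be sketched as a simple lookup through layered property sets. This is a hypothetical model, not LO code: the function name, the document_view flag and the example sizes are assumptions used to show why the view and the GUI can disagree.

```python
from typing import Any, Mapping, Optional


def effective_bullet_property(
    name: str,
    mark_override: Mapping[str, Any],      # 0. paragraph mark run properties
    numrule_format: Mapping[str, Any],     # 1. NumRule format
    paragraph_format: Mapping[str, Any],   # 2. paragraph format
    document_view: bool = True,
) -> Optional[Any]:
    """Return the first value found, walking the layers in priority order.

    With document_view=False the lookup skips layer 0, which models the GUI
    and format queries that only see the NumRule and paragraph formats.
    """
    layers = []
    if document_view:
        layers.append(mark_override)
    layers.extend([numrule_format, paragraph_format])
    for layer in layers:
        if name in layer:
            return layer[name]
    return None


# Assumed example values: the mark overrides the size to 16 pt.
mark, numrule, para = {"size": 16.0}, {}, {"size": 11.0}
in_view = effective_bullet_property("size", mark, numrule, para)
in_gui = effective_bullet_property("size", mark, numrule, para, document_view=False)
```

In this model the rendered bullet uses the override, while a query that skips layer 0 reports the paragraph value instead.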

I couldn't find any code in the GUI that takes the DocumentSettingId::APPLY_PARAGRAPH_MARK_FORMAT_TO_NUMBERING setting into account.
Comment 11 Commit Notification 2019-12-06 12:03:10 UTC
Jan-Marek Glogowski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6ed12ab2d0742f86ce25defec3c776562dbfad9a

tdf#120546 fix DOCX overriding numrule format

It will be available in 6.5.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 12 Jan-Marek Glogowski 2019-12-06 12:05:25 UTC
This just fixes the re-import. It's currently still impossible to edit the bullet's overridden character format in LO. This will be handled in an additional bug.
Comment 13 Commit Notification 2019-12-06 13:50:58 UTC
Jan-Marek Glogowski committed a patch related to this issue.
It has been pushed to "libreoffice-6-4":

https://git.libreoffice.org/core/commit/4412689b1358b5cc932f905417f23e2a3cbb494b

tdf#120546 fix DOCX overriding numrule format

It will be available in 6.4.0.1.

Comment 14 NISZ LibreOffice Team 2020-01-15 07:30:21 UTC
*** Bug 122032 has been marked as a duplicate of this bug. ***