Bug 127616 - FILEOPEN DOCX character style properties applied instead of direct formatting in empty paragraphs (see comment 6)
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.0.4 release
Hardware: All All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: target:7.0.0
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks: DOCX-Bullet-Number-Outline-Lists
Reported: 2019-09-18 08:23 UTC by NISZ LibreOffice Team
Modified: 2020-05-21 16:29 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of the original document side by side in Word and Writer (80.21 KB, image/png)
2019-09-18 08:24 UTC, NISZ LibreOffice Team
Screenshot of the original document side by side in Writer 6.1 and 6.2 (124.85 KB, image/png)
2019-09-18 08:24 UTC, NISZ LibreOffice Team
Sample file from Word (34.64 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-09-18 08:25 UTC, NISZ LibreOffice Team
debugging_tdf127616.diff: easy to revert - but that doesn't shed light on what is wrong (3.85 KB, patch)
2019-09-25 14:05 UTC, Justin L

Description NISZ LibreOffice Team 2019-09-18 08:23:52 UTC
Description:
The character formatting of the 3 empty paragraphs in the attached DOCX document changes from Arial 10 to Segoe UI 11. The paragraphs are also slightly taller in 6.2 than in 6.1 and Word 2013, which made the original user-made document a bit taller.

Steps to Reproduce:
    1. Open the attached sample DOCX file in MS Word
    2. Open the attached screenshot
    3. Compare the document opened in MS Word and LibreOffice 6.4.

Actual Results:
The empty paragraphs get the Segoe UI 11 font setting from the character style “Font Style11” and not the Arial 10 setting from paragraph style “Style1” - unlike in Word. This works fine for the non-empty paragraph though.
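
For anyone who wants to check the file without opening it in Word, here is a minimal Python sketch that lists the paragraph-mark formatting of every text-less paragraph in a DOCX. It is only an inspection aid, not part of LibreOffice. The helper names (`_val`, `empty_paragraph_marks`) are made up for this example, and "empty" is simplified to "contains no w:t element".

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _val(parent, tag, attr="val"):
    """Return attribute w:<attr> of the child w:<tag>, or None if absent."""
    if parent is None:
        return None
    child = parent.find(f"w:{tag}", NS)
    return None if child is None else child.get(f"{{{W}}}{attr}")


def empty_paragraph_marks(docx):
    """List the paragraph-mark formatting of every text-less paragraph.

    `docx` is a path or a binary file object; only word/document.xml is read.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    found = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        if para.find(".//w:t", NS) is not None:
            continue  # paragraph contains text, not the case reported here
        ppr = para.find("w:pPr", NS)
        rpr = None if ppr is None else ppr.find("w:rPr", NS)
        half_points = _val(rpr, "sz")  # w:sz is stored in half-points
        found.append({
            "index": index,
            "pStyle": _val(ppr, "pStyle"),
            "rStyle": _val(rpr, "rStyle"),
            "font": _val(rpr, "rFonts", "ascii"),
            "size_pt": int(half_points) / 2 if half_points else None,
        })
    return found
```

Calling `empty_paragraph_marks("sample.docx")` (the file name is a placeholder for the attached sample) returns one dictionary per empty paragraph, so the style names and the directly applied font and size can be compared with what Writer shows.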

Expected Results:
Same font settings as in Word


Reproducible: Always


User Profile Reset: No



Additional Info:
LibreOffice details:

Version: 6.4.0.0.alpha0+ (x64)
Build ID: 41cd3e8e817c8c33a13608e62eeb06ce2c6977e4
CPU threads: 8; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@62-TDF, Branch:master, Time: 2019-09-01_22:04:10
Locale: hu-HU (hu_HU); UI-Language: en-US
Calc: threaded

Also in:
Version: 6.2.0.3
Build ID: 98c6a8a1c6c7b144ce3cc729e34964b47ce25d62
CPU threads: 4; OS: Windows 6.3; UI render: GL; VCL: win; 
Locale: hu-HU (hu_HU); UI-Language: hu-HU
Calc: threaded

Does not happen in
Version: 6.1.4.2
Build ID: 9d0f32d1f0b509096fd65e0d4bec26ddd1938fd3
CPU threads: 4; OS: Windows 6.3; UI render: GL; 
Locale: hu-HU (hu_HU); Calc: CL

Bibisected using bibisect-win32-6.2 to:  
URL: https://cgit.freedesktop.org/libreoffice/core/commit/?id=49ddaad2f3ba4e17e1e41e94824fb94468d2b680
author: Justin Luth
committer: Miklos Vajna
summary: tdf#117988 writerfilter: IgnoreTabsAndBlanksForLineCalculation
Comment 1 NISZ LibreOffice Team 2019-09-18 08:24:24 UTC
Created attachment 154251
Screenshot of the original document side by side in Word and Writer
Comment 2 NISZ LibreOffice Team 2019-09-18 08:24:44 UTC
Created attachment 154252
Screenshot of the original document side by side in Writer 6.1 and 6.2
Comment 3 NISZ LibreOffice Team 2019-09-18 08:25:05 UTC
Created attachment 154253
Sample file from Word
Comment 4 NISZ LibreOffice Team 2019-09-18 08:25:54 UTC
Adding CC to: Justin Luth
Comment 5 Dieter 2019-09-19 05:32:10 UTC
I confirm it with

Version: 6.4.0.0.alpha0+ (x64)
Build ID: f0c832acb53326ccc9a8c1a47401fbc9e1081feb
CPU threads: 4; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@62-TDF, Branch:master, Time: 2019-09-11_05:46:53
Locale: de-DE (de_DE); UI-Language: en-US
Calc: threaded
Comment 6 Justin L 2019-09-23 15:00:38 UTC
The bibisect seems to apply only to this part of the report:
(In reply to NISZ LibreOffice Team from comment #0)
> Also they have slightly larger height in 6.2 compared to 6.1
My guess is that previously the Segoe UI size 11 was ignored (since IgnoreTabsAndBlanksForLineCalculation was true) and the default style of Arial size 10 was used instead, which made the paragraphs slightly smaller.

I suspect that the real problem here is that although the Arial 10 font is directly applied to the paragraph, for some reason (see bisect below) that direct formatting has been lost.

The first paragraph is defined like this (simplified):
<w:p>
  <w:pPr>
    <w:pStyle w:val="Style1"/>
    <w:rPr>
      <w:rStyle w:val="FontStyle11"/>
      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      <w:b/>
      <w:sz w:val="20"/>
    </w:rPr>
  </w:pPr>
</w:p>

So direct character formatting says it should be Arial/bold/10pt, making both FontStyle11 and Style1 irrelevant for font/size attributes.

P.S. Contrary to the OP's report, both FontStyle11 and Style1 define Segoe UI as the font. The difference is the size: 11pt for FontStyle11 and 12pt for Style1.
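
To make the expected precedence concrete, here is a minimal Python sketch that resolves the paragraph-mark formatting for the snippet above. It is a toy model of the layering (paragraph style, then character style, then direct formatting), not how writerfilter or Word actually implement it. The function names are invented for this example, the style table only contains the values mentioned in this comment, and style inheritance, document defaults and toggle semantics for w:b are deliberately ignored.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The simplified paragraph from this comment, with the namespace declared.
PARAGRAPH = f"""
<w:p xmlns:w="{W}">
  <w:pPr>
    <w:pStyle w:val="Style1"/>
    <w:rPr>
      <w:rStyle w:val="FontStyle11"/>
      <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
      <w:b/>
      <w:sz w:val="20"/>
    </w:rPr>
  </w:pPr>
</w:p>
""".strip()

# Style values as described in this comment; everything else is omitted.
STYLES = {
    "Style1": {"font": "Segoe UI", "size_pt": 12.0},       # paragraph style
    "FontStyle11": {"font": "Segoe UI", "size_pt": 11.0},  # character style
}


def direct_formatting(rpr):
    """Collect the directly applied font, bold flag and size from a w:rPr."""
    props = {}
    fonts = rpr.find(f"{{{W}}}rFonts")
    if fonts is not None:
        props["font"] = fonts.get(f"{{{W}}}ascii")
    if rpr.find(f"{{{W}}}b") is not None:
        props["bold"] = True
    size = rpr.find(f"{{{W}}}sz")
    if size is not None:
        props["size_pt"] = int(size.get(f"{{{W}}}val")) / 2  # half-points
    return props


def effective_mark_formatting(paragraph_xml, styles):
    """Layer paragraph style < character style < direct formatting."""
    para = ET.fromstring(paragraph_xml)
    ppr = para.find(f"{{{W}}}pPr")
    rpr = ppr.find(f"{{{W}}}rPr")
    p_style = ppr.find(f"{{{W}}}pStyle").get(f"{{{W}}}val")
    r_style = rpr.find(f"{{{W}}}rStyle").get(f"{{{W}}}val")
    effective = {}
    effective.update(styles.get(p_style, {}))   # lowest priority
    effective.update(styles.get(r_style, {}))   # overrides paragraph style
    effective.update(direct_formatting(rpr))    # highest priority
    return effective


print(effective_mark_formatting(PARAGRAPH, STYLES))
```

In this toy model the direct formatting wins for font and size. If the last `update` step were skipped, the result would be Segoe UI 11 from FontStyle11, which matches the symptom in the report.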

Prior to LO 4.2, these empty paragraphs were Arial 10, not Segoe UI 11. This changed with commit 986fa38eb23a397546061c3ce0df9077ba334a07
    Author:     Matúš Kukan
    CommitDate: Fri Oct 25 17:14:30 2013 +0200    
        fdo#44736 - set and fetch multiple properties concurrently 2
    
        This fixes commit ee0bf5d58bc59052923c4ced928a989956e71456
Comment 7 Justin L 2019-09-25 14:05:36 UTC
Created attachment 154493
debugging_tdf127616.diff: easy to revert - but that doesn't shed light on what is wrong

I don't get it. The problem is not the SetPropertyValues (plural) function itself, since I changed the one-at-a-time routine to use it instead of SetPropertyValue. But when all these properties are set together in quick succession, it behaves differently, even though walking through it in GDB made everything look OK. I saw:

State = com::sun::star::beans::PropertyState::PropertyState_DIRECT_VALUE}, {Name = "CharFontName", Handle = 0, Value = uno::Any("string": "Arial"),

after all of the properties had been set.
Comment 8 Justin L 2019-09-25 17:09:58 UTC
Excluding the Character Style (RES_TXTATTR_CHARFMT) from the properties allows the font/size properties to be set.
Comment 9 Justin L 2019-09-26 08:47:48 UTC
exploratory patch: https://gerrit.libreoffice.org/79594 

tdf#127616 sw: timing issue? Import CharStyle early to allow direct formatting properties to stick.
    
It breaks one unit test, but interactively it shows the expected settings, so for some reason the unit test acts differently. Obviously there is something funny happening under the hood here...
Comment 10 Commit Notification 2020-05-18 15:59:00 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/fb001eab98934c5a4d0a8c6b9563f91337561b87

tdf#127616 DOCX import: fix char style of empty paragraph

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Xisco Faulí 2020-05-19 12:00:52 UTC
Verified in

Version: 7.0.0.0.alpha1+
Build ID: 8209c2569f5726f9ed29f75d30efdccb94f98fe5
CPU threads: 4; OS: Linux 4.19; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

@László Németh, thanks for fixing this issue! quite an old regression. don't think it needs to be backported.
Comment 12 László Németh 2020-05-21 16:29:47 UTC
(In reply to Xisco Faulí from comment #11)

> @László Németh, thanks for fixing this issue! quite an old regression. don't
> think it needs to be backported.

@Xisco: thanks for verifying! I think it's safe to backport, because it doesn't affect other things, but it's not so important, because using a character style on an empty paragraph is not so common. (By the way, paragraphmarker.docx shows an interesting and useful feature of MSO: the cursive character style Emphasis switches off cursive in the cursive paragraph style Quote. It would be more useful to support this in Writer, too.)