Bug 117501 - FILEOPEN: DOCX: Incorrect field character style (see comment 5)
Summary: FILEOPEN: DOCX: Incorrect field character style (see comment 5)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected) release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected, filter:docx
Depends on:
Blocks: DOCX-Fields
Reported: 2018-05-08 14:10 UTC by Xisco Faulí
Modified: 2023-03-19 03:25 UTC
CC List: 2 users

See Also:
Crash report or crash signature:

comparison MSO 2010 and LibreOffice 6.1 (28.89 KB, image/png)
2018-05-08 14:10 UTC, Xisco Faulí
The number '2' introduced by the same commit (9.78 KB, image/png)
2018-05-09 08:34 UTC, Xisco Faulí
First example with formatting removed (14.25 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-05-12 14:18 UTC, Luke Deller
tdf117501-fmtcleared_42_72.pdf: same in LO 4.2 and 7.2. No regression. (27.28 KB, application/pdf)
2021-03-18 12:41 UTC, Justin L

Description Xisco Faulí 2018-05-08 14:10:04 UTC
Created attachment 141972
comparison MSO 2010 and LibreOffice 6.1

Steps to reproduce:
1. Open attachment 90710 from bug 66401

-> Look at the field characters: they are much smaller than before. See the attached image.

Reproduced in

Build ID: 1e2afc9bd3062cfba6b65b45c17a08f298014239
CPU threads: 4; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

[Bug found by office-interoperability-tools]
Comment 1 Xisco Faulí 2018-05-08 14:11:18 UTC
Regression introduced by:

author	Luke Deller <luke@deller.id.au>	2018-03-05 00:14:28 +1100
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2018-03-05 10:26:47 +0100
commit 18cbb8fe699131a234355e1d00fa917fede6ac46
tree 0642ae0059b821ed52228bde8bb526c15e2ec285
parent 60ac7418747530a006894a7941c67c5006d6158c
tdf#107035 Fix field character style DOCX import
Reinstate a call to DontExpandFormat which was removed from
appendTextContent in commit 232ad2f2588beff50cb5c1f3b689c581ba317583

This ensures that direct character formatting which ended immediately
before the insertion point will not be expanded to cover the inserted

Bisected with: bibisect/bibisect-linux64-6.1

Adding Cc: to Luke Deller
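The commit message quoted above describes when Writer lets direct character formatting grow over newly inserted content. As a rough illustration only, here is a hypothetical C++ model of that rule; `AttrSpan`, `insertText` and `fontSizeAt` are invented for this sketch and are not Writer's real `DontExpandFormat` or text-node code. A formatting span that ends exactly at the insertion point is extended only when expansion is allowed.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a run of direct character formatting over [start, end).
struct AttrSpan {
    std::size_t start;
    std::size_t end;   // exclusive
    int fontSizePt;    // e.g. a doubled font size set by direct formatting
};

// Insert `text` into `para` at `pos`, shifting or growing formatting spans.
// When `expandFormat` is false (the "don't expand" behaviour), a span that
// ends exactly at `pos` is NOT extended over the inserted text.
void insertText(std::string& para, std::vector<AttrSpan>& spans,
                std::size_t pos, const std::string& text, bool expandFormat) {
    para.insert(pos, text);
    const std::size_t n = text.size();
    for (AttrSpan& s : spans) {
        if (s.start >= pos) {
            // Span starts at or after the insertion point: shift it right.
            s.start += n;
            s.end += n;
        } else if (s.end > pos || (s.end == pos && expandFormat)) {
            // Insertion is inside the span, or at its end with expansion on.
            s.end += n;
        }
        // Otherwise the span ends before (or exactly at) pos and stays put.
    }
}

// Font size at character index `i`, or `defaultPt` if no span covers it.
int fontSizeAt(const std::vector<AttrSpan>& spans, std::size_t i, int defaultPt) {
    for (const AttrSpan& s : spans) {
        if (i >= s.start && i < s.end) {
            return s.fontSizePt;
        }
    }
    return defaultPt;
}
```

For a paragraph `"Big"` with a 24 pt span over `[0, 3)`, inserting `"F"` at index 3 with `expandFormat == false` leaves the new character at the default size, whereas `expandFormat == true` gives it 24 pt, i.e. the formatting before the insertion point leaks onto the inserted content.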
Comment 2 Xisco Faulí 2018-05-09 08:32:04 UTC
The same commit causes attachment 114395 from bug 50774 to display a spurious '2' in the document.
Comment 3 Xisco Faulí 2018-05-09 08:34:44 UTC
Created attachment 141996
The number '2' introduced by the same commit
Comment 4 Luke Deller 2018-05-12 14:12:37 UTC
Thank you for finding this with your office-interoperability-tools work, Xisco!

I think the underlying issue in attachment 90710 existed prior to commit 18cbb8fe699131a234355e1d00fa917fede6ac46.

This is about an old equation editing feature in Word described here:

This example document uses such an "equation" to draw a superscript "x " superimposed upon a subscript " a".

While LibreOffice does not in general handle this old equation syntax, it does have a special case which can handle this example. It imports it as a "Combine characters" field, a field intended for use with Asian languages where several characters are arranged in a grid occupying the space of a single character. This handling existed in the ww8 (.doc) filter inherited from StarOffice at the beginning of our git history, and it was reused in the docx import for bug 66400 in commit 5342cd7533a51fd488de85565674ee01649ddcbc.

The problem is that a "Combine characters" field containing 4 characters will show each character at half of its normal height and width (a quarter of the area). However, the Word equation does not reduce the size of the letters at all.

This problem was hidden in the example here because there was direct formatting placed upon the text before the field to double the font size. This direct formatting was *not* applied to the field itself in Word; however, due to bug tdf#107035, it was wrongly applied to the field in LibreOffice. This resulted in the field's font size being doubled, which exactly compensated for the problem I described in the previous paragraph.

Now that tdf#107035 is fixed, the problem in the equation import is no longer hidden.
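The compensation argument above comes down to simple arithmetic. The following C++ sketch encodes it under the comment's stated assumption that each of the four combined characters is drawn at half its normal height and width; the 12 pt base size and all names are hypothetical, chosen only to illustrate the numbers. The `static_assert`s check at compile time that doubling the font size cancels the 0.5 factor, and that without the doubling the letters end up at half of Word's size.

```cpp
// Hypothetical model of the glyph-size arithmetic described above.
// Assumption (from the comment): a four-character "Combine characters"
// field draws each character at half its normal height and width.
constexpr double kCombinedScale = 0.5;

// Glyph height (pt) when LibreOffice imports the equation as a
// four-character "Combine characters" field with font size fieldFontPt.
constexpr double libreOfficeGlyphPt(double fieldFontPt) {
    return fieldFontPt * kCombinedScale;
}

// Glyph height (pt) in Word, whose equation does not shrink the letters.
constexpr double wordGlyphPt(double fieldFontPt) {
    return fieldFontPt;
}

// Hypothetical sizes for the example document.
constexpr double kBasePt = 12.0;            // size of the field itself in Word
constexpr double kDoubledPt = 2 * kBasePt;  // direct formatting before the field

// Before the tdf#107035 fix the doubled size leaked onto the field,
// which exactly cancelled the 0.5 scale factor.
static_assert(libreOfficeGlyphPt(kDoubledPt) == wordGlyphPt(kBasePt),
              "doubling masks the combined-characters shrink");

// After the fix the field keeps the base size, exposing the mismatch.
static_assert(libreOfficeGlyphPt(kBasePt) == 0.5 * wordGlyphPt(kBasePt),
              "without doubling, letters are half of Word's size");
```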
Comment 5 Luke Deller 2018-05-12 14:18:36 UTC
Created attachment 142054
First example with formatting removed

Here is the first example edited in Word 2016 to remove the direct formatting.  This example demonstrates the underlying problem which is independent of commit 18cbb8fe699131a234355e1d00fa917fede6ac46: the letters in the fields are smaller in LibreOffice than in Word, with or without that commit.
Comment 6 QA Administrators 2019-05-14 03:00:43 UTC Comment hidden (obsolete)
Comment 7 Xisco Faulí 2019-05-14 07:07:29 UTC
Still reproducible in

Build ID: 630db80d17616d635cf2e5f1d5a0852428b794a3
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3; 
Locale: ca-ES (ca_ES.UTF-8); UI-Language: en-US
Calc: threaded
Comment 8 Justin L 2021-03-18 09:58:13 UTC
Removing 6.1 regression as per comment 5.

Support for this was introduced in 4.2 via
    commit 5342cd7533a51fd488de85565674ee01649ddcbc
    Author:     Caolán McNamara on Wed Sep 25 22:40:23 2013 +0200
    Resolves: fdo#66400 import combined characters from docx
    move .doc combined character parser stuff from sw to filter for reuse in .docx
    and fix bad length problem when nSavPtr == -1 after String->OUString conversion.
    Thanks for the pasta CloudOn.
Comment 9 Justin L 2021-03-18 12:41:53 UTC
Created attachment 170552
tdf117501-fmtcleared_42_72.pdf: same in LO 4.2 and 7.2. No regression.
Comment 10 QA Administrators 2023-03-19 03:25:35 UTC Comment hidden (spam)