Bug 121659 - FILEOPEN DOCX Shape incorrectly placed when anchored before Manual Column Break
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.4 all versions
Hardware: All All
Importance: medium normal
Assignee: Attila Szűcs
URL:
Whiteboard: target:7.2.0
Keywords: bibisected, bisected, filter:docx, regression
Depends on:
Blocks: Page-Layout-Columns Dev-import-export-pages
Reported: 2018-11-23 09:53 UTC by NISZ LibreOffice Team
Modified: 2021-06-10 08:55 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
Screenshot of the original and exported document side by side in Word and Writer. (213.15 KB, image/png)
2018-11-23 09:54 UTC, NISZ LibreOffice Team
Example file from Word (23.00 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-11-23 09:54 UTC, NISZ LibreOffice Team
The original file saved by Writer (17.82 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-11-23 09:55 UTC, NISZ LibreOffice Team
columns2.docx: more paragraphs added so balanced and column break don't match (19.41 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2018-12-21 12:55 UTC, Justin L
alphabeticalIndex_MultipleColumns_minimized.docx (10.99 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-07-31 12:11 UTC, Justin L
alphabeticalIndex_MultipleColumns_minimal.pdf: how it looks in MSO 2016 (and 2003) (192.69 KB, application/pdf)
2020-07-31 12:19 UTC, Justin L
n652364_A6.docx: modified in Word 2003. A strange "End of Section"... (10.41 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-07-31 12:25 UTC, Justin L
tdf121659_patch.diff: good enough for this specific example kind of fix (1.63 KB, patch)
2020-08-06 18:53 UTC, Justin L

Description NISZ LibreOffice Team 2018-11-23 09:53:56 UTC
Description:
Manual Column Break in DOCX documents created with Microsoft Word 2010 is lost when the document is exported in LibreOffice Writer.

Steps to Reproduce:
    1. Create a new document in Microsoft Word.
    2. On the Page Layout tab, in the Page Setup group, click Columns.
    3. Select Two.
    4. Type “=lorem(2)” and press Enter.
    5. On the Page Layout tab, in the Page Setup group, click Breaks.
    6. Choose Column.
    7. Type “=lorem(2)” and press Enter.
    8. On the Insert tab, in the Illustrations group, click Shapes.
    9. Select a shape.
    10. Click in the text and drag with the mouse to draw the shape.
    11. Save the file as DOCX.
    12. Open the same file in LibreOffice Writer.
    13. Select File and Save As.
    14. Name the file.
    15. Reload the file.
    16. Compare the original file opened in Word and the exported file opened in Writer.

Actual Results:
Manual Column Break is lost.

Expected Results:
The number of column breaks should match the original file opened in Microsoft Word 2010.
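
One way to check this without opening either application is to count the column breaks stored in each file's word/document.xml. A minimal Python sketch, assuming a column break is serialized as a w:br element with w:type="column" (which is how WordprocessingML defines it). The function names are made up for this illustration and are not part of LibreOffice:

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_column_breaks(document_xml):
    """Count <w:br w:type="column"/> elements in a document.xml payload."""
    root = ET.fromstring(document_xml)
    return sum(
        1
        for br in root.iter(f"{{{W_NS}}}br")
        if br.get(f"{{{W_NS}}}type") == "column"
    )


def count_column_breaks_in_docx(path):
    """Read word/document.xml from a DOCX package and count its column breaks."""
    with zipfile.ZipFile(path) as package:
        return count_column_breaks(package.read("word/document.xml"))
```

Running count_column_breaks_in_docx on the Word original and on the file saved by Writer gives two numbers to compare. A lower count in the saved file would point at the export side; equal counts would suggest the break is still in the file and is being misplaced when it is opened.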


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 NISZ LibreOffice Team 2018-11-23 09:54:21 UTC
Created attachment 146944
Screenshot of the original and exported document side by side in Word and Writer.
Comment 2 NISZ LibreOffice Team 2018-11-23 09:54:42 UTC
Created attachment 146945
Example file from Word
Comment 3 NISZ LibreOffice Team 2018-11-23 09:55:02 UTC
Created attachment 146946
The original file saved by Writer
Comment 4 raal 2018-11-24 15:00:41 UTC
Confirm with Version: 6.2.0.0.beta1+
Build ID: 268364e35100b559f42d8c02b930c5cca1c84be7
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
Comment 5 raal 2018-11-24 16:07:26 UTC
This seems to have begun at the below commit.
Adding Cc: to Miklos Vajna; could you possibly take a look at this one?
Thanks

commit aeea51a8818a5deb7b95ba0b758463ba1703e9c4
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Mar 14 22:48:09 2015 +0800

    source-hash-c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5
    
    commit c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5
    Author:     Miklos Vajna <vmiklos@collabora.co.uk>
    AuthorDate: Tue Jul 8 14:41:10 2014 +0200
    Commit:     Miklos Vajna <vmiklos@collabora.co.uk>
    CommitDate: Tue Jul 8 15:35:47 2014 +0200
    
        MSWordExportBase: fix export of header/footer in case of multiple columns
    
        Regression from 263938c4a8789d881f8e736d317b6bcc09c3bce5 (fdo#73596
        [DOCX] Multiple Columns in Index, 2014-02-13), header / footer was lost
        in multi-column section. This fixes both DOC and DOCX export.
Comment 6 Justin L 2018-12-21 12:55:55 UTC
Created attachment 147740
columns2.docx: more paragraphs added so balanced and column break don't match

(In reply to raal from comment #5)
I get the same 4.4 bibisect result as you, but that specific commit just exposes several other things that happened in the period from February to July. Although Word 2003 still sees the manual break, the key problem to look for in bibisect4.4max is the loss of the column break, which actually started with export commit 80fd9fb7209cfd5c0622ee99d59e42e6db32f021
    Author:     Umesh Kadam  @synerzip.com
    CommitDate: Thu May 22 02:31:53 2014 -0500
        fdo#78333 : SdtContent and a Shape overlapping causes corruption


The following commit should be ENTIRELY unrelated - even though reverting it visually "fixes" the original columns.docx.
commit d185204737031955c56a24356ed003d342548434
Author: Miklos Vajna <vmiklos@collabora.co.uk>
Date:   Thu Jul 17 14:59:19 2014 +0200
    DOCX import: set DontBalanceTextColumns=true for the last section ...
    ... if it has multiple columns.
    
    See wwSectionManager::InsertSegments() for the related binary import
    code which already did this.

There are two problems here. One is that the column break is being absorbed into the shape on import. Removing the shape also "fixes" the problem.
The other problem is that the shape should be imported BEFORE the column break - probably requiring another fake/split paragraph.
Comment 7 Justin L 2018-12-22 07:31:59 UTC Comment hidden (obsolete)
Comment 8 Xisco Faulí 2020-05-02 16:46:25 UTC
Also reproduced with file from sw/qa/extras/ooxmlexport/data/n652364.docx
Comment 9 Xisco Faulí 2020-05-04 08:19:42 UTC
sw/qa/extras/ooxmlexport/data/alphabeticalIndex_MultipleColumns.docx is also affected by this issue
Comment 10 Justin L 2020-07-31 12:11:18 UTC
Created attachment 163804
alphabeticalIndex_MultipleColumns_minimized.docx

(In reply to Xisco Faulí from comment #9)
> sw/qa/extras/ooxmlexport/data/alphabeticalIndex_MultipleColumns.docx
Yes, this one could be similar, since it has a field in the column section. I don't see a column break though. There are just continuous section breaks before and after the column section.

This one bibisects to
bibisect commit aeea51a8818a5deb7b95ba0b758463ba1703e9c4
    source-hash-c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5
    Author:     Miklos Vajna on Tue Jul 8 15:35:47 2014 +0200
        MSWordExportBase: fix export of header/footer in case of multiple columns
Comment 11 Justin L 2020-07-31 12:19:33 UTC
Created attachment 163806
alphabeticalIndex_MultipleColumns_minimal.pdf: how it looks in MSO 2016 (and 2003)
Comment 12 Justin L 2020-07-31 12:25:18 UTC
Created attachment 163807
n652364_A6.docx: modified in Word 2003. A strange "End of Section"...

(In reply to Xisco Faulí from comment #8)
> Also reproduced with file from sw/qa/extras/ooxmlexport/data/n652364.docx

This one was also affected by
MSWordExportBase: fix export of header/footer in case of multiple columns
but I don't think it is related to this bug report at all.

P.S. Word 2016 doesn't like this old compatibility-format either and acts very differently when it saves it in native mode - much like LO currently handles it.
Comment 13 Justin L 2020-08-01 11:53:52 UTC
(In reply to Justin L from comment #12)
> P.S. Word 2016 doesn't like this old compatibility-format either and acts
> very differently when it saves it in native mode.
That's bug 135343.
Comment 14 Justin L 2020-08-01 12:15:10 UTC
(In reply to Justin L from comment #7)
> So this works, but not sure where the best place to put this is.
> void DomainMapper::lcl_endParagraphGroup()
> +    m_pImpl->SetIsFirstRun(false);
This doesn't seem to work anymore. Anyway, it looks like a false clue.
Comment 15 Justin L 2020-08-06 14:34:56 UTC
Note that this is also a FILEOPEN bug. MS Word 2016 and 2003 both open up the round-tripped document just fine. In fact, it seems like that really is the ultimate problem here.

The round-tripped markup doesn't look illegal to me. In the round-trip, the column break just moves to the position before the image anchor instead of after the image in the same paragraph.

There is only one unit test that has a page/column break deferred onto it. (tdf81345.docx - ooxmlexport4)
Comment 16 Justin L 2020-08-06 18:53:35 UTC
Created attachment 164012
tdf121659_patch.diff: good enough for this specific example kind of fix

This patch works for this particular example.
But it doesn't work with tables, and might not work if a shape has an empty paragraph.
Comment 17 NISZ LibreOffice Team 2021-04-28 15:19:25 UTC
After a bit of extra investigation into the issue it looks like this is really a file open issue.

In the original example, the image is anchored in the very first run of its paragraph; the column break follows in the same paragraph, and then comes some text.

However, adding a letter in front of the anchoring point in Word, so that the image is no longer the first run of its paragraph, fixes all our woes that start with opening the file:
the shape appears in the correct location (this is not detailed in the original report) 
and saving the file does not remove the column break.
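
The problematic layout can be looked for directly in word/document.xml. The following is a rough Python sketch, not LibreOffice code: it assumes the shape is stored as a DrawingML wp:anchor somewhere inside the paragraph's first w:r (legacy VML w:pict shapes are not handled) and that the break is a w:br with w:type="column" later in the same paragraph. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
WP_NS = "http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"

PARAGRAPH = f"{{{W_NS}}}p"
RUN = f"{{{W_NS}}}r"
BREAK = f"{{{W_NS}}}br"
BREAK_TYPE = f"{{{W_NS}}}type"
ANCHOR = f"{{{WP_NS}}}anchor"


def leading_anchor_then_column_break(paragraph):
    """True if the first run holds an anchored shape and a column break follows it."""
    anchor_seen = False
    # Only direct child runs are inspected, so text inside a shape is ignored.
    for run_index, run in enumerate(paragraph.findall(RUN)):
        for child in run:
            if next(child.iter(ANCHOR), None) is not None:
                # Only an anchor inside the very first run counts.
                anchor_seen = anchor_seen or run_index == 0
            elif child.tag == BREAK and child.get(BREAK_TYPE) == "column":
                if anchor_seen:
                    return True
        if not anchor_seen:
            # The first run carried no anchored shape, so the pattern cannot match.
            return False
    return False


def find_affected_paragraphs(document_xml):
    """Return the document-order indexes of paragraphs matching the pattern."""
    root = ET.fromstring(document_xml)
    return [
        index
        for index, paragraph in enumerate(root.iter(PARAGRAPH))
        if leading_anchor_then_column_break(paragraph)
    ]
```

A paragraph that starts with a letter before the anchor, as in the workaround just described, is deliberately not reported by this check.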

So the idea we are considering is to create an extra empty paragraph before the break in this situation, so that the shape can be anchored to it. This should not hurt the visual appearance even after a roundtrip.
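
To make the idea concrete, here is a toy Python model of it. This is only an illustration under simplifying assumptions and is not the code of any actual patch: a paragraph is modelled as a list of (kind, payload) tuples, where kind is "anchor", "column_break" or "text", and split_before_column_break is a made-up name.

```python
def split_before_column_break(paragraph):
    """Move leading anchors into their own text-free paragraph.

    paragraph is a list of (kind, payload) tuples. If it starts with one or
    more anchors that are immediately followed by a column break, return two
    paragraphs: one holding only the anchors, and one starting at the break.
    Otherwise return the paragraph unchanged, wrapped in a list.
    """
    lead = 0
    while lead < len(paragraph) and paragraph[lead][0] == "anchor":
        lead += 1

    starts_with_anchor = lead > 0
    break_follows = lead < len(paragraph) and paragraph[lead][0] == "column_break"

    if starts_with_anchor and break_follows:
        return [paragraph[:lead], paragraph[lead:]]
    return [paragraph]


original = [("anchor", "shape1"), ("column_break", None), ("text", "lorem")]
for number, part in enumerate(split_before_column_break(original), start=1):
    print(number, part)
```

In this model the first resulting paragraph contains no text, only the shape's anchor, so the shape stays in the first column while the break and the following text start the next one.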
Comment 18 Commit Notification 2021-05-06 12:07:33 UTC
Attila Szűcs committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f05c57b96293c80825be66162ca7bf3e4dbc8ea2

tdf#121659 DOCX import: fix lost column break at shapes

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 NISZ LibreOffice Team 2021-06-10 08:55:22 UTC
Verified in: 

Version: 7.2.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: aa9cb8e14749e7fb7a83b55a2bb095501f731a18
CPU threads: 4; OS: Windows 10.0 Build 17134; UI render: default; VCL: win
Locale: hu-HU (hu_HU); UI: hu-HU
Calc: threaded