Description:
A manual column break in a DOCX document created with Microsoft Word 2010 is lost when the document is exported from LibreOffice Writer.

Steps to Reproduce:
1. Create a new document in Microsoft Word.
2. On the Page Layout tab, in the Page Setup group, click Columns.
3. Select Two.
4. Type “=lorem(2)” and press Enter.
5. On the Page Layout tab, in the Page Setup group, click Breaks.
6. Choose Column.
7. Type “=lorem(2)” and press Enter.
8. On the Insert tab, in the Illustrations group, click Shapes.
9. Select a shape.
10. Click in the text and drag with the mouse to draw the shape.
11. Save the file as DOCX.
12. Open the same file in LibreOffice Writer.
13. Select File and Save As.
14. Name the file.
15. Reload the file.
16. Compare the original file opened in Word and the exported file opened in Writer.

Actual Results:
The manual column break is lost.

Expected Results:
The exported file should contain the same number of column breaks as the original file opened in Microsoft Word 2010.

Reproducible: Always
User Profile Reset: No
Additional Info:
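The comparison in step 16 can also be done on the files themselves rather than by eye. What follows is a minimal sketch, not part of the original report. It assumes the files use the transitional WordprocessingML namespace and that manual column breaks are stored as `<w:br w:type="column"/>` in word/document.xml. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Assumption: transitional OOXML namespace. A file saved as "Strict Open XML"
# uses a different namespace URI and would need W_NS adjusted.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def count_column_breaks_in_xml(document_xml):
    """Count <w:br w:type="column"/> elements in a document.xml payload."""
    root = ET.fromstring(document_xml)
    return sum(
        1
        for br in root.iter(f"{{{W_NS}}}br")
        if br.get(f"{{{W_NS}}}type") == "column"
    )


def count_column_breaks(docx_path):
    """Open a .docx (a zip archive) and count column breaks in its main part."""
    with zipfile.ZipFile(docx_path) as archive:
        return count_column_breaks_in_xml(archive.read("word/document.xml"))
```

Calling `count_column_breaks()` on the file saved by Word and on the file saved by Writer gives two numbers to compare; if the break was lost on export, the second count would be expected to be lower. Note that the count includes breaks nested inside text boxes, which is fine for the simple example document here.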
Created attachment 146944 [details] Screenshot of the original and exported document side by side in Word and Writer.
Created attachment 146945 [details] Example file from Word
Created attachment 146946 [details] The original file saved by Writer
Confirmed with
Version: 6.2.0.0.beta1+
Build ID: 268364e35100b559f42d8c02b930c5cca1c84be7
CPU threads: 4; OS: Linux 4.15; UI render: default; VCL: gtk3;
This seems to have begun at the below commit.
Adding Cc: to Miklos Vajna; could you possibly take a look at this one? Thanks

commit aeea51a8818a5deb7b95ba0b758463ba1703e9c4
Author: Matthew Francis <mjay.francis@gmail.com>
Date:   Sat Mar 14 22:48:09 2015 +0800

    source-hash-c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5

    commit c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5
    Author:     Miklos Vajna <vmiklos@collabora.co.uk>
    AuthorDate: Tue Jul 8 14:41:10 2014 +0200
    Commit:     Miklos Vajna <vmiklos@collabora.co.uk>
    CommitDate: Tue Jul 8 15:35:47 2014 +0200

        MSWordExportBase: fix export of header/footer in case of multiple columns

        Regression from 263938c4a8789d881f8e736d317b6bcc09c3bce5 (fdo#73596
        [DOCX] Multiple Columns in Index, 2014-02-13), header / footer was lost
        in multi-column section. This fixes both DOC and DOCX export.
Created attachment 147740 [details]
columns2.docx: more paragraphs added so balanced and column break don't match

(In reply to raal from comment #5)
I get the same 4.4 bibisect result as you, but that specific commit just exposes several other things that happened in that time period from Feb till Jul.

Although Word2003 still sees the manual break, the key problem to look for in bibisect4.4max is the loss of the column break, which actually happened from export commit 80fd9fb7209cfd5c0622ee99d59e42e6db32f021
Author: Umesh Kadam @synerzip.com
CommitDate: Thu May 22 02:31:53 2014 -0500
    fdo#78333 : SdtContent and a Shape overlapping causes corruption

The following commit should be ENTIRELY unrelated - even though reverting it visually "fixes" the original columns.docx.
commit d185204737031955c56a24356ed003d342548434
Author: Miklos Vajna <vmiklos@collabora.co.uk>
Date:   Thu Jul 17 14:59:19 2014 +0200
    DOCX import: set DontBalanceTextColumns=true for the last section ...
    ... if it has multiple columns. See wwSectionManager::InsertSegments()
    for the related binary import code which already did this.

There are two problems here. One is that the column break is being absorbed into the shape on import. Removing the shape also "fixes" the problem.
The other problem is that the shape should be imported BEFORE the column break - probably requiring another fake/split paragraph.
So this works, but not sure where the best place to put this is. Anyway, although probably not an easy hack, this gives someone a good starting point. I think I'll leave this edge case open for a novice who wants to get into trouble...

 void DomainMapper::lcl_endParagraphGroup()
+    m_pImpl->SetIsFirstRun(false);
Also reproduced with file from sw/qa/extras/ooxmlexport/data/n652364.docx
sw/qa/extras/ooxmlexport/data/alphabeticalIndex_MultipleColumns.docx is also affected by this issue
Created attachment 163804 [details]
alphabeticalIndex_MultipleColumns_minimized.docx

(In reply to Xisco Faulí from comment #9)
> sw/qa/extras/ooxmlexport/data/alphabeticalIndex_MultipleColumns.docx
Yes, this one could be similar, since it has a field in the column section. I don't see a column break though. There are just continuous section breaks before and after the column section.

This one bibisects to bibisect commit aeea51a8818a5deb7b95ba0b758463ba1703e9c4
source-hash-c4a5f8c1afd42acb52d0ae9b4d6f42f3e87364d5
Author: Miklos Vajna on Tue Jul 8 15:35:47 2014 +0200
    MSWordExportBase: fix export of header/footer in case of multiple columns
Created attachment 163806 [details] alphabeticalIndex_MultipleColumns_minimal.pdf: how it looks in MSO 2016 (and 2003)
Created attachment 163807 [details]
n652364_A6.docx: modified in Word 2003. A strange "End of Section"...

(In reply to Xisco Faulí from comment #8)
> Also reproduced with file from sw/qa/extras/ooxmlexport/data/n652364.docx
This one was also affected by
    MSWordExportBase: fix export of header/footer in case of multiple columns
but I don't think it is related to this bug report at all.

P.S. Word 2016 doesn't like this old compatibility-format either and acts very differently when it saves it in native mode - much like LO currently handles it.
(In reply to Justin L from comment #12)
> P.S. Word 2016 doesn't like this old compatibility-format either and acts
> very differently when it saves it in native mode.
That's bug 135343.
(In reply to Justin L from comment #7)
> So this works, but not sure where the best place to put this is.
>  void DomainMapper::lcl_endParagraphGroup()
> +    m_pImpl->SetIsFirstRun(false);
This doesn't seem to work anymore. Anyway, it looks like a false clue.
Note that this is also a FILEOPEN bug. MS Word 2016 and 2003 both open up the round-tripped document just fine. In fact, it seems like file open really is the ultimate problem here. The exported file doesn't look illegal to me. In the round-trip the column break just moves to the position before the image anchor instead of after the image in the same paragraph.

There is only one unit test that has a page/column break deferred onto it (tdf81345.docx - ooxmlexport4).
Created attachment 164012 [details] tdf121659_patch.diff: good enough for this specific example kind of fix This patch works for this particular example. But it doesn't work with tables, and might not work if a shape has an empty paragraph.
After a bit of extra investigation, it looks like this is really a file open issue.

In the original example, the image is anchored in the very first run of its paragraph, the column break follows in the same paragraph, and then comes some text. However, adding a letter in front of the anchoring point in Word, so that the image is no longer the first run of its paragraph, fixes all the problems that start with opening the file: the shape appears in the correct location (the wrong location is not mentioned in the original report), and saving the file does not remove the column break.

So the idea we are considering is to create an extra empty paragraph before the break in this situation, so that the shape can be anchored to it. This should not hurt the visual appearance, even after a round-trip.
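To check whether other documents contain the pattern described above, something like the following could be used. This is only a sketch under assumptions: it looks at runs that are direct children of a paragraph, treats any run child containing `w:drawing` or `w:pict` as a shape anchor, and takes a document.xml string as input. The helper names are hypothetical, and this is not the logic of any LibreOffice patch.

```python
import xml.etree.ElementTree as ET

# Assumption: transitional WordprocessingML namespace.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def run_items(paragraph):
    """Flatten the direct runs of a paragraph into 'shape'/'colbreak'/'text'."""
    items = []
    for run in paragraph.findall(W + "r"):
        for child in run:
            if child.tag == W + "br":
                if child.get(W + "type") == "column":
                    items.append("colbreak")
            elif child.tag == W + "t":
                if child.text:
                    items.append("text")
            elif (next(child.iter(W + "drawing"), None) is not None
                  or next(child.iter(W + "pict"), None) is not None):
                # Covers a bare w:drawing/w:pict and one wrapped in
                # mc:AlternateContent.
                items.append("shape")
    return items


def paragraphs_at_risk(document_xml):
    """Return indices of paragraphs whose first item is a shape anchor
    that is followed by a column break in the same paragraph."""
    root = ET.fromstring(document_xml)
    hits = []
    for index, paragraph in enumerate(root.iter(W + "p")):
        items = run_items(paragraph)
        if items and items[0] == "shape" and "colbreak" in items[1:]:
            hits.append(index)
    return hits
```

A paragraph where some text precedes the shape is deliberately not reported, matching the observation that adding a letter in front of the anchor avoids the problem. The indices count every `w:p` in document order, including paragraphs nested in text boxes.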
Attila Szűcs committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f05c57b96293c80825be66162ca7bf3e4dbc8ea2

tdf#121659 DOCX import: fix lost column break at shapes

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Verified in:
Version: 7.2.0.0.alpha1+ (x64) / LibreOffice Community
Build ID: aa9cb8e14749e7fb7a83b55a2bb095501f731a18
CPU threads: 4; OS: Windows 10.0 Build 17134; UI render: default; VCL: win
Locale: hu-HU (hu_HU); UI: hu-HU
Calc: threaded