Bug 105134 - Google Doc export to openxml Word docx displays incorrect spacing between image and text box object in Writer and inserts blank pages
Status: RESOLVED DUPLICATE of bug 89297
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.3.0.3 release
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.1.0 target:7.3.0 target:7.2....
Keywords: filter:docx
Depends on:
Blocks: DOCX-Section
Reported: 2017-01-05 23:16 UTC by Les
Modified: 2021-06-22 15:02 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
Google Docs document (1.21 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-01-05 23:17 UTC, Les
Screenshot of doc in Google Doc to show images (128.24 KB, image/jpeg)
2017-01-06 20:23 UTC, Les
LibreOffice screenshot dealing with images in doc. (150.73 KB, image/jpeg)
2017-01-06 20:24 UTC, Les
Screenshot in Word for OSX 15.29.1 (56.59 KB, image/png)
2017-01-09 16:24 UTC, Alex Thurgood
Minimized example file (43.01 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-12-12 11:27 UTC, NISZ LibreOffice Team
Screenshot of the minimized document in Word and Writer (116.96 KB, image/png)
2020-12-12 11:28 UTC, NISZ LibreOffice Team

Description Les 2017-01-05 23:16:24 UTC
Description:
Document is poorly formatted when opened in LibreOffice. Extra blank pages appear and what appear to be text images don't appear in-line.

Steps to Reproduce:
1. Open document

Actual Results:  
You can tell that the document is not properly formatted

Expected Results:
Compare to Google Docs when the document is opened there.


Reproducible: Always

User Profile Reset: No

Additional Info:
How do I get the sample file attached to the bug report?


User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.95 Safari/537.36
Comment 1 Les 2017-01-05 23:17:25 UTC
Created attachment 130189
Google Docs document
Comment 2 Alex Thurgood 2017-01-06 08:44:11 UTC
@Les : the test document you provided is not a Google Doc native document, but a Microsoft Word openxml formatted document (docx).

You need to be more specific with your report.

This could be one of :

- a docx import problem, i.e. a problem with the docx filter that LibreOffice uses - there are already many bug reports about incomplete docx support, in which case, your report would probably be a duplicate ;

- a GoogleDocs document import problem. In that case, however, you would need to provide a native GoogleDocs document, and the file you have provided is not such a file. To my knowledge, you cannot download a natively formatted GoogleDocs file: these are always stored in the cloud on Google's servers, and the Save as/Download procedure offers a different format in which to save the file to your hard drive ;

- some other, as yet unidentified problem, possibly an incomplete Google export filter, which would not fall under the LibreOffice bugzilla.


Please provide more specific details as to how the document with which you are having problems was produced.
Comment 3 Alex Thurgood 2017-01-06 09:14:03 UTC
I have opened the document in Word () for OSX - what are the text images to which you refer ?
Comment 4 Les 2017-01-06 20:22:38 UTC
@Alex,

Thanks for the update. I don't know how the document was produced, but the beginning of it says it went live as a Google Doc - which is why I assumed it was produced by that app. I don't have MS Word, so I can't verify how it looks there.

It sounds like it probably is either a duplicate of some other conversion bug, or something new. Since I'm not sure what terms to search for to determine it's a duplicate or new, I have to leave that to you guys.

For the "images" I was referring to, I'm attaching 2 screenshots. One is from Google Doc and the other from LibreOffice. The "arrowhead" is an image and the text it points to appears to be an inserted text box.

Les
Comment 5 Les 2017-01-06 20:23:42 UTC
Created attachment 130218
Screenshot of doc in Google Doc to show images
Comment 6 Les 2017-01-06 20:24:14 UTC
Created attachment 130219
LibreOffice screenshot dealing with images in doc.
Comment 7 Les 2017-01-06 20:25:17 UTC
@Alex, it's possible the images issue is just a formatting issue with the conversion.
Comment 8 Alex Thurgood 2017-01-09 16:24:14 UTC
Created attachment 130278
Screenshot in Word for OSX 15.29.1

This is a screenshot of the double arrow and text box formatting as displayed in the latest version of Word for OSX
Comment 9 Alex Thurgood 2017-01-09 16:29:40 UTC
When you compare the three screenshots you can see that all have different spacing between the double arrows and the associated text box.

Although LibreOffice clearly sets the arrows too close to the text, there is nonetheless a difference between the GoogleDocs display and the Word docx display.

It is hard to blame LibreOffice in such a case. After all, if GoogleDocs saves the file with some weird spacing parameter, who is to say that LibreOffice is misinterpreting it? Indeed, the same could be said of Microsoft Word.
Comment 10 Alex Thurgood 2017-01-09 16:31:28 UTC
I am confirming nonetheless, as it does appear that LibreOffice incorrectly positions the image too closely to the text box object.
Comment 11 QA Administrators 2018-01-10 03:32:03 UTC Comment hidden (obsolete)
Comment 12 eisa01 2018-06-14 19:26:38 UTC
This has gotten worse. The document now renders over 54 pages in Writer, instead of the 26 pages it has in Mac Word.

The screenshot of the rendering on the second page seems to be somehow rendered over the first ~20 pages?

Also applies to Windows

Version: 6.2.0.0.alpha0+
Build ID: b292a27698e85fd9d60c03613c3b0c67835c4dc1
CPU threads: 2; OS: Mac OS X 10.12.6; UI render: default; 
TinderBox: MacOSX-x86_64@49-TDF, Branch:master, Time: 2018-06-06_23:25:55
Locale: en-US (en_US.UTF-8); Calc: group threaded
Comment 13 QA Administrators 2019-06-15 02:59:24 UTC Comment hidden (obsolete)
Comment 14 eisa01 2019-08-10 19:58:36 UTC
Still present, now a 50 page document

Version: 6.4.0.0.alpha0+
Build ID: 54028dc503fc08eb12e287919d5e2850cff05b73
CPU threads: 4; OS: Mac OS X 10.14.6; UI render: default; VCL: osx; 
TinderBox: MacOSX-x86_64@49-TDF, Branch:master, Time: 2019-07-31_01:48:19
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded
Comment 15 Commit Notification 2020-10-30 17:01:52 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/b3e210808247743c891350dded33eb6e186c1088

crashtesting: assert on undo creation on export of tdf105134-1.docx to pdf

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 NISZ LibreOffice Team 2020-12-12 11:27:26 UTC
Created attachment 168096
Minimized example file

This example (2 pages in Word 2013, 8 in Writer) shows better what happens in this file:
- there is a first page with a huge top margin - originally 26 cm (I reduced this to 24 in my example), together with the bottom one leaving only a tiny space for the document body.
- the document body on this page contains only one line and a continuous section break
- the latter is imported incorrectly, making the huge top margin extend to the second page. Since this leaves very little space to display all the text on the second page, a lot of new pages are inserted in Writer.
- the page break at the end of the second page is imported correctly, after that it's mostly okay.
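
The structure described above can be checked directly in the file, since a docx is a zip archive whose word/document.xml stores each section's break type (w:type) and page margins (w:pgMar, in twips) inside a w:sectPr element. The following is only a minimal sketch: the function names and the SAMPLE fragment are hypothetical and were not taken from the attachment, and it assumes the margins are written as plain integer twips.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML main namespace (the usual "w:" prefix).
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
TWIPS_PER_CM = 1440 / 2.54  # 1 inch = 1440 twips = 2.54 cm


def section_summaries(document_xml):
    """Return one dict per w:sectPr, in document order."""
    root = ET.fromstring(document_xml)
    summaries = []
    for sect in root.iter(f"{{{W}}}sectPr"):
        break_type = sect.find(f"{{{W}}}type")
        margins = sect.find(f"{{{W}}}pgMar")
        top = margins.get(f"{{{W}}}top") if margins is not None else None
        summaries.append({
            # A missing w:type means the default, a next-page section break.
            "type": break_type.get(f"{{{W}}}val") if break_type is not None else "nextPage",
            # Assumption: the margin is stored as a plain integer number of twips.
            "top_margin_cm": round(int(top) / TWIPS_PER_CM, 2) if top is not None else None,
        })
    return summaries


def summaries_from_docx(path):
    """Read word/document.xml from a .docx file and summarize its sections."""
    with zipfile.ZipFile(path) as archive:
        return section_summaries(archive.read("word/document.xml"))


# Hypothetical fragment: a first section with a 26 cm top margin,
# followed by a section that starts with a continuous break.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:pPr><w:sectPr><w:pgMar w:top="14740" w:bottom="1440"/></w:sectPr></w:pPr></w:p>
<w:p/>
<w:sectPr><w:type w:val="continuous"/><w:pgMar w:top="1440" w:bottom="1440"/></w:sectPr>
</w:body></w:document>"""

for summary in section_summaries(SAMPLE):
    print(summary)
```

Running summaries_from_docx on the minimized example file should list its sections in the same way, which makes it easier to see where a large top margin is followed by a continuous break.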

This is still a problem in:
Version: 7.2.0.0.alpha0+ (x64)
Build ID: 61d07657caab5e0fb8ec4446f67a7044e14dae4b
CPU threads: 4; OS: Windows 6.3 Build 9600; UI render: Skia/Raster; VCL: win
Locale: hu-HU (hu_HU); UI: en-US
Calc: CL
Comment 17 NISZ LibreOffice Team 2020-12-12 11:28:47 UTC
Created attachment 168097
Screenshot of the minimized document in Word and Writer

It has looked like this since:

https://cgit.freedesktop.org/libreoffice/core/commit/?id=50bf96d31ab2eb546f6c71cc93c1fa5dd4bf3044

author	Justin Luth <justin_luth@sil.org>	2016-06-25 22:21:08 +0300
committer	Miklos Vajna <vmiklos@collabora.co.uk>	2016-06-29 07:35:54 +0000

tdf#90697 docx - don't change continuous break into page break
Comment 18 NISZ LibreOffice Team 2020-12-12 11:30:25 UTC
Adding CC to: Justin Luth
Comment 19 Justin L 2020-12-14 08:50:24 UTC
I think this document was intentionally designed to show off how incapable LO is of handling continuous breaks.

The first page (which is completely empty) defines a massive top margin. Then it does NOT do a Section Page-break, but just a continuous break with not enough space for the next heading/paragraph orphan - which forces it to start on the second page. How awful of a design is that.

So, there is NO WAY for LibreOffice to know when the new page style should be applied. It is just "whenever a new page starts", and LO has no corresponding thing for that.

As a human, it is easy for us to say - well this is a continuous section on the first paragraph, so we could make it a first/follow pair. But what happens if the first paragraph is large enough to cover multiple pages? Or what if there are multiple continuous breaks on the first page? That would break any "emulation" in the more extreme cases. Of course, we could argue to emulate for the more common cases, so perhaps we could improve some more common documents at the expense of breaking more extreme examples. (Although I would hate to classify this document as a common document...)

So, I would say the proper fix for this document is for the author to replace the continuous section break with a section-pagebreak.  (And that is the solution for pretty much every use of continuous section breaks.)
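
For anyone who cannot edit the document in Word, the replacement suggested above could be scripted. This is a rough sketch, not a tested fix for the attachment: the function names are hypothetical, and it assumes the section break is serialized exactly as a self-closing w:type element with the usual "w:" prefix. It works on the raw bytes rather than re-serializing the XML, so the rest of the file is left untouched.

```python
import re
import zipfile

# Assumption: the break is written as <w:type w:val="continuous"/> with the "w:" prefix.
CONTINUOUS = re.compile(rb'<w:type\s+w:val="continuous"\s*/>')


def continuous_to_next_page(document_xml):
    """Return (new_xml, count) with continuous section breaks turned into next-page breaks."""
    return CONTINUOUS.subn(b'<w:type w:val="nextPage"/>', document_xml)


def rewrite_docx(src, dst):
    """Copy the docx at src to dst, rewriting only word/document.xml."""
    changed = 0
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data, changed = continuous_to_next_page(data)
            zout.writestr(item, data)  # keeps each entry's original compression type
    return changed
```

Note that this changes every continuous break in the file, including ones used legitimately (for example to switch column layout mid-page), so the result should be reviewed in Word or Writer before it is relied upon.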

*** This bug has been marked as a duplicate of bug 89297 ***
Comment 20 Commit Notification 2021-06-22 11:07:57 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/c5d984b840db595796051bc2bf37d1c2179157e3

crashtesting: assert on export of tdf105134-1.docx to odt

It will be available in 7.3.0.

Comment 21 Commit Notification 2021-06-22 15:02:29 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-7-2":

https://git.libreoffice.org/core/commit/25d253d3bf9ad822a967e7d78caed70a6ce6544c

crashtesting: assert on export of tdf105134-1.docx to odt

It will be available in 7.2.0.0.beta2.
