Bug 58239 - FILEOPEN: Importing DOCX document gives wrong text box placement and page break
Summary: FILEOPEN: Importing DOCX document gives wrong text box placement and page break
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.6.2.2 release (earliest affected)
Hardware: All All
Importance: low trivial
Assignee: Not Assigned
URL:
Whiteboard: BSA
Keywords: filter:docx
Depends on:
Blocks: DOCX-Textbox DOCX-Grouped-Shapes VML-Textbox
Reported: 2012-12-13 11:31 UTC by marius
Modified: 2022-09-24 12:59 UTC

See Also:
Crash report or crash signature:
Regression By:


Attachments
Document that is bad imported. (51.16 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2012-12-13 11:31 UTC, marius
PDF of correctly displayed document (59.03 KB, application/pdf)
2012-12-13 11:32 UTC, marius
Screenshot showing the red triangle overlays when file converted and opened as DOC (57.90 KB, image/png)
2016-08-09 14:25 UTC, Johnny_M
The test DOCX file converted to DOC using Word 2003 (with compatibility pack) (123.00 KB, application/msword)
2016-08-09 14:32 UTC, Johnny_M
Document compared MSO LO (144.50 KB, image/png)
2020-09-15 11:13 UTC, Timur
Minimized test document in docx format (20.32 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-09-18 13:58 UTC, NISZ LibreOffice Team
The minimized example file in Word and Writer (52.94 KB, image/png)
2020-09-18 14:00 UTC, NISZ LibreOffice Team

Description marius 2012-12-13 11:31:07 UTC
Created attachment 71442 [details]
Document that is bad imported.

Problem description: 

Steps to reproduce:
1. Open document

Current behavior:
The document is imported incorrectly: text boxes are placed wrongly and a page break appears in the wrong place.


Expected behavior:
The document should look like the PDF attached.
              
Operating System: Ubuntu
Comment 1 marius 2012-12-13 11:32:07 UTC
Created attachment 71443 [details]
PDF of correctly displayed document
Comment 2 A (Andy) 2012-12-15 10:08:13 UTC
reproducible with LO 3.6.4.3 (Win7 Home, 64 Bit)
Comment 3 VX 2012-12-15 12:09:27 UTC
I can confirm it for LO Writer 4.0.0.0 beta 1 on Windows 7 64 bit Polish.
Comment 4 Samuel Mehrbrodt (allotropia) 2014-01-23 14:10:55 UTC
Confirmed in 4.2.
Comment 5 Joel Madero 2015-05-02 15:43:06 UTC Comment hidden (obsolete)
Comment 6 Gordo 2015-06-20 15:13:40 UTC
Still reproducible.

Windows Vista 64
Version: 4.4.4.2
Build ID: f784c932ccfd756d01b70b6bb5e09ff62e1b3285

Version: 5.1.0.0.alpha1+
Build ID: 46564fd97308ce070248482ad65a311a329a2b76
TinderBox: Win-x86@39, Branch:master, Time: 2015-06-15_00:08:53
Comment 7 Johnny_M 2016-08-09 14:08:29 UTC
Confirming on:
- Windows Vista, 32 bit:
Version: 5.2.0.4
Build-ID: 066b007f5ebcc236395c7d282ba488bca6720265
CPU-Threads: 2; BS-Version: Windows 6.0; UI-Render: Standard; 
Gebietsschema: de-DE (de_DE)

- Linux Mint 17.1, 64 bit:
Version: 5.1.5.2
Build ID: 1:5.1.5~rc2-0ubuntu1~trusty1
CPU Threads: 2; OS Version: Linux 3.13; UI Render: default; 
Locale: de-DE (en_GB.UTF-8); Calc: group


BUT: Those aren't "checkboxes" (as in a form control). They are actually small text boxes, also when opened in MS Word 2003 (with DOCX compatibility pack).

If the file is saved as a DOC file by Word 2003, the text boxes are ordered correctly when the latter is opened in LO 5.2.0.4, but most have a red triangle overlay.
Comment 8 Johnny_M 2016-08-09 14:25:03 UTC
Created attachment 126701 [details]
Screenshot showing the red triangle overlays when file converted and opened as DOC

(In reply to Johnny_M from comment #7)
> [...]
> If the file is saved as a DOC file by Word 2003, the text boxes are ordered
> correctly when the latter is opened in LO 5.2.0.4, but most have a red
> triangle overlay.

The attached screenshot shows the red triangle overlays. They are not shown in the print preview, however, nor are they printed or exported to PDF.
Comment 9 Johnny_M 2016-08-09 14:32:29 UTC
Created attachment 126702 [details]
The test DOCX file converted to DOC using Word 2003 (with compatibility pack)
Comment 10 QA Administrators 2017-12-08 08:08:16 UTC Comment hidden (obsolete)
Comment 11 Johnny_M 2017-12-27 11:35:22 UTC Comment hidden (obsolete)
Comment 12 QA Administrators 2018-12-28 03:46:49 UTC Comment hidden (obsolete)
Comment 13 Johnny_M 2019-07-14 16:19:05 UTC
Still reproducible on the initial machine with current master on Ubuntu MATE 19.10 liveUSB:
Version: 6.4.0.0.alpha0+
Build ID: c54597a8905b07807952aebc24237549302fb941
CPU threads: 4; OS: Linux 5.0; UI render: default; VCL: gtk3; 
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2019-07-10_22:22:02
Locale: en-US (C.UTF-8); UI-Language: en-US
Calc: threaded
Comment 14 NISZ LibreOffice Team 2020-09-08 14:00:23 UTC
This document has VML textboxes that are converted to frames.
Also, the original "small squares" actually contain a few spaces, tabs, and images of two small squares. These images also overflow the frames.

All this extra content makes Writer display red triangles that warn about hidden content.
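
To check which content sits inside each VML textbox, the DOCX package can be inspected directly. The following is only a rough sketch, not part of the original analysis: it assumes the textboxes are stored as v:textbox elements in word/document.xml, and the function name summarize_vml_textboxes is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard WordprocessingML and VML namespaces.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
V = "urn:schemas-microsoft-com:vml"


def summarize_vml_textboxes(docx_file):
    """Hypothetical helper: one dict per v:textbox in word/document.xml.

    docx_file may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/document.xml"))

    summaries = []
    for textbox in root.iter("{%s}textbox" % V):
        # Concatenate all run text inside the textbox.
        text = "".join(t.text or "" for t in textbox.iter("{%s}t" % W))
        # Count only w:tab elements that are direct children of a run,
        # so tab stop definitions under w:pPr/w:tabs are not counted.
        tabs = sum(
            len(run.findall("{%s}tab" % W))
            for run in textbox.iter("{%s}r" % W)
        )
        # Assumption: images appear as VML v:imagedata or as w:drawing.
        images = sum(1 for _ in textbox.iter("{%s}imagedata" % V))
        images += sum(1 for _ in textbox.iter("{%s}drawing" % W))
        summaries.append(
            {"spaces": text.count(" "), "tabs": tabs, "images": images}
        )
    return summaries
```

Calling summarize_vml_textboxes() with the path of the attached DOCX would give one entry per textbox. Nonzero counts for a textbox that looks like an empty square in Word would be consistent with the extra content described above.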
Comment 15 Timur 2020-09-15 11:13:59 UTC
Created attachment 165517 [details]
Document compared MSO LO

"Bad document" makes for a bad bug report; a report needs to cover a single issue.
It seems to be focused on the red triangles on the 1st page, which is a trivial issue, explained in comment 8.
It is similar in LO 7.1+ with this 2007 DOCX, and also if the file is resaved in MSO.

Note: More serious is the page break on the 2nd page. It needs to be checked whether this has already been reported, as we have had similar reports. It can be put in See Also if found.
Comment 16 NISZ LibreOffice Team 2020-09-18 13:58:20 UTC
Created attachment 165665 [details]
Minimized test document in docx format

Okay, let's reduce the problem.
This has only one VML shape with more content (spaces, tabs, image) inside than it can fit.
Comment 17 NISZ LibreOffice Team 2020-09-18 14:00:40 UTC
Created attachment 165666 [details]
The minimized example file in Word and Writer
Comment 18 QA Administrators 2022-09-19 03:33:03 UTC Comment hidden (obsolete)
Comment 19 Johnny_M 2022-09-24 12:59:13 UTC
The original issue with moved text boxes/frames and their insertions (spaces, square images, etc.), as well as the page break after the first table row on page 2, persists.

The display of the red triangles, however, is not a bug but a warning feature indicating that not all content is visible within the text boxes/frames, as explained in comment 14. Therefore, I am reverting the previous bug title change.

Tested on Ubuntu 22.04 liveUSB:
Version: 7.5.0.0.alpha0+ / LibreOffice Community
Build ID: 9940a5dce79fe9dc3e6ff0302c9be8c7d1648f67
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded