Bug 75208 - Fileopen DOCX: text boxes imported in frame with wrong paragraph style (should be normal, not frame contents) (see comment 9)
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.0.4 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:docx
Depends on:
Blocks: DOCX-Textbox VML-Textbox
Reported: 2014-02-19 14:17 UTC by Gorka Navarrete
Modified: 2023-06-08 16:42 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
complex .docx document with lots of formatting issues when opened in LO (770.76 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-02-19 14:17 UTC, Gorka Navarrete
.DOCX resaved in MSO (554.11 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-01-09 15:39 UTC, Timur
Minimal 3 pages .docx saved in MSO (71.45 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-01-09 16:12 UTC, Timur
Minimal 3 pages .docx saved in MSO as PDF (454.64 KB, application/pdf)
2020-01-09 16:13 UTC, Timur
Minimized test document in docx format with VML textbox (32.26 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-11-23 20:23 UTC, NISZ LibreOffice Team
The minimized example file in Word 2013 and Writer (123.06 KB, image/png)
2020-11-23 20:24 UTC, NISZ LibreOffice Team
Even more minimized sample with DML format textbox (116.96 KB, image/png)
2020-11-23 20:24 UTC, NISZ LibreOffice Team
The 3-page sample minimized to 1 textbox (34.76 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-11-23 20:25 UTC, NISZ LibreOffice Team

Description Gorka Navarrete 2014-02-19 14:17:07 UTC
Created attachment 94362 [details]
complex .docx document with lots of formatting issues when opened in LO

With every new version, first of OO and then of LO, I've been testing how my 2009 DOCX PhD thesis looks. Back in the day the formatting problems were horrendous. Now it is more bearable, but still a long way from perfect interoperability.

I am not sure whether this will help, but I decided to attach the document as an example of a real-world complex DOCX file that is impossible to work on while switching between MS Office and LO, something that is necessary in collaborative environments such as academia.

Hopefully it can be used as a test case to help nail down the remaining formatting bugs.

To replicate, simply open the file in MS Office 2007 and then in LO 4.2. Most problems involve figures not showing, text frames not showing all of their content (e.g. search for "Cuadro 2"), etc.

If needed I can try to list the formatting problems thoroughly.
Comment 1 Thomas van der Meulen [retired] 2014-03-23 17:05:40 UTC
Thank you for your bug report. I can reproduce this bug running:
Version: 4.3.0.0.alpha0+
Build ID: 1a67b7cc3d5dc3dcd0de0e247f638c33d57dea1b
TinderBox: MacOSX-x86@49-TDF, Branch:master, Time: 2014-03-23_05:59:09
OS: Mac osx 10.9.2


I have compared it with Microsoft Word 2007 on Windows 7.

Because of the problems, the page numbers below are the pages as shown in LibreOffice.
Problems that I have found:
-no smart-art on page 33, 80, 81, 99, 100, 113, 124,142
-Text box is wrong on page 38, 46, 48, 75, 80, 81, 83, 86, 101, 105, 109, 120, 124, 145, 147
-on page 56,62, 67, 88, 95/96 the white/not filled dots are placed wrong
- page counter in footer is missing
Comment 2 A (Andy) 2015-12-27 20:05:19 UTC
Reproducible with LO 5.1.0.1, Win 8.1

Missing page counter in the footer, missing smart-arts (e.g. page 33), different length of the document (164 vs 174 pages), wrong text boxes (e.g. page 76).
Comment 3 QA Administrators 2018-05-31 02:52:36 UTC Comment hidden (obsolete)
Comment 4 Roman Kuznetsov 2020-01-09 14:57:39 UTC
still repro in

Version: 6.5.0.0.alpha0+ (x64)
Build ID: 2d736e1a0a2bbd41fe7793d52bbcc7bfc89c7da3
CPU threads: 4; OS: Windows 10.0 Build 18362; UI render: default; VCL: win; 
Locale: ru-RU (ru_RU); UI-Language: en-US
Calc: threaded
Comment 5 Timur 2020-01-09 15:39:42 UTC
Created attachment 157040 [details]
.DOCX resaved in MSO

The sample .docx is in the 2007 format. Here is the .docx resaved in Word.

This was a wrong bug report, because the rule is one issue per bug, and even then only after searching for duplicates.
Comment 6 Timur 2020-01-09 16:02:16 UTC
(In reply to Thomas van der Meulen from comment #1)
> Problems that I have found:
> -no smart-art on page 33, 80, 81, 99, 100, 113, 124,142
SmartArt on page 33 is present in the new DOCX, with some distortion. It is an already known issue that SmartArt is not read from the 2007 format. The distortion issue needs a search.
Pages 80 and 81 have no SmartArt in Word; that is page 83, with the same result as the previous one. And so on.


> -Text box is wrong on page 38, 46, 48, 75, 80, 81, 83, 86, 101, 105, 109,
> 120, 124, 145, 147
Page 38: nothing. Page 39: seen in the new DOCX, wrong spacing. Needs a search.
From page 83: seen in the new DOCX. The 1st has wrong spacing, the 2nd is empty. Needs a search.


> -on page 56,62, 67, 88, 95/96 the white/not filled dots are placed wrong
Not clear.


> - page counter in footer is missing
Not in new DOCX.


So the main issues are empty text boxes, wrong spacing in some text boxes, and smaller issues with SmartArt.
If an issue is not found in a search, its report needs a minimal sample DOCX created in MSO.


"different length of the document (164 vs 174 pages)": no need to report this. It cannot be fixed as such, because it comes from the text engine and rendering.
Comment 7 Timur 2020-01-09 16:12:27 UTC
Created attachment 157041 [details]
Minimal 3 pages .docx saved in MSO

Here is a minimal 3-page .docx (from around original page 83), saved in MSO, for the text box issue.
And here things get strange.
Text boxes that were empty when opened from the full new DOCX saved in MSO are not empty in this minimal .docx.
There is just the spacing issue.
Comment 8 Timur 2020-01-09 16:13:02 UTC
Created attachment 157042 [details]
Minimal 3 pages .docx saved in MSO as PDF
Comment 9 NISZ LibreOffice Team 2020-11-23 20:23:43 UTC
Created attachment 167513 [details]
Minimized test document in docx format with VML textbox

Since the SmartArt now looks okay, let's focus on the textbox spacing issue.
(I thought we had already caught all of these...)
Anyway, this file has only one textbox from the original document, in VML format, with no direct formatting, only Normal style text with 0 cm before/after paragraph spacing.

For some reason the text gets imported with the Frame Contents style, which has 0.21 cm after-paragraph spacing, and this causes the text to no longer fit in the textbox.
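
To illustrate why 0.21 cm of extra after-paragraph spacing can push text out of a fixed-height textbox, here is a rough arithmetic sketch. The paragraph count, line height and box height are hypothetical and not measured from the attachment, and the model (every paragraph is one line and is followed by its full spacing) is a simplification, not how Writer actually lays out text.

```python
# Hypothetical numbers for illustration only; not taken from the attachment.
# All lengths are in 1/100 mm, so 0.21 cm is 210.

def stacked_height(line_heights, space_after):
    """Total height of one-line paragraphs, each followed by space_after."""
    return sum(height + space_after for height in line_heights)


def fits(line_heights, space_after, box_height):
    """True if the stacked paragraphs do not exceed the box height."""
    return stacked_height(line_heights, space_after) <= box_height


paragraphs = [500] * 10   # assumed: ten one-line paragraphs, 0.5 cm each
box_height = 5000         # assumed: 5 cm of usable height in the textbox

print(fits(paragraphs, 0, box_height))    # Normal as authored, 0 cm after: True
print(fits(paragraphs, 210, box_height))  # Frame Contents, 0.21 cm after: False
```

Under these assumptions the text goes from filling the box exactly to needing 2.1 cm more than the box provides.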

Attachment #157041 [details] is converted to DML format but the same style change happens there too.
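
For anyone who wants to confirm what the source file itself says, a minimal sketch like the following lists the paragraph style of each paragraph inside a text box in the DOCX. It relies on the `w:txbxContent` element, which holds the text of both VML and DML text boxes in WordprocessingML. The function names and the file path passed to `styles_in_docx` are made up for this example. A paragraph with no `w:pStyle` uses the document's default paragraph style (normally Normal), so it is reported as `None`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's "{uri}" prefix form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def textbox_paragraph_styles(document_xml):
    """Return one style id (or None) per paragraph inside any w:txbxContent."""
    root = ET.fromstring(document_xml)
    styles = []
    for content in root.iter(W + "txbxContent"):
        for para in content.iter(W + "p"):
            p_style = para.find(W + "pPr/" + W + "pStyle")
            styles.append(None if p_style is None else p_style.get(W + "val"))
    return styles


def styles_in_docx(path):
    """Hypothetical helper: read word/document.xml from a .docx and inspect it."""
    with zipfile.ZipFile(path) as docx:
        return textbox_paragraph_styles(docx.read("word/document.xml"))
```

Note that this only inspects the source document, not what Writer does on import. Also, a DML text box saved by Word usually carries a VML fallback copy inside `mc:AlternateContent`, so the same paragraphs may be listed twice.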

In older versions the same is happening all the way back to:

LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735
Comment 10 NISZ LibreOffice Team 2020-11-23 20:24:12 UTC
Created attachment 167514 [details]
The minimized example file in Word 2013 and Writer
Comment 11 NISZ LibreOffice Team 2020-11-23 20:24:53 UTC
Created attachment 167515 [details]
Even more minimized sample with DML format textbox
Comment 12 NISZ LibreOffice Team 2020-11-23 20:25:46 UTC
Created attachment 167516 [details]
The 3-page sample minimized to 1 textbox
Comment 13 Justin L 2023-06-08 16:42:38 UTC
repro 7.6