Bug 77417 - FILEOPEN: incorrect conversion of docx paragraph spacing details (sample in Comment 7)
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.2.3.3 release
Hardware: Other All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: BSA
Keywords: filter:docx
Depends on:
Blocks: DOCX-Paragraph DOCX-Styles DOCX-compatibilityMode-15
Reported: 2014-04-14 06:36 UTC by Yousuf Philips (jay) (retired)
Modified: 2020-06-09 10:47 UTC (History)
CC List: 3 users

See Also:
Crash report or crash signature:


Attachments
the file specified in the long description, along with a version of the file converted using the ms office compatibility pack (2.39 MB, application/x-gzip)
2014-04-14 06:36 UTC, Yousuf Philips (jay) (retired)
incorrect paragraph styles when opening DOCX file in LibO (346.59 KB, application/zip)
2014-07-07 08:17 UTC, Gergely Rácz
Another 4-pages sample DOCX from the report, saved in MSO 2013 (33.13 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-01-24 12:35 UTC, Timur

Description Yousuf Philips (jay) (retired) 2014-04-14 06:36:00 UTC
Created attachment 97329 [details]
the file specified in the long description, along with a version of the file converted using the ms office compatibility pack

Problem description: 
Double the amount of spacing is found between paragraphs, as extra spacing is added above each paragraph

Steps to reproduce:
1. download docx file - http://www.microsoft.com/investor/reports/ar13/docs/2013_Annual_Report.docx
2. Go to the first page with text

Current behavior:
The converted paragraph style has 0.00" spacing above and 0.19" spacing below the paragraph, while the paragraph's own formatting has 0.19" spacing above and 0.19" spacing below. The same spacing problem affects all text in tables, which makes the document grow from its original length of around 90 pages to over 300.
Operating System: Ubuntu
Version: 4.2.2.1 release
Comment 1 Yousuf Philips (jay) (retired) 2014-04-15 06:38:48 UTC
As my version number wasn't listed in the drop-down, here is the correct version info from Help > About.

Version: 4.2.3.3
Build ID: 882f8a0a489bc99a9e60c7905a60226254cb6ff0
Comment 2 Robinson Tryon (qubit) 2014-04-20 22:23:03 UTC
(In reply to comment #0)
> 
> Problem description: 
> Double the amount of spacing is found between paragraphs, as extra spacing
> is added above each paragraph

RESULT: Tentative repro

> 
> Steps to reproduce:
> 1. download docx file -
> http://www.microsoft.com/investor/reports/ar13/docs/2013_Annual_Report.docx
> 2. Goto the first page with text
> 
> Current behavior:
> The conversion has the font style as 0.00" spacing above and 0.19" spacing
> below the paragraph, while the line paragraph details has 0.19" spacing
> above an 0.19" spacing below.

If I put my cursor on the word "Microsoft" in the paragraph "Fiscal Year 2013 was a pivotal year for Microsoft in every sense of the word.", I can right-click and select Paragraph -> Indents & Spacing, and see that Spacing Above paragraph is 0.19" and Spacing Below paragraph is 0.19".

If I press the Escape key and right-click again on the same word, I can select Edit Paragraph Style... -> Paragraph -> Indents & Spacing, and see that Spacing Above paragraph is 0.00" and Spacing Below paragraph is 0.19".

I'm not sure why the values for "Spacing Above paragraph" differ here, but if the document is supposed to be 90 pages and is blossoming to ~ 300, then something is definitely wrong.

Testing on Ubuntu 12.04.4 + LO 4.2.3.3

Status -> NEW
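The mismatch described above (paragraph style says 0.00" above, the paragraph itself says 0.19" above) can be checked directly in the DOCX XML. The following is a minimal sketch, not LibreOffice's import code: it assumes the spacing is stored as `w:spacing` with `w:before`/`w:after` attributes in twips, and it ignores style inheritance (`w:basedOn`), document defaults and autospacing flags. The function names are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def _spacing(ppr):
    """Return (before, after) in twips as strings from a w:pPr, or None if unset."""
    if ppr is None:
        return None
    sp = ppr.find("w:spacing", NS)
    if sp is None:
        return None
    return (sp.get(f"{{{W}}}before"), sp.get(f"{{{W}}}after"))


def spacing_mismatches(document_xml, styles_xml):
    """List paragraphs whose direct spacing differs from their style's spacing.

    Each hit is (paragraph index, style id, style spacing, direct spacing).
    Simplification: only the style named by w:pStyle is consulted,
    parent styles are not resolved.
    """
    styles = {}
    for st in ET.fromstring(styles_xml).findall("w:style", NS):
        styles[st.get(f"{{{W}}}styleId")] = _spacing(st.find("w:pPr", NS))

    hits = []
    for index, p in enumerate(ET.fromstring(document_xml).iter(f"{{{W}}}p")):
        ppr = p.find("w:pPr", NS)
        direct = _spacing(ppr)
        if direct is None:
            continue  # no direct spacing, the style value applies as-is
        pstyle = ppr.find("w:pStyle", NS)
        style_id = pstyle.get(f"{{{W}}}val") if pstyle is not None else None
        styled = styles.get(style_id)
        if styled is not None and styled != direct:
            hits.append((index, style_id, styled, direct))
    return hits


def check_docx(path):
    """Convenience wrapper: read both XML parts from a .docx file."""
    with zipfile.ZipFile(path) as z:
        return spacing_mismatches(z.read("word/document.xml"),
                                  z.read("word/styles.xml"))
```

Calling `check_docx("2013_Annual_Report.docx")` on a local copy of the sample would list the paragraphs where direct formatting overrides the style; whether the report's paragraphs actually use direct `w:spacing` has not been verified here.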
Comment 3 Gergely Rácz 2014-07-07 08:15:45 UTC
Environment: 
OS: Win7
Libo: 4.2.5.2

I have a .DOCX file with "Normal(Web)" paragraph style.
Details of the style in MS Office:
- Spacing: Auto (before and after)
- Line Spacing: 1,0 (Normal)

When I open this file with LibO, I get two different sets of settings in the document under the one paragraph style "Normal(Web)":

1. 
- Spacing: Above: 0cm; Below: 0,35cm
- Line spacing: Proportional 115%

2. 
- Spacing: Above: 0,49cm; Below: 0,49cm
- Line spacing: Proportional 115%

I have uploaded a sample file with screenshots for you.
The file name is: test_gracz.zip
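The inch and centimetre values reported in this thread may be the same underlying numbers. DOCX stores paragraph spacing in twentieths of a point (twips, 1440 per inch). The sketch below converts two candidate values; 280 twips (14 pt) and 200 twips (10 pt) are guesses chosen because they match the reported figures, not values read from the attached files.

```python
TWIPS_PER_INCH = 1440  # 20 twips per point, 72 points per inch
CM_PER_INCH = 2.54


def twips_to_inches(twips):
    """Convert twentieths of a point to inches."""
    return twips / TWIPS_PER_INCH


def twips_to_cm(twips):
    """Convert twentieths of a point to centimetres."""
    return twips / TWIPS_PER_INCH * CM_PER_INCH


# Candidate values (assumptions, see the note above).
for twips in (200, 280):
    print(f"{twips} twips = {twips_to_inches(twips):.2f} in"
          f" = {twips_to_cm(twips):.2f} cm")
```

Under that assumption, 280 twips rounds to 0.19" and 0.49 cm, and 200 twips rounds to 0.35 cm, so the 0.19" above/below from the description and the 0,49cm above/below reported here would be the same spacing shown in different units.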
Comment 4 Gergely Rácz 2014-07-07 08:17:59 UTC
Created attachment 102353 [details]
incorrect paragraph styles when opening DOCX file in LibO
Comment 5 Timur 2015-09-30 15:08:34 UTC
Currently, LO opens the document as 130 pages. Regular text looks fine; the primary problem is text in tables. The spacing values seem to be the same, but the text simply doesn't fit.
Comment 6 Xisco Faulí 2016-10-10 17:37:21 UTC
Hi Yousuf,
this looks like a duplicate of bug 95031...

*** This bug has been marked as a duplicate of bug 95031 ***
Comment 7 Timur 2020-01-20 12:03:15 UTC
Minimized sample is DOCX attachment 119325 [details].
I extracted only page 7 for an example. That page doesn't fit on a single page in LO, because text in table cannot fit.
Not resolved with bug 94801, so I set the status to New.
Comment 8 László Németh 2020-01-22 19:02:59 UTC
Fixed in https://gerrit.libreoffice.org/plugins/gitiles/core/+/0c84c60f48cf681daf467c0678a768711f22e5c3%5E%21

(https://gerrit.libreoffice.org/c/core/+/87136)

(Unfortunately, with the wrong bug id in the commit description:

"tdf#77419 DOCX table import: ignore right white space

in table paragraphs in MSO 2010 compatibility mode.")
Comment 9 Timur 2020-01-24 12:35:42 UTC
Created attachment 157397 [details]
Another 4-pages sample DOCX from the report, saved in MSO 2013

Looking good in LO 7.0+ for minimized sample DOCX attachment 119325 [details]. Definite improvement. 

Original DOCX from zipped attachment 97329 [details] is still different in MSO and LO.
Table FINANCIAL HIGHLIGHTS on page 6 of 91 in MSO looks better in LO.
Tables QUARTERLY STOCK PRICE and SHARE REPURCHASES AND DIVIDENDS are different in LO in the last column Fiscal Year and Amount.
Tables Dividends on original page 8 in MSO are still two-line in LO for columns Record Date and Payment Date.

To make this easier to see, I made another 4-page sample DOCX from the report in MSO.
Some rows were two-line in MSO, so I adjusted them all to be single-line.
LO still opens some of them as two-line.
Please see this example. Thanks.
Comment 10 Timur 2020-01-29 09:27:02 UTC
DOCX attachment 119325 [details] was prepared in MSO 2010 and the fix is limited to 2010 mode.
DOCX attachment 157397 [details] was saved in MSO 2013 and is still wrong.

László, can you please explain whether the solution for 2013 is similar and should be handled in this bug (which would make sense for the same issue), or whether it's different enough for another bug?
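Which mode a given sample is in can be read from `word/settings.xml`. The sketch below assumes the usual `w:compatSetting` entry named `compatibilityMode` (Word 2010 writes 14, Word 2013 and later write 15); the function name is made up, and a file without the setting returns None.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def compatibility_mode(settings_xml):
    """Return the compatibilityMode value from settings.xml, or None if absent."""
    root = ET.fromstring(settings_xml)
    for setting in root.iter(f"{{{W}}}compatSetting"):
        if setting.get(f"{{{W}}}name") == "compatibilityMode":
            return int(setting.get(f"{{{W}}}val"))
    return None
```

To use it on a file, read the part with `zipfile.ZipFile(path).read("word/settings.xml")` and pass the bytes in.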
Comment 11 NISZ LibreOffice Team 2020-01-31 10:09:46 UTC
(In reply to Timur from comment #9)
> Table FINANCIAL HIGHLIGHTS on page 6 of 91 in MSO looks better in LO.
> Tables QUARTERLY STOCK PRICE and SHARE REPURCHASES AND DIVIDENDS are
> different in LO in the last column Fiscal Year and Amount.
> Tables Dividends on original page 8 in MSO are still two-line in LO for
> columns Record Date and Payment Date.
> 
There is no more rounding error (that I can see) but there is a TAB character after the values that is not rendered by Word but is rendered by Writer.

This also happens in the RESULTS OF OPERATIONS tables, in the "Percentage Change 2013 Versus 2012" column: there is a TAB character in strings like "(380	)" which is not rendered by Word (unless you enter a few more characters so the text becomes longer than one row), but is rendered by Writer.
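To find the affected cells without clicking through the document, the table paragraphs can be scanned for TAB elements. This is a rough sketch under the assumption that the tabs are stored as `w:tab` children of runs (`w:r`); it is not how Writer or Word decides what to render, and the function name is made up.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
T_TAG = f"{{{W}}}t"
TAB_TAG = f"{{{W}}}tab"


def cell_paragraphs_with_tabs(document_xml):
    """Return the text of table-cell paragraphs that contain a TAB character.

    Tabs are shown as "\t" in the returned strings. Only w:tab elements
    that are direct children of a run are counted, so tab stop definitions
    (w:tabs/w:tab inside w:pPr) are not mistaken for TAB characters.
    """
    root = ET.fromstring(document_xml)
    hits = []
    for cell in root.iter(f"{{{W}}}tc"):
        # Direct child paragraphs only; nested tables have their own w:tc.
        for p in cell.findall("w:p", NS):
            parts = []
            for run in p.iter(f"{{{W}}}r"):
                for child in run:
                    if child.tag == T_TAG:
                        parts.append(child.text or "")
                    elif child.tag == TAB_TAG:
                        parts.append("\t")
            text = "".join(parts)
            if "\t" in text:
                hits.append(text)
    return hits
```

Pass in the contents of `word/document.xml`; each returned string is one cell paragraph worth checking by hand in both applications.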


> To see more easily, I made in MSO another 4-pages sample DOCX from the
> report.
> Some rows were two-line in MSO, so I adjusted all to be single-line.
> LO still opens some as two-line.
> Please see this example. Thanks.

I see in the first table FINANCIAL HIGHLIGHTS under 2012 column that the After Text indent is 0.45 cm in Writer, but 0.44 cm in Word. Manually adjusting this somehow fixes the line break problem.

I'd suggest opening new bugs for the problems above and letting this one rest.