Bug 87764 - FILEOPEN: DOC - Section columns incorrectly imported around page columns
Summary: FILEOPEN: DOC - Section columns incorrectly imported around page columns
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.4.1 release (earliest affected)
Hardware: Other All
Importance: high major
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:doc, regression
Duplicates: 95026 118927 123337
Depends on:
Blocks: Page-Layout-Columns DOC-Page
 
Reported: 2014-12-27 09:55 UTC by FMJ Vezelay
Modified: 2020-01-28 19:35 UTC
CC List: 17 users

See Also:
Crash report or crash signature:


Attachments
Text file made with scan / OCR process, .doc. (567.00 KB, application/msword); 2014-12-27 09:55 UTC, FMJ Vezelay
sample1 (171.50 KB, application/msword); 2016-12-01 21:58 UTC, Xisco Faulí
sample2 (359.00 KB, application/msword); 2016-12-01 22:00 UTC, Xisco Faulí
sample3 (141.00 KB, application/msword); 2016-12-01 22:08 UTC, Xisco Faulí
sample4 (118.90 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document); 2016-12-01 22:12 UTC, Xisco Faulí
sample5 (224.00 KB, application/msword); 2016-12-01 22:15 UTC, Xisco Faulí
smaller sample (88.00 KB, application/wps-office.doc); 2017-08-01 12:01 UTC, Yousuf Philips (jay) (retired)

Description FMJ Vezelay 2014-12-27 09:55:58 UTC
Created attachment 111394
Text file made with scan / OCR process, .doc.

Hello,
I tried to open the attached .doc file, which contains OCR output; it may have been created with Microsoft Word. LibreOffice generates many pages with thin columns and then freezes!
Apache OOo manages to open it fairly correctly, so why can't LibreOffice?
Comment 1 Robinson Tryon (qubit) 2014-12-27 14:49:04 UTC
Comment on attachment 111394
Text file made with scan / OCR process, .doc.

fix mimetype
Comment 2 Robinson Tryon (qubit) 2014-12-27 15:00:49 UTC
(In reply to FMJ Vezelay from comment #0)
> I try to open the .doc (attachment 111394), with ocr extracts; 

Working from an OCR source can always be challenging...

> it may be have done
> with Microsoft Word. LibreOffice generate many pages, in thin columns and
> then it blocks!

TESTING on Ubuntu 14.04:

In LO 4.4.0.1, I see a large number of pages (~180), and lots of content in thin columns
In LO 3.5.7.2, I see ~109 pages, many with content in 2 columns in landscape mode.

In both, the document definitely makes LibreOffice run slowly. The layout in 3.5 looks A LOT better, so I'm going to tag this as a regression.

Keywords -> regression
Whiteboard -> bibisectRequest
Status -> NEW
Comment 3 Yousuf Philips (jay) (retired) 2014-12-27 18:52:28 UTC
The thin-column problem appeared in 4.3, since 4.2.6 is fine: it has 92 pages when you open page preview after repagination.

Version: 4.2.6.2
Build ID: 185f2ce4dcc34af9bd97dec29e6d42c39557298f
Comment 4 Robinson Tryon (qubit) 2014-12-27 19:21:46 UTC
(In reply to Jay Philips from comment #3)
> The thin column problem happened in 4.3 as 4.2.6 is fine and it has 92 pages
> when you open page preview after repagination.

I just tested LO 4.3.2.2, and it has the same four-skinny-columns problem. We're narrowing down the problem, but I'm cc'ing the (bi)bisect maestro here to figure out what went wrong. If we work quickly, we can take this bug from opening -> commit identified in about one day ;-)
Comment 5 Matthew Francis 2014-12-28 02:26:37 UTC
Bibisect results from 43all and 44:
In the course of history this has been broken, fixed and then broken again, as summarised below


43all: Broken at
[8a2068ec09e531c6943ef0f090bd02a1cab565b7] source-hash-5218c0d6a8171400bee0d972ff05757849df4d19

43all: Fixed at
[251dbe932a666e83c91816fcf755a4c3be51e078] source-hash-fff4d120866a0be3cd8185f2c67bb9f59b1a6a3f

44: Broken at
[626531d9052fe067359170d41bd943b59766b551] source-hash-3d3401a6397e893808309ec374f5d8f890144906
Comment 6 Matthew Francis 2014-12-28 03:26:48 UTC
The most recent breakage of the attached file seems to have appeared at the commit below.

Adding a Cc: to l.lunak@collabora.com. Could you shed any light on what's going on with this bug? Thanks


commit c5ed52b1cd6f22787c94bec035ceecf9e1da3271
Author: Luboš Luňák <l.lunak@collabora.com>
Date:   Mon Jul 21 10:56:52 2014 +0200

    ww8import create a pagedesc if continuous section changes margins (bnc#875383)
    
    This is similar to what writerfilter does. MSWord can have one page with several
    different margins, which are saved using continuous sections, which causes all
    kinds of trouble, because either we treat them as Writer sections, which means
    we lose some of the data, or we treat them as Writer page styles, which causes
    spurious page breaks if in the wrong place. Either option has its problems, but
    here it seems slightly better to go for keeping the data and hoping the page
    break will be in a place where a break will be anyway.
    
    Change-Id: I8f52aa820750da6788ea04180a15ac334f6bf87b
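A toy model of the trade-off described in this commit message may help when reading the attached samples. In this model, a continuous section that changes margins becomes a new page style ("pagedesc"), and in Writer every new page style implies a page break. This is a hypothetical sketch, not the ww8import code. The names `Margins`, `Section` and `plan_import` are invented for illustration, and margins are compared as plain integers (assumed to be twips).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Margins:
    # Hypothetical margin record; values assumed to be twips.
    left: int
    right: int
    top: int
    bottom: int


@dataclass(frozen=True)
class Section:
    margins: Margins
    continuous: bool  # True if this section begins with a continuous section break


def plan_import(sections: List[Section]) -> List[str]:
    """Toy model (not LibreOffice code): map each Word section to a Writer construct.

    "pagedesc" = new page style, which forces a page break in Writer;
    "section"  = Writer section that stays on the same page.
    """
    plan: List[str] = []
    current: Optional[Margins] = None
    for sec in sections:
        if current is None or not sec.continuous:
            # First section, or a real (non-continuous) break: always a page style.
            plan.append("pagedesc")
        elif sec.margins != current:
            # Continuous break that changes margins: keep the margins by switching
            # page style, at the cost of a possibly spurious page break.
            plan.append("pagedesc")
        else:
            # Continuous break with unchanged margins: a plain Writer section.
            plan.append("section")
        current = sec.margins
    return plan


wide = Margins(1440, 1440, 1440, 1440)
narrow = Margins(720, 720, 1440, 1440)
doc = [Section(wide, False), Section(wide, True), Section(narrow, True), Section(narrow, True)]
print(plan_import(doc))  # ['pagedesc', 'section', 'pagedesc', 'section']
```

In this model the third section forces a page break even though Word keeps content after a continuous break on the same page, which is the "spurious page breaks" cost the commit message accepts in exchange for keeping the margin data.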
Comment 7 Timur 2015-10-06 14:32:25 UTC
Same problem in version 4.3.1.1, caused by the same commit as bug 86468. Looks like a dupe.
Comment 8 Robinson Tryon (qubit) 2015-12-13 11:11:10 UTC Comment hidden (obsolete)
Comment 9 Xisco Faulí 2016-09-24 14:52:48 UTC
*** Bug 95026 has been marked as a duplicate of this bug. ***
Comment 10 Xisco Faulí 2016-11-19 13:26:19 UTC
Another document affected by the same commit: attachment 42312
Comment 11 Telesto 2016-11-25 11:06:47 UTC
Confirming with:
Version: 5.3.0.0.beta1 
Build ID: 690f553ecb3efd19143acbf01f3af4e289e94536
CPU Threads: 4; OS Version: Windows 6.2; UI Render: default; Layout Engine: new; 
Locale: nl-NL (nl_NL); Calc: CL
Comment 12 Xisco Faulí 2016-12-01 21:58:06 UTC
Created attachment 129206
sample1

Another file affected by the same commit, where the content inside the frame is shifted to the next page.
Comment 13 Xisco Faulí 2016-12-01 22:00:58 UTC
Created attachment 129207
sample2

Another file affected by the same commit.
Comment 14 Xisco Faulí 2016-12-01 22:08:11 UTC
Created attachment 129208
sample3

another one
Comment 15 Xisco Faulí 2016-12-01 22:12:58 UTC
Created attachment 129209
sample4

another one...
Comment 16 Xisco Faulí 2016-12-01 22:15:54 UTC
Created attachment 129210
sample5

another one...
Comment 17 Justin L 2016-12-09 11:51:58 UTC
another one: rdown_2.doc (attachment 57985) from bug 46941
Comment 18 Xisco Faulí 2017-01-12 14:44:39 UTC
*** Bug 105285 has been marked as a duplicate of this bug. ***
Comment 19 Xisco Faulí 2017-07-31 23:09:15 UTC
*** Bug 110432 has been marked as a duplicate of this bug. ***
Comment 20 Yousuf Philips (jay) (retired) 2017-08-01 12:01:18 UTC
Created attachment 135046
smaller sample

The first 10 pages of attachment 111394, since opening attachment 111394 itself hammers the CPU as the page count keeps increasing.
Comment 21 Chen-Ku 2018-11-29 08:19:50 UTC
Still exists in version:
Version: 6.1.3.2 (x64)
Build ID: 86daf60bf00efa86ad547e59e09d6bb77c699acb
CPU threads: 12; OS: Windows 10.0; UI render: GL; 
Locale: zh-TW (zh_TW); Calc: CL
Comment 22 Chen-Ku 2018-11-29 08:23:14 UTC
Still exists in version:
Version: 6.3.0.0.alpha0+ (x64)
Build ID: 0f25a3c36f27fd51453b9a9115f236b83c143684
CPU threads: 12; OS: Windows 10.0; UI render: GL; VCL: win; 
TinderBox: Win-x86_64@42, Branch:master, Time: 2018-11-27_20:06:55
Locale: zh-TW (zh_TW); UI-Language: en-US
Calc: threaded
Comment 23 Chen-Ku 2018-11-29 08:23:26 UTC Comment hidden (obsolete)
Comment 24 Xisco Faulí 2019-11-05 11:59:37 UTC
*** Bug 128605 has been marked as a duplicate of this bug. ***
Comment 25 Xisco Faulí 2019-11-05 12:19:02 UTC
*** Bug 123337 has been marked as a duplicate of this bug. ***
Comment 26 Xisco Faulí 2019-11-05 12:19:23 UTC
*** Bug 118927 has been marked as a duplicate of this bug. ***
Comment 27 Xisco Faulí 2019-11-05 12:35:59 UTC
Even the test case is laid out incorrectly: in LibreOffice it has 2 pages, in MSO Word 2010 it has 1.
Comment 28 Xisco Faulí 2019-11-05 14:42:34 UTC
@Justin Luth, @Miklos, I thought you might be interested in this issue, considering the number of duplicates...
Comment 29 Miklos Vajna 2019-11-25 13:29:03 UTC
It seems to me that Lubos correctly described the feature that is missing here.

Word has this feature that you can attach different page margins to continuous section breaks and then let layout decide which is the first on a given page, and use that for the actual page margin. This is not something Writer has at its core today.

Till that is added, I would say the only sane thing to do is to make sure that the DOC, DOCX and RTF behavior is the same.
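The Word behaviour described in comment 29 can be sketched as a lookup: once page boundaries are known, each page takes the margins of the section in effect at its top. This is a simplified, hypothetical model. It assumes positions are offsets in one continuous text flow and that page starts are already fixed, whereas in real layout the chosen margins feed back into where pages break. The function name `page_margins` is invented for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def page_margins(section_starts: Sequence[int], section_margins: Sequence[str],
                 page_starts: Sequence[int]) -> List[str]:
    """Toy model: for each page, return the margins of the section active at its top.

    section_starts must be sorted ascending; section_margins[i] belongs to the
    section that starts at section_starts[i].
    """
    result: List[str] = []
    for top in page_starts:
        # Index of the last section that starts at or before the top of this page.
        i = bisect_right(section_starts, top) - 1
        if i < 0:
            raise ValueError("page starts before the first section")
        result.append(section_margins[i])
    return result


# Four sections (margins labelled A-D) and four pages, as offsets in the text flow.
starts = [0, 30, 45, 120]
margins = ["A", "B", "C", "D"]
pages = [0, 50, 100, 150]
print(page_margins(starts, margins, pages))  # ['A', 'C', 'C', 'D']
```

Section B begins and ends in the middle of the first page, so its margins never apply to any page. This per-page choice, made by layout rather than at import time, is what comment 29 says Writer does not have at its core.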
Comment 30 Justin L 2019-12-02 17:14:19 UTC
(In reply to Miklos Vajna from comment #29)
> Till that is added, I would say the only sane thing to do is to make sure
> that the DOC, DOCX and RTF behavior is the same.
Actually, I like having different behaviour in this case. Since we simply can't do what Word does, it is nice to have two different ways to save a compatible-format file. That way on a per-document basis you could choose the one that works better in that case.