Created attachment 111394 [details] Text file made with scan / OCR process, .doc. Hello, I am trying to open the attached .doc file, which contains OCR extracts; it may have been created with Microsoft Word. LibreOffice generates many pages with thin columns and then hangs! Apache OpenOffice manages to open it quite correctly, so why can't LibreOffice?
Comment on attachment 111394 [details] Text file made with scan / OCR process, .doc. fix mimetype
(In reply to FMJ Vezelay from comment #0) > I try to open the .doc (attachment 111394 [details]), with ocr extracts; Working from an OCR source can always be challenging... > it may be have done > with Microsoft Word. LibreOffice generate many pages, in thin columns and > then it blocks! TESTING on Ubuntu 14.04: In LO 4.4.0.1, I see a large number of pages (~180), and lots of content in thin columns. In LO 3.5.7.2, I see ~109 pages, many with content in 2 columns in landscape mode. In both, the document definitely makes LibreOffice run slowly. The layout in 3.5 looks A LOT better, so I'm going to tag this as a regression. Keywords -> regression; Whiteboard -> bibisectRequest; Status -> NEW
The thin column problem first appeared in 4.3; 4.2.6 is fine and shows 92 pages when you open page preview after repagination. Version: 4.2.6.2 Build ID: 185f2ce4dcc34af9bd97dec29e6d42c39557298f
(In reply to Jay Philips from comment #3) > The thin column problem happened in 4.3 as 4.2.6 is fine and it has 92 pages > when you open page preview after repagination. I just tested LO 4.3.2.2, and it has the same problem with 4 skinny columns. We're narrowing down the problem, and I'm cc'ing the (bi)bisect maestro here to figure out what went wrong. If we work quickly, we can take this bug from opening -> commit identified in about one day ;-)
Bibisect results from 43all and 44: over its history this has been broken, fixed and then broken again, as summarised below. 43all: Broken at [8a2068ec09e531c6943ef0f090bd02a1cab565b7] source-hash-5218c0d6a8171400bee0d972ff05757849df4d19; 43all: Fixed at [251dbe932a666e83c91816fcf755a4c3be51e078] source-hash-fff4d120866a0be3cd8185f2c67bb9f59b1a6a3f; 44: Broken at [626531d9052fe067359170d41bd943b59766b551] source-hash-3d3401a6397e893808309ec374f5d8f890144906
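For readers unfamiliar with bibisecting, the search behind results like these can be sketched as a binary search over an ordered list of builds. This is only a minimal illustration in Python, not the actual bibisect tooling: `first_bad`, the build names and the `is_bad` predicate are hypothetical. It assumes a single good-to-bad transition inside the searched range, which is one reason a history that was broken, fixed and broken again has to be searched one range at a time.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first build for which is_bad() is true.

    Hypothetical helper. Assumes builds[0] is good, builds[-1] is bad,
    and there is exactly one good-to-bad transition in between.
    """
    lo, hi = 0, len(builds) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # the transition is at mid or earlier
        else:
            lo = mid  # the transition is after mid
    return builds[hi]


# Made-up history: three good builds followed by three bad ones.
history = ["b0", "b1", "b2", "b3", "b4", "b5"]
broken = {"b3", "b4", "b5"}
print(first_bad(history, lambda b: b in broken))
```

In practice the predicate would be a manual check ("does the document open with thin columns in this build?"), and each of the three transitions listed above would need its own range.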
The most recent breakage of the attached file seems to have appeared at the commit below. Adding a Cc: to l.lunak@collabora.com. Could you shed any light on what's going on with this bug? Thanks. commit c5ed52b1cd6f22787c94bec035ceecf9e1da3271 Author: Luboš Luňák <l.lunak@collabora.com> Date: Mon Jul 21 10:56:52 2014 +0200 ww8import create a pagedesc if continuous section changes margins (bnc#875383) This is similar to what writerfilter does. MSWord can have one page with several different margins, which are saved using continuous sections, which causes all kinds of trouble, because either we treat them as Writer sections, which means we lose some of the data, or we treat them as Writer page styles, which causes spurious page breaks if in the wrong place. Either option has its problems, but here it seems slightly better to go for keeping the data and hoping the page break will be in a place where a break will be anyway. Change-Id: I8f52aa820750da6788ea04180a15ac334f6bf87b
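The trade-off described in that commit message can be illustrated with a toy model. This is a sketch under strong simplifying assumptions, not the ww8 import code: `ContinuousSection`, `as_writer_sections` and `as_page_styles` are made-up names, and a "page" is reduced to a margin plus the texts placed on it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ContinuousSection:
    text: str
    margin_cm: float  # page margin requested by this section


Page = Tuple[float, List[str]]  # (margin actually used, texts on the page)


def as_writer_sections(parts: List[ContinuousSection]) -> List[Page]:
    """Option 1: no page break, but only the first margin survives."""
    return [(parts[0].margin_cm, [p.text for p in parts])]


def as_page_styles(parts: List[ContinuousSection]) -> List[Page]:
    """Option 2: every margin change starts a new page style, hence a new page."""
    pages: List[Page] = []
    for part in parts:
        if pages and pages[-1][0] == part.margin_cm:
            pages[-1][1].append(part.text)
        else:
            pages.append((part.margin_cm, [part.text]))
    return pages


doc = [ContinuousSection("title", 2.0), ContinuousSection("body", 1.0)]
print(len(as_writer_sections(doc)))  # one page, the 1.0 cm margin is lost
print(len(as_page_styles(doc)))      # two pages, both margins kept
```

The second option keeps the data but can turn a one-page Word document into a two-page Writer document, which is the kind of extra page break reported in this bug.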
Same problem in version 4.3.1.1, same commit as Bug 86468. Looks like a dupe.
Migrating Whiteboard tags to Keywords: (bibisected) [NinjaEdit]
*** Bug 95026 has been marked as a duplicate of this bug. ***
Another document affected by the same commit: attachment 42312 [details]
Confirming with: Version: 5.3.0.0.beta1 Build ID: 690f553ecb3efd19143acbf01f3af4e289e94536 CPU Threads: 4; OS Version: Windows 6.2; UI Render: default; Layout Engine: new; Locale: nl-NL (nl_NL); Calc: CL
Created attachment 129206 [details] sample1 another file affected by the same commit, where the content inside the frame is shifted to the next page
Created attachment 129207 [details] sample2 another file affected by the same commit
Created attachment 129208 [details] sample3 another one
Created attachment 129209 [details] sample4 another one...
Created attachment 129210 [details] sample5 another one...
another one: rdown_2.doc (attachment 57985 [details]) from bug 46941
*** Bug 105285 has been marked as a duplicate of this bug. ***
*** Bug 110432 has been marked as a duplicate of this bug. ***
Created attachment 135046 [details] smaller sample. The first 10 pages of attachment 111394 [details], since opening the full attachment 111394 [details] hammers the CPU while the page count keeps increasing.
Still exists in version: Version: 6.1.3.2 (x64) Build ID: 86daf60bf00efa86ad547e59e09d6bb77c699acb CPU threads: 12; OS: Windows 10.0; UI render: GL; Locale: zh-TW (zh_TW); Calc: CL
Still exists in version: Version: 6.3.0.0.alpha0+ (x64) Build ID: 0f25a3c36f27fd51453b9a9115f236b83c143684 CPU threads: 12; OS: Windows 10.0; UI render: GL; VCL: win; TinderBox: Win-x86_64@42, Branch:master, Time: 2018-11-27_20:06:55 Locale: zh-TW (zh_TW); UI-Language: en-US Calc: threaded
*** Bug 128605 has been marked as a duplicate of this bug. ***
*** Bug 123337 has been marked as a duplicate of this bug. ***
*** Bug 118927 has been marked as a duplicate of this bug. ***
Even the testcase is incorrect. In LibreOffice it has 2 pages; in MSO Word 2010 it has 1.
@Justin Luth, @Miklos, I thought you might be interested in this issue, considering the number of duplicates...
It seems to me that Lubos correctly described the feature that is missing here. Word lets you attach different page margins to continuous section breaks, then lets layout decide which break is the first on a given page and uses that one for the actual page margin. This is not something Writer has at its core today. Until that is added, I would say the only sane thing to do is to make sure that the DOC, DOCX and RTF behavior is the same.
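As a rough illustration of the Word behaviour described in this comment, here is a toy model in Python. It is one possible reading of the description, not Word's or Writer's actual layout logic: the names `Section` and `page_margins` are invented, each section's start page is assumed to be already known from layout, and a page on which no section starts is assumed to keep the margin of the section still running.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Section:
    name: str
    start_page: int        # page on which layout placed the section start (assumed known)
    left_margin_cm: float  # margin attached to the section break


def page_margins(sections: List[Section], page_count: int) -> Dict[int, float]:
    """Pick one margin per page: the first section starting on a page wins.

    Sections must be given in document order (assumption of this sketch).
    """
    result: Dict[int, float] = {}
    running = sections[0]  # section that flows onto pages with no new start
    for page in range(1, page_count + 1):
        starting = [s for s in sections if s.start_page == page]
        if starting:
            result[page] = starting[0].left_margin_cm
            running = starting[-1]
        else:
            result[page] = running.left_margin_cm
    return result


sections = [
    Section("A", 1, 2.0),
    Section("B", 1, 3.0),  # continuous break on page 1; its margin is not used there
    Section("C", 3, 1.5),
]
print(page_margins(sections, 3))
```

The point of the sketch is that the margin choice depends on where layout puts each section start, so it cannot be decided at import time, which matches the comment that Writer lacks this at its core.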
(In reply to Miklos Vajna from comment #29) > Till that is added, I would say the only sane thing to do is to make sure > that the DOC, DOCX and RTF behavior is the same. Actually, I like having different behaviour in this case. Since we simply can't do what Word does, it is nice to have two different ways to save a compatible-format file. That way on a per-document basis you could choose the one that works better in that case.
(In reply to Robinson Tryon (qubit) from comment #1) > Comment on attachment 111394 [details] > Text file made with scan / OCR process, .doc. This one actually looks reasonably good to me. At least it doesn't have those ridiculously thin columns any more, since commit 84fefd7c295fc05499ca222dff50c2fe4e0fb27e by Justin Luth on 2019-01-12 17:10:54: "tdf#120145 ww8import: ignoreCols if section is inserted. Otherwise, the column setting is duplicated both in the section and in the page style." But this bug report has become the poster child for all kinds of continuous section break issues in DOC, so I'll keep it open. (Bug 86468 with the same identified commit and lots of duplicates was marked as WONTFIX.)
Dear FMJ Vezelay,

To make sure we're focusing on the bugs that affect our users today, LibreOffice QA is asking bug reporters and confirmers to retest open, confirmed bugs which have not been touched for over a year. There have been thousands of bug fixes and commits since anyone checked on this bug report. During that time, it's possible that the bug has been fixed, or the details of the problem have changed. We'd really appreciate your help in getting confirmation that the bug is still present.

If you have time, please do the following:
- Test to see if the bug is still present with the latest version of LibreOffice from https://www.libreoffice.org/download/
- If the bug is present, please leave a comment that includes the information from Help - About LibreOffice.
- If the bug is NOT present, please set the bug's Status field to RESOLVED-WORKSFORME and leave a comment that includes the information from Help - About LibreOffice.

Please DO NOT:
- Update the version field
- Reply via email (please reply directly on the bug tracker)
- Set the bug's Status field to RESOLVED - FIXED (this status has a particular meaning that is not appropriate in this case)

If you want to do more to help you can test to see if your issue is a REGRESSION. To do so:
1. Download and install oldest version of LibreOffice (usually 3.3 unless your bug pertains to a feature added after 3.3) from https://downloadarchive.documentfoundation.org/libreoffice/old/
2. Test your bug
3. Leave a comment with your results.
4a. If the bug was present with 3.3 - set version to 'inherited from OOo';
4b. If the bug was not present in 3.3 - add 'regression' to keyword

Feel free to come ask questions or to say hello in our QA chat: https://web.libera.chat/?settings=#libreoffice-qa

Thank you for helping us make LibreOffice even better for everyone!

Warm Regards,
QA Team

MassPing-UntouchedBug