Bug 96840 - FILEOPEN: not open file in format .DOC with images
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:doc
Duplicates: 111341
Depends on:
Blocks: DOC-Opening CPU-AT-100%
Reported: 2015-12-31 09:02 UTC by Roman Kuznetsov
Modified: 2021-11-09 17:27 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:
Regression By:


Attachments
File DOC (11.77 MB, application/msword)
2019-02-10 07:41 UTC, Roman Kuznetsov
File DOC minimized in MSO (1.67 MB, application/msword)
2019-10-01 07:54 UTC, Timur
File DOCX minimized in MSO (1.59 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-10-01 07:55 UTC, Timur

Description Roman Kuznetsov 2015-12-31 09:02:25 UTC
Version: 5.0.4.2
Build ID: 2b9802c1994aa0b7dc6079e128979269cf95bc78
Locale: ru-RU (ru_RU)
OS: Windows 7 HB x86-64

link to file https://yadi.sk/d/nl-2rXURmcQ7s

many images in a document

In MSO 2007 this file opens correctly. If the file is saved from MSO in .docx format, LO opens it, but the page count runs into the thousands.
Comment 1 MM 2015-12-31 12:05:59 UTC
Confirmed with v5.1.0.1 under ubuntu 14.04 x64.
Also doesn't open with older versions.
Comment 2 Valek Filippov 2016-01-02 23:48:56 UTC
The file seems to include two JPEGs and one PNG wrapped into zipped PPTX each ("Package" streams).
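
A minimal sketch of how one might confirm what such a "Package" stream contains once its bytes have been pulled out of the DOC container. Extracting the stream itself is not shown and would need an OLE2 reader. The sketch assumes the payload is an ordinary ZIP archive (as PPTX files are) and that only .jpg/.jpeg/.png entries are of interest; the function name list_package_images is hypothetical.

```python
import io
import zipfile

# Suffixes treated as images for this check (an assumption for illustration).
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def list_package_images(package_bytes: bytes) -> list[str]:
    """Return the sorted names of image entries inside a ZIP-based package."""
    # A payload that is not a ZIP archive simply yields no images.
    if not zipfile.is_zipfile(io.BytesIO(package_bytes)):
        return []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if name.lower().endswith(IMAGE_SUFFIXES)
        )
```

Each wrapped PPTX would be expected to report a single image entry if the observation above holds for this file.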
Comment 3 Roman Kuznetsov 2017-06-15 10:07:27 UTC Comment hidden (obsolete)
Comment 4 Roman Kuznetsov 2017-11-26 20:46:28 UTC Comment hidden (obsolete)
Comment 5 Roman Kuznetsov 2018-04-27 19:54:31 UTC Comment hidden (obsolete)
Comment 6 Roman Kuznetsov 2019-02-10 07:41:30 UTC
Created attachment 149065
File DOC
Comment 7 Roman Kuznetsov 2019-02-10 07:45:18 UTC Comment hidden (obsolete)
Comment 8 Roman Kuznetsov 2019-09-27 19:46:38 UTC Comment hidden (obsolete)
Comment 9 Timur 2019-10-01 07:54:17 UTC
Created attachment 154662
File DOC minimized in MSO

The original DOC has 30 pages. It's always better to have a minimized sample.
Here is a DOC created in MSO with just pages 5-7 (with images, without the footer); it still does not open.
Comment 10 Timur 2019-10-01 07:55:47 UTC
Created attachment 154663
File DOCX minimized in MSO

Let me also add a DOCX of pages 5-7, which shows the page count running up on file open.
There are other bugs for that, so there is no need to open a separate one; let's wait for this one and watch the others. I added one of them to See Also.
Comment 11 Xisco Faulí 2019-10-03 09:42:59 UTC
Inherited from OOo and no duplicates, so I don't see why this is a high-severity bug.
Comment 12 Roman Kuznetsov 2020-01-10 14:34:08 UTC Comment hidden (obsolete)
Comment 13 Roman Kuznetsov 2020-08-03 17:07:28 UTC
Still reproducible in:

Version: 7.1.0.0.alpha0+
Build ID: <buildversion>
CPU threads: 4; OS: Mac OS X 10.15.5; UI render: default; VCL: osx
Locale: ru-RU (ru_RU.UTF-8); UI: en-US
Calc: threaded
Comment 14 stragu 2021-06-08 14:17:44 UTC
I can open:
- attachment 149065 (original DOC)
- attachment 154662 (Timur's smaller DOC)

With version:

Version: 7.2.0.0.alpha1+ / LibreOffice Community
Build ID: 399a6472f666ae6c3e20b6f8367f9fd089c15605
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-06-05_17:38:40
Calc: threaded

However, DOCX version in attachment 154663 has the same issue as previously described: keeps increasing the number of pages.

Given that the title of this report is about the DOC not opening, can we close this as a WORKSFORME and open a new bug for the DOCX page increase?
Comment 15 Roman Kuznetsov 2021-06-08 14:33:37 UTC
(In reply to stragu from comment #14)
> However, DOCX version in attachment 154663 has the same issue as
> previously described: keeps increasing the number of pages.
> 
> Given that the title of this report is about the DOC not opening, can we
> close this as a WORKSFORME and open a new bug for the DOCX page increase?

I think yes. 

It would also be interesting to know which commit fixed it. I'll (possibly) do a reverse bisect.
I'll create a separate report for the DOCX file.

Thanks for retesting, Stéphane.
Comment 16 Timur 2021-06-09 07:43:06 UTC
Reverse bisect with 7.2+:
commit ba698a8561700f503cdd7a5cb0bc83d6eaf4222b is GOOD
Date:   Fri May 21 08:16:39 2021 +0200
    source sha:798b69087119c01a3b51e0bb3240ef35cfededeb
    previous sha:fb5247bf587518eaa01cf5d54dceddf73827d740

author	Daniel Arato (NISZ) <arato.daniel@nisz.hu>	2021-03-24 20:18:16 +0100
committer	László Németh <nemeth@numbertext.org>	2021-05-21 08:00:33 +0200
commit 798b69087119c01a3b51e0bb3240ef35cfededeb (patch)
tree a498de98b34bacf760b208d3cc650c2a2f68ae58
parent fb5247bf587518eaa01cf5d54dceddf73827d740 (diff)
tdf#104254 sw DOCX import: fix text wrapping in headers
Text wrapping around shapes and images used to be
turned off in header and footer frames. This commit
simply reenables that feature for headers/footers
(to avoid also regressions related to the fix i13832).

OK, great... but how did a DOCX fix end up fixing DOC here?
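
For readers unfamiliar with the term: an ordinary bisect looks for the first bad commit, while a reverse bisect looks for the first commit where the bug no longer reproduces. A minimal sketch of that search follows. It assumes a linear history ordered oldest to newest, that the oldest commit is bad and the newest is good, and that the state flips from bad to good exactly once. The names first_good_commit and is_good are hypothetical and not part of any bibisect tooling.

```python
from typing import Callable, Sequence


def first_good_commit(commits: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Binary-search for the earliest commit where the bug no longer reproduces.

    Assumes commits[0] is bad, commits[-1] is good, and there is a single
    bad-to-good transition somewhere in between.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] is bad, commits[hi] is good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(commits[mid]):
            hi = mid  # the fix landed at mid or earlier
        else:
            lo = mid  # the fix landed after mid
    return commits[hi]
```

Here is_good would stand for "the sample DOC opens with this build". As later comments show, a commit found this way may only hide the underlying problem rather than fix it.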
Comment 17 Timur 2021-06-09 07:50:03 UTC Comment hidden (obsolete)
Comment 18 Xisco Faulí 2021-06-09 07:50:44 UTC
The fix is in sw/source/core/text/txtfly.cxx which seems to be code shared by docx and doc formats, maybe others as well
Comment 19 Dániel Arató (NISZ) 2021-06-09 08:10:40 UTC
(In reply to Timur from comment #17)
> Hi Daniel, seems you fixed this DOC bug, could you look and explain? Thanks.

Pretty much what Xisco said imho. The other ticket (tdf#104254) was about DOCX files specifically, but the change may have affected other subsystems and solved other issues by chance. Looks like we just got lucky on this one.
Comment 20 Commit Notification 2021-06-09 09:30:55 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/975488594fc88aaba7298448e0ff727ebca7fe85

tdf#96840: sw_ww8export3: Add unittest

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 21 Timur 2021-06-09 10:27:56 UTC
*** Bug 111341 has been marked as a duplicate of this bug. ***
Comment 22 Timur 2021-06-09 13:55:15 UTC
*** Bug 76219 has been marked as a duplicate of this bug. ***
Comment 23 Justin L 2021-08-04 05:34:47 UTC
(In reply to Xisco Faulí from comment #18)
> The fix is in sw/source/core/text/txtfly.cxx which seems to be code shared
> by docx and doc formats, maybe others as well

This is in writer core, so it is irrespective of any formats. Thus it has been reverted - because it breaks the point of USE_FORMER_TEXT_WRAPPING.

The change of course can affect layout. In some documents, it resolved layout loops, and caused loops in others. These just expose or hide existing problems, so this bug was never "fixed" and is now "broken" again. REOPEN (but at least we have a clue as to what to look for - likely a page available height issue.)
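
Layout loops of the kind described here tend to show up as a load that never finishes (this report blocks CPU-AT-100%). A minimal sketch of a watchdog that could wrap a headless conversion when retesting follows. It assumes a soffice binary on PATH and that exceeding the timeout indicates a hang; the helper name run_with_timeout, the 120-second limit and the file name sample.doc are assumptions for illustration.

```python
import subprocess


def run_with_timeout(command: list[str], timeout_s: float) -> str:
    """Run a command and classify the outcome as 'ok', 'failed' or 'timeout'."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


# Hypothetical usage (not executed here); sample.doc stands in for an attachment:
# run_with_timeout(["soffice", "--headless", "--convert-to", "pdf", "sample.doc"], 120)
```

A "timeout" result would be consistent with a layout loop but does not prove one, and an "ok" result only shows that the process exited cleanly, not that the document was laid out correctly.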