Bug 139495 - Import of DOC file broken on 64 bit, ok on 32 bit
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: target:7.2.0 target:7.1.3
Keywords: filter:doc
Depends on:
Blocks: DOC-Header-Footer
Reported: 2021-01-08 14:22 UTC by Albrecht Dreß
Modified: 2021-04-11 12:22 UTC (History)
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
Sample document, screen shots from 32 and 64 bit Debian Buster (495.34 KB, application/octet-stream)
2021-01-08 14:45 UTC, Albrecht Dreß
139495_bodyAboveHeader.pdf: exposes some fundamental header positioning differences (92.25 KB, application/pdf)
2021-04-07 08:24 UTC, Justin L
139495_bodyAboveHeader.docx: the DOCX version from Word 2016 - which produced the PDF as well. (13.51 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2021-04-07 08:25 UTC, Justin L

Description Albrecht Dreß 2021-01-08 14:22:00 UTC
The import of some DOC files, apparently with a very creative use (to avoid the term “abuse”) of formatting options, leads to a somewhat garbled layout when running

| Version: 7.0.4.2
| Build ID: 00(Build:2)
| CPU threads: 2; OS: Linux 4.19; UI render: default; VCL: gtk3
| Locale: de-DE (de_DE.UTF-8); UI: de-DE
| Debian package version: 1:7.0.4_rc2-1~bpo10+2
| Calc: threaded

on a 64-bit Debian Buster (debs from Debian Backports), as well as on macOS 10.15 using the official LO 7.0.4 build.

However, the *same* files can usually be loaded with the apparently intended formatting on a 32-bit Debian Buster running exactly the same version of the 32-bit LO build, again from Buster Backports (maybe a glitch in the extension of 32-bit OLE items to 64-bit?).

I could provide

- a sample document,
- a screen shot taken on the 32-bit system showing the intended formatting and
- a screen shot taken on the 64-bit system showing the garbled layout,

but unfortunately it is too large for upload (496K tar.xz).  Please tell me an alternative submission method if the sample would be helpful for fixing this issue.
Comment 1 Xisco Faulí 2021-01-08 14:33:33 UTC
Thank you for reporting the bug. Please attach a sample document, as this makes it easier for us to verify the bug. 
I have set the bug's status to 'NEEDINFO'. Please change it back to 'UNCONFIRMED' once the requested document is provided.
(Please note that the attachment will be public, remove any sensitive information before attaching it. 
See https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F for help on how to do so.)
Comment 2 Albrecht Dreß 2021-01-08 14:45:03 UTC
Created attachment 168765 [details]
Sample document, screen shots from 32 and 64 bit Debian Buster
Comment 3 Albrecht Dreß 2021-01-08 14:47:21 UTC
(In reply to Xisco Faulí from comment #1)
> Thank you for reporting the bug. Please attach a sample document, as this
> makes it easier for us to verify the bug.

Thanks a lot – apparently the attachment limit of 30k doesn't apply…

The tar.xz contains
- the sample DOC file (please ignore the broken German in it…),
- the screen shot running LO 7.0.4 on 32-bit Buster (=looks as intended) and
- the screen shot running LO 7.0.4 on 64-bit Buster (=garbled).
Comment 4 MM 2021-01-08 15:31:13 UTC
Confirmed on Mint 20 x64 with Version: 6.4.6.2
Build ID: 1:6.4.6-0ubuntu0.20.04.1
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); UI-Language: en-US
Calc: threaded

and

Version: 7.2.0.0.alpha0+
Build ID: f2171af6ce3516598d9f8bac8294025a21a5b1a2
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-01-08_00:26:19
Calc: threaded


Not confirmed on Windows 10 x64 with Version: 7.0.4.2 (x64)
Build ID: dcf040e67528d9187c66b2379df5ea4407429775
CPU threads: 12; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: en-US (nl_NL); UI: en-US
Calc: CL
Comment 5 Aron Budea 2021-01-10 15:39:03 UTC
Already wrong in 3.3.0.

It's not really helpful to the problem at hand, but there was a change in 4.1: until then only the first page header was imported that large; afterwards each page starts with an oversized header. The size of the first page header has always been wrong.
https://cgit.freedesktop.org/libreoffice/core/commit/?id=1e113cb7604e1509e7d598a9be329f1f7b6e9322
Comment 6 Justin L 2021-04-06 14:54:53 UTC
Proposed fix at https://gerrit.libreoffice.org/c/core/+/113681
Comment 7 Justin L 2021-04-07 08:24:44 UTC
Created attachment 170995 [details]
139495_bodyAboveHeader.pdf: exposes some fundamental header positioning differences

MS Word allows some very nonsensical settings in designing the page layout, as this document shows. The problem here is that the text body starts higher than where the header starts (which is the same problem that this bug is reporting, just a bit more clearly demonstrated). LO doesn't allow that, although a compatibility flag (present since before LO 3.5 at least) somewhat allows it for the DOC format, in that the body and header can overlap.

So, we are emulating the behaviour to some extent, and that can always cause problems in certain documents - which happen to look better one way instead of the other.

Even with my current proposed patch, the body text doesn't start up near the top, but rather where the header starts. So should the header move up to match the body text? (probably - but again, what kind of problems will that cause?)
Comment 8 Justin L 2021-04-07 08:25:46 UTC
Created attachment 170996 [details]
139495_bodyAboveHeader.docx: the DOCX version from Word 2016 - which produced the PDF as well.
Comment 9 Commit Notification 2021-04-09 09:21:25 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/28a9a92105f3155d82fd9e31095efabd3ec706ea

tdf#139495 doc import: prevent negative Int forced into uInt

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 10 Commit Notification 2021-04-09 12:48:32 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/core/commit/a159d7d2db7a86c770a165b04999f3ab513d3127

tdf#139495 doc import: prevent negative Int forced into uInt

It will be available in 7.1.3.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 11 Albrecht Dreß 2021-04-10 12:43:17 UTC
I can confirm that the build

| Version: 7.2.0.0.alpha0+ / LibreOffice Community
| Build ID: d214ab444e73490f4c95dffd7f376978cbcd3ccc
| CPU threads: 2; OS: Linux 4.19; UI render: default; VCL: gtk3
| Locale: de-DE (de_DE.UTF-8); UI: en-US
| TinderBox: Linux-rpm_deb-x86_64@86-TDF, Branch:master, Time: 2021-04-09_15:49:45
| Calc: threaded

running on 64-bit Debian Buster now formats the example DOC file properly.  Due to the lack of a 32-bit Linux build I cannot check if it breaks on 32-bit Debian Buster, though.

Thanks a lot for fixing this issue!
Comment 12 Justin L 2021-04-10 16:43:48 UTC
I think if you want to confirm 32-bit, you could test on Windows. A related issue indicates that even 64-bit Windows uses 32-bit ints, just like 32-bit Debian would.
Comment 13 Albrecht Dreß 2021-04-11 12:22:25 UTC
(In reply to Justin L from comment #12)
> I think if you want to confirm 32bit, you could test on Windows. A related
> issue indicates that even 64bit windows uses 32bit ints, just like Debian
> 32bit would.

Thanks for that hint, but I don't use (and have no access to) Win systems – only to macOS and several 32- and 64-bit Linux boxes…