Bug 154020 - DOC with Mac-Picts: images don't import
Summary: DOC with Mac-Picts: images don't import
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.0.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL: https://ask.libreoffice.org/t/dealing...
Whiteboard:
Keywords: bisected, filter:doc, regression
Depends on:
Blocks: EMF-WMF DOC-Images
Reported: 2023-03-06 17:54 UTC by Mike Kaganski
Modified: 2023-12-18 14:15 UTC

See Also:
Crash report or crash signature:


Attachments
A sample DOC with Mac-Picts (841.00 KB, application/msword)
2023-03-06 17:54 UTC, Mike Kaganski
The original file, re-saved by Word 97 (525.00 KB, application/msword)
2023-03-06 17:54 UTC, Mike Kaganski

Description Mike Kaganski 2023-03-06 17:54:18 UTC
Created attachment 185798
A sample DOC with Mac-Picts

The attached DOC contains ten images. Prior to version 6.0, four of those were imported correctly, and the rest were lost. Since 6.0, all ten import as a cross with a message "Use Word 6.0c or later to view Macintosh picture."

This happened in two stages. First, in the range https://git.libreoffice.org/core/+log/c7470f5be441d8fe80155ff29605d74d5838be26%5E..208e66185b634ebc131121158f93f4f3ae4bd18e, all the images got lost completely. Then, in https://git.libreoffice.org/core/+/b5f2402e023fb438341895ad0f81020571c5ec5a, they started to import as they do now.

The problem is in SwWW8ImplReader::ReadGrafFile, which expects ReadWindowMetafile to advance the stream's position by the size of the metafile, and *then* reads the Mac-Pict. But ReadWindowMetafile now restores the original position, which breaks the subsequent SwWW8ImplReader::GetPictGrafFromStream call.
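
To illustrate the position problem, here is a toy sketch in Python. It is not the LibreOffice code: the function names, the placeholder byte strings, and the layout (a metafile immediately followed by the Mac-Pict) are assumptions made up for the illustration.

```python
import io

METAFILE = b"WMF-PLACEHOLDER"  # hypothetical stand-in for the leading metafile
PICT = b"PICT-DATA"            # hypothetical stand-in for the Mac-Pict after it


def read_metafile_advancing(stream, size):
    """Consume `size` bytes and leave the position right after them."""
    return stream.read(size)


def read_metafile_restoring(stream, size):
    """Read `size` bytes, then seek back to where the read started."""
    start = stream.tell()
    data = stream.read(size)
    stream.seek(start)
    return data


def read_graf(stream, read_metafile):
    """Caller that assumes the metafile reader leaves the stream at the PICT."""
    read_metafile(stream, len(METAFILE))
    return stream.read(len(PICT))


old = read_graf(io.BytesIO(METAFILE + PICT), read_metafile_advancing)
new = read_graf(io.BytesIO(METAFILE + PICT), read_metafile_restoring)
print(old)  # b'PICT-DATA'
print(new)  # b'WMF-PLACE' (metafile bytes, not the PICT)
```

With the restoring reader, the caller ends up parsing the start of the metafile as if it were the picture, which is the kind of mismatch described above.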

Additionally, ReadWindowMetafile now reads *everything* from the stream's current position to the end. As a result, the metafile contains garbage at the end (actually, the rest of the DOC), and this garbage is kept when writing. So this command:

soffice --convert-to ODT "Conspiracy Theories.doc"

would create a file larger than 3 MB, while the original DOC is only 841 KB.

For comparison, the DOC re-saved in Word 97 to contain "normal" images, which is 525 KB (will attach it in the next message), gets converted to a 522-KB ODT.
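
The size inflation can be pictured with another toy sketch in Python. Again, this is not the LibreOffice code: the helper names and the byte counts (100 bytes of metafile followed by 900 bytes of other document data) are invented for the illustration.

```python
import io


def read_bounded(stream, size):
    """Read exactly the bytes that belong to the metafile."""
    return stream.read(size)


def read_to_end(stream):
    """Read from the current position to the end of the stream."""
    return stream.read()


metafile = b"M" * 100      # hypothetical metafile payload
rest_of_doc = b"D" * 900   # hypothetical remainder of the DOC stream
doc = metafile + rest_of_doc

bounded = read_bounded(io.BytesIO(doc), len(metafile))
unbounded = read_to_end(io.BytesIO(doc))
print(len(bounded), len(unbounded))  # 100 1000
```

If every embedded image carries such a tail and the tail is written back out, the converted file grows accordingly.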

A separate issue is the failure to read the last six images (in older versions); the reference for these is, again, in the next attachment.
Comment 1 Mike Kaganski 2023-03-06 17:54:58 UTC
Created attachment 185799
The original file, re-saved by Word 97
Comment 2 Roman Kuznetsov 2023-03-20 20:29:32 UTC
I confirm the problem in

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: b5c3a7502f7ff6ccf0f829c1f3a2ba50b8584c41
CPU threads: 16; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: ru-RU (ru_RU); UI: ru-RU
Calc: CL threaded
Comment 3 bunkem 2023-08-12 20:05:24 UTC
Tried to open the two files in 
Version: 7.4.7.2 / LibreOffice Community
Build ID: 723314e595e8007d3cf785c16538505a1c878ca5
CPU threads: 4; OS: Mac OS X 10.13.6; UI render: default; VCL: osx
Locale: en-CA (en_CA.UTF-8); UI: en-US
Calc: threaded

"Conspiracy Theories.doc" only opens with 4 pictures. There are no bounding boxes for the other pictures.

"Conspiracy Theories - resavedByW97.doc" opens all 10 pictures OK.

So it appears that the first file "Conspiracy Theories.doc" is defective in some way.

Since the second document "Conspiracy Theories - resavedByW97.doc" opens and shows all pictures, I'd say this "works for me" and is no longer a bug.
Comment 4 bunkem 2023-08-12 20:16:39 UTC
Tested with 
Version: 7.6.0.0.beta1+ (X86_64) / LibreOffice Community
Build ID: 132b4d1c7b8b9bb55e4e254a1a0b53f669c94975
CPU threads: 8; OS: Mac OS X 12.6.8; UI render: Skia/Metal; VCL: osx
Locale: en-CA (en_CA.UTF-8); UI: en-US
Calc: threaded

"Conspiracy Theories.doc" only opens with bounding boxes and no pictures.

"Conspiracy Theories - resavedByW97.doc" opens all 10 pictures OK.
Comment 5 Mike Kaganski 2023-08-12 20:43:40 UTC
(In reply to bunkem from comment #3)
> So it appears that the first file "Conspiracy Theories.doc" is defective in
> some way.

I already explained the problem. The file is not defective, the import is (in many ways).
Comment 6 bunkem 2023-08-14 13:32:08 UTC
(In reply to Mike Kaganski from comment #5)
> (In reply to bunkem from comment #3)
> > So it appears that the first file "Conspiracy Theories.doc" is defective in
> > some way.
> 
> I already explained the problem. The file is not defective, the import is
> (in many ways).

Hi @Mike,

Thank you.  You have been quite detailed. I have a few more questions that would help me to test.

From your notes, the second file was created using Word 97 "save as" but using the first file as the basis.  Is this correct?  I presume this was on Windows? Which version of Windows?

How did you create the first file?  What software (Word?), software version (Word 97?) and OS (Windows 98/7/10?)?  

The image files are Mac-Pict. How were they created and using which software on which OS?

Thanks in advance.

B.
Comment 7 Mike Kaganski 2023-08-14 14:04:30 UTC
(In reply to bunkem from comment #6)
> From your notes, the second file was created using Word 97 "save as" but
> using the first file as the basis.  Is this correct?  I presume this was on
> Windows? Which version of Windows?

Windows XP VM.

> How did you create the first file?  What software (Word?), software version
> (Word 97?) and OS (Windows 98/7/10?)?  

See the URL meta-field. It points to the Ask topic, from where the sample is. As the OP wrote there, "These are all older files, created with MS Word 6 on a Macintosh (can’t remember the system)".

> The image files are Mac-Pict. How were they created and using which software
> on which OS?

This was what MS Word for Mac used to create back then with the embedded images.
Comment 8 bunkem 2023-08-15 17:33:25 UTC
Thanks @Mike.  The additional info is very helpful.

I'm old enough to remember MS Word 6 for Mac and still have the install discs. Oh my, was that software a piece of s**t. I got on a first-name basis with many MS support staff due to all the problems.  :LOL:  I remember there were numerous graphics issues with Word 6, but can't remember whether this happened when the software converted the PICT images to Microsoft BMP or EMF images.

@ilmari, we need your input.  I can confirm the issue; opening old DOC files created in Word for Mac should work.

I'm not sure if there is a timeline for how long old files need to stay usable or convertible in current versions of LO.  I can't test further since I don't have old enough hardware to run Word 6 for Mac, and Apple/Mac PICT files have been officially unsupported for at least a decade.

Should this be pushed to the Document Liberation Project, which deals with file conversions?
Comment 9 Alex Thurgood 2023-08-23 09:10:07 UTC
(In reply to bunkem from comment #8)

Just my 2c on this.

LO still supports direct insertion of PICT files into a Draw document. I haven't checked Impress, but presumably the same range of image formats is supported?

LO Draw supports a number of image formats that might be considered "unsupported" by the official organisation behind the format, so removing support for one arbitrarily, because the corresponding Word filter cannot handle it, should be weighed very carefully for its impact on the suite as a whole and how such a decision would be perceived.
Comment 10 Mike Kaganski 2023-08-23 09:12:34 UTC
Note that there was no removal of support. There was a regression, caused by a change that targeted something else. It does not need any discussion, only a fix.