Bug 86026 - FILEOPEN: Cannot extract or open files OLE-embedded within a .doc(x) when they are from a non-Office type
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.2.7.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, bisected, filter:doc, filter:docx, regression
Depends on:
Blocks: OLE-Objects
Reported: 2014-11-08 00:27 UTC by Adrien Demarez
Modified: 2020-10-22 16:39 UTC (History)
Description Adrien Demarez 2014-11-08 00:27:11 UTC
LibreOffice cannot properly open or extract documents embedded within a .doc(x) container when they should be opened by a program other than LibreOffice. A good example is to try to open (or export) the .zip / .pdf attachments from this document:
http://cept.org/Documents/ecc-pt1/14999/ECC-PT1(14)014_rev1_Results-of-Public-Consultation-of-the-draft-amended-ECCDEC(11)06
Comment 1 A (Andy) 2014-11-08 12:18:07 UTC
Which specific issue do you mean?  Do you mean the layout of the doc/docx documents is wrong?  Or do you only mean that the pdf/zip documents are not opened?  
For me the pdf document is opened in the pdf reader as expected.

(tested with LO 4.3.3.2, Win 8.1)
Comment 2 Adrien Demarez 2014-11-08 13:57:30 UTC
I'm using Linux (XUbuntu 14.04 in my case, but I could reproduce the issue on other distributions).

The issue is that the pdf/zip documents are not opened: double-clicking on the item does not launch any external program to open the file, and there is no submenu entry in LibreOffice that would allow one to extract or export the embedded file in order to open it manually.
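
As a manual workaround for the .docx case only (not the binary .doc format), a .docx file is a ZIP package, and embedded objects are usually stored under `word/embeddings/`. The following is a minimal sketch under that assumption; the function names `list_embedded` and `extract_embedded` are made up for illustration and are not part of LibreOffice.

```python
import zipfile
from pathlib import Path, PurePosixPath

# Usual location of embedded objects in a .docx package (assumption:
# a producer may use a different layout).
EMBED_PREFIX = "word/embeddings/"


def list_embedded(docx_path):
    """Return the names of package entries stored under word/embeddings/."""
    with zipfile.ZipFile(docx_path) as package:
        return [
            name
            for name in package.namelist()
            if name.startswith(EMBED_PREFIX) and not name.endswith("/")
        ]


def extract_embedded(docx_path, out_dir):
    """Copy each embedded entry into out_dir and return the written paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            if not name.startswith(EMBED_PREFIX) or name.endswith("/"):
                continue
            # Keep only the base file name so nothing is written outside out_dir.
            target = out / PurePosixPath(name).name
            target.write_bytes(package.read(name))
            written.append(target)
    return written
```

Note that an extracted `oleObjectN.bin` is typically an OLE container that wraps the original .zip or .pdf rather than the file itself, so a second unwrapping step with an OLE-aware tool may still be needed.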
Comment 3 A (Andy) 2014-11-08 14:58:11 UTC
Thank you very much for your fast reply.
I can open the pdf document with a pdf reader. Maybe this is a Linux-only issue? Can anybody else confirm this bug?
Comment 4 V Stuart Foote 2014-11-08 21:01:47 UTC
Setting to NEW and All, but really only seeing issues with the Zip OLE objects.

On Windows 7 sp1, 64-bit en-US with
Version: 4.3.3.2
Build ID: 9bb7eadab57b6755b1265afa86e04bf45fbfc644

or with a current build of Master

The OLE stream for the linked Word .DOC(s) will open with Writer, and the PDF actually does open with the associated helper file--can change helper from Adobe Reader to GSView. 

Also, the .ZIP archive OLE objects are being extracted on document open into the TMP directory defined by the user's environment variable.

But, while the Zip archives are available to manipulate with an archive or folder manager, they are not being opened from within the LibreOffice session.

Instead a double click on the OLE object's frame launches a dialog "Create Package" which looks to belong to the MS Object Packager2 packager.dll.mui, so that seems like a command name collision with the LibreOffice package2 module--http://opengrok.libreoffice.org/xref/core/package/util/package2.component

=-=

On Linux (Fedora 20 32-bit, LXDE, LO 4.2.6.3), the PCManFM file manager does not seem to correctly associate a file type with the OLE streams held as .tmp files written to the user's /tmp for the Writer session. The Word .DOCs can be opened within the Writer session. The PDF files can be opened out of the /tmp .tmp file with Evince, but not by a double click in Writer. The OLE stream for the .Zip file is not recognized.
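
Since the .tmp extension carries no type information, one way to check what such a temporary file actually contains is to look at its leading bytes. This is a minimal sketch, not what LibreOffice or PCManFM do; the `sniff` and `sniff_file` helpers are hypothetical, and only three well-known signatures are covered.

```python
# Well-known leading byte signatures. The labels are informal names
# chosen for this sketch.
MAGIC = (
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),  # also matches ZIP-based formats such as .docx or .odt
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),  # OLE2 compound file (.doc, .xls, ...)
)


def sniff(data):
    """Return an informal type label for the given leading bytes."""
    for signature, label in MAGIC:
        if data.startswith(signature):
            return label
    return "unknown"


def sniff_file(path):
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(8))
```

If a .tmp file reports "ole2" rather than "zip" or "pdf", it is probably still the OLE wrapper and not the bare embedded file.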


=-=

Get the feeling that handling of the OLE embedded or linked objects is very dependent on the mix of OS, DE and configured file management for each type. Would lean toward this being an enhancement of some sort. I'm just not that up on what should and shouldn't work.
Comment 5 Gary Gruda 2015-01-19 20:07:06 UTC
Screen Shot 2015-01-19 at 1.31.43 PM.png

Keep getting this message when trying to open downloaded templates. I've uninstalled Java 8, reinstalled the same, quit the browser (Safari), quit LibreOffice, restarted everything, and still get the same error when I know the download and installation were done according to instructions.
Comment 6 Matthew Francis 2015-04-23 06:20:20 UTC
This used to work *slightly* better - the result I get on Linux for e.g. "Annex 2" is that before (1), somehow Firefox gets invoked and offers to save a temporary file which LO has exported, though I don't seem to have anything able to open the result.
(Is this really a Visio file? among the output from master when trying to open the object is "VisioDocument: version 0")

Between (1) and (2) we get a Writer window displaying binary rubbish.
After (2) we get the current result - i.e. double clicking doesn't visibly achieve anything.

Adding Cc: to mstahl@redhat.com, tml@iki.fi; Any further ideas about what's going on here?


(1)
    commit e62339f856efa0b8ef03df3bf8b93e098c4ac0d3
    Author: Michael Stahl <mstahl@redhat.com>
    Date:   Mon Feb 10 16:45:27 2014 +0100

        fdo#73363: sd: fix mis-detection of Visio files as PPT
    
        SdFilterDetect::detect() erroneously detects all binary MSO files, and
        because the Visio types would be checked after PPT, Visio is pre-empted.
    
        Change-Id: I6ec3647a508dc8d79b47bfff6de35ccae39416ee

(2)
    commit 46ad54725bf28ea75278eb63dbf95c4a29618c1c
    Author:     Tor Lillqvist <tml@collabora.com>
    AuthorDate: Wed Aug 27 14:29:43 2014 +0300
    Commit:     Tor Lillqvist <tml@collabora.com>
    CommitDate: Wed Aug 27 15:08:58 2014 +0300
    
        bnc#648251: Avoid crash when attempting to open embedded OLE object as "text"
    
        On non-Windows, when double-clicking an embedded OLE object, our glorious
        content type detection logic detects it as "Text". As a side-effect, we start
        to calculate text statistics on it. Which surely could produce interesting
        numbers (you know what they say about statistics), but sadly causes a crash
        involving the ICU RuleBasedBreakIterator, SwScanner,
        sw::DocumentStatisticsManager and whatnot.
    
        Avoid this by checking for a detected filter of type "Text" explicitly, and
        avoiding the fun code paths in that case.
    
        This leads to double-clicks being just ignored. Maybe it would be more useful
        to produce a "General OLE Error" message box?
    
        Change-Id: Iae0726b5e9c511a92bdff7229d2978cbf76cb07b
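
The pre-emption described in commit (1) can be illustrated with a toy detector chain. This is only a sketch of the general ordering problem, not LibreOffice's detection code: all function names are hypothetical, and representing a file as a set of OLE stream names is an assumption made for brevity.

```python
def detect_any_mso_as_ppt(streams):
    """Over-broad detector (hypothetical): claims every binary MSO container."""
    return "PPT"


def detect_ppt(streams):
    """Narrow detector: only claims containers with a PowerPoint stream."""
    return "PPT" if "PowerPoint Document" in streams else None


def detect_visio(streams):
    """Narrow detector: only claims containers with a Visio stream."""
    return "Visio" if "VisioDocument" in streams else None


def first_match(detectors, streams):
    """Return the result of the first detector that claims the file."""
    for detector in detectors:
        result = detector(streams)
        if result is not None:
            return result
    return None
```

With the over-broad detector first, `first_match([detect_any_mso_as_ppt, detect_visio], {"VisioDocument"})` never reaches the Visio check; replacing it with the narrow `detect_ppt` lets the later detector see the file.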
Comment 7 Adrien Demarez 2015-05-01 11:46:17 UTC
(In reply to Matthew Francis from comment #6)
> This used to work *slightly* better - the result I get on Linux for e.g.
> "Annex 2" is that before (1), somehow Firefox gets invoked and offers to
> save a temporary file which LO has exported, though I don't seem to have
> anything able to open the result.
> (Is this really a Visio file? among the output from master when trying to
> open the object is "VisioDocument: version 0")

Just to be sure I do not misunderstand: does your question "is this really a Visio file" apply to my example "http://cept.org/Documents/ecc-pt1/14999/ECC-PT1(14)014_rev1_Results-of-Public-Consultation-of-the-draft-amended-ECCDEC(11)06"?

Annex 2 is a ZIP file, i.e. not a Visio file. There is no Visio file attached to this document (only MS-Word, PDF and ZIP; for example, Annex 1 is MS-Word and Annex 4 is PDF).

N.B. If it is too much work to handle those OLE-embedded documents on Linux, maybe it would at least be easy to implement a right-click menu entry "extract this embedded file" so that the file can be extracted and then opened manually?

Comment 8 Robinson Tryon (qubit) 2015-12-13 11:10:55 UTC Comment hidden (obsolete)
Comment 9 Xisco Faulí 2017-09-29 08:52:04 UTC Comment hidden (obsolete)
Comment 10 QA Administrators 2019-12-03 14:25:31 UTC Comment hidden (obsolete)
Comment 11 Adrien Demarez 2020-10-22 16:39:03 UTC
Dear all,

I recently tested on Linux KUbuntu 20.04 + LibreOffice 6.0.7.3 and on Manjaro/XFCE + LibreOffice 7.0.1.2, and in both cases the bug seems resolved (I don't know when it was fixed, and I didn't test with any previous version or distribution. I also cannot test on macOS as I no longer have access to a Mac).
Thanks !