Bug 87601 - HTML to DOCX hangs up if HTML contains "MsoHeader"
Summary: HTML to DOCX hangs up if HTML contains "MsoHeader"
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): 4.3.3.2 release
Hardware: x86-64 (AMD64) All
Importance: high major
Assignee: Caolán McNamara
URL:
Whiteboard: target:4.5.0 target:4.3.7 target:4.4....
Keywords: bibisected, bisected, filter:html, regression
Depends on:
Blocks:
 
Reported: 2014-12-22 17:37 UTC by Fran Ontanaya
Modified: 2015-12-17 05:57 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
HTML document fails to convert with a p class="MsoHeader" (3.58 KB, text/html)
2014-12-29 14:49 UTC, Fran Ontanaya
Description Fran Ontanaya 2014-12-22 17:37:57 UTC
LibreOffice hangs completely trying to import Microsoft Word export HTML that contains a p class="MsoHeader" element.

The HTML doc declares these XML namespaces:

xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:w="urn:schemas-microsoft-com:office:word"
xmlns:m="http://schemas.microsoft.com/office/2006/01/omml"
xmlns="http://www.w3.org/TR/REC-html40"

Simply removing the class from the p element allows LibreOffice to load the document.

(Can't attach the offending document since it's an internal product.)
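
A stopgap for affected versions, based on the observation above that removing the class lets the document load: a minimal sketch that strips class="MsoHeader" from opening p tags before the file is handed to LibreOffice. The function name and the regular expression are illustrative assumptions, not part of LibreOffice, and a regex is only a rough tool for Word-exported HTML.

```python
import re

# Hypothetical helper, not part of LibreOffice. Matches a class attribute whose
# value is exactly MsoHeader (double-quoted, single-quoted, or unquoted, as Word
# often writes it) inside an opening <p ...> tag. Matching is case-insensitive.
_MSO_HEADER_CLASS = re.compile(
    r"(<p\b[^>]*?)\s+class\s*=\s*(?:\"MsoHeader\"|'MsoHeader'|MsoHeader)(?=[\s>])",
    re.IGNORECASE,
)


def strip_mso_header_class(html: str) -> str:
    """Return html with class=MsoHeader removed from <p> tags; other markup is untouched."""
    return _MSO_HEADER_CLASS.sub(r"\1", html)
```

Paragraphs with other classes (for example MsoNormal) are left alone, so only the element reported here is affected.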
Comment 1 raal 2014-12-22 18:42:47 UTC
Please attach anonymized document.
How can I eliminate confidential data from a sample document?
https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F
Thank you
Comment 2 Robinson Tryon (qubit) 2014-12-23 21:20:30 UTC
(In reply to raal from comment #1)
> Please attach anonymized document.
> How can I eliminate confidential data from a sample document?
> https://wiki.documentfoundation.org/QA/FAQ#How_can_I_eliminate_confidential_data_from_a_sample_document.3F
> Thank you

Status -> NEEDINFO

(Please change the status back to UNCONFIRMED after you upload the sample document!)
Comment 3 Fran Ontanaya 2014-12-29 14:49:19 UTC
Created attachment 111472
HTML document fails to convert with a p class="MsoHeader"
Comment 4 Robinson Tryon (qubit) 2014-12-29 15:30:41 UTC
TESTING on Ubuntu 14.04 + LO 4.3.5.2

(In reply to Fran Ontanaya from comment #0)
> LibreOffice hangs completely trying to import Microsoft Word export HTML
> that contains a p class="MsoHeader" element.

REPRO steps:
1) Try to open attachment 111472

RESULT: Hang

I waited for a couple of minutes, but nothing happened.

CONFIRMED: LibreOffice hangs when trying to import this HTML document

Status -> NEW
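
The manual "wait a couple of minutes" check can be scripted. This is a rough sketch, assuming the reproduction is driven from the command line: it runs a command under a timeout and reports a hang if the process does not finish in time. The function name classify_run and the 120-second limit are illustrative choices, and the soffice command line in the comment is an assumed invocation that may need an explicit export filter for HTML input.

```python
import subprocess


def classify_run(cmd, timeout_s):
    """Run cmd and return 'hang' on timeout, 'ok' on exit code 0, else 'error'."""
    try:
        proc = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising; helper
        # processes started by a launcher script may survive.
        return "hang"
    return "ok" if proc.returncode == 0 else "error"


# Assumed usage against the attached sample (path is a placeholder):
#   classify_run(["soffice", "--headless", "--convert-to", "docx", "sample.html"], 120)
```

A result of "hang" corresponds to the RESULT above; "ok" would indicate a build where the file converts.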
Comment 5 raal 2014-12-29 16:47:48 UTC
I can open this file in LO 3.5, Linux -> regression
Comment 6 Rostislav 'R.Yu.' Okulov 2014-12-30 08:10:51 UTC
# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
git bisect bad 423a84c4f7068853974887d98442bc2a2d0cc91b
# good: [65fd30f5cb4cdd37995a33420ed8273c0a29bf00] source-hash-d6cde02dbce8c28c6af836e2dc1120f8a6ef9932
git bisect good 65fd30f5cb4cdd37995a33420ed8273c0a29bf00
# good: [e02439a3d6297a1f5334fa558ddec5ef4212c574] source-hash-6b8393474974d2af7a2cb3c47b3d5c081b550bdb
git bisect good e02439a3d6297a1f5334fa558ddec5ef4212c574
# good: [4850941efe43ae800be5c76e1102ab80ac2c085d] source-hash-980a6e552502f02f12c15bfb1c9f8e6269499f4b
git bisect good 4850941efe43ae800be5c76e1102ab80ac2c085d
# bad: [a900e72b6357882284c5955bdf939bf14269f5fb] source-hash-dd1050b182260a26a1d0ba6d0ef3a6fecc3f4e07
git bisect bad a900e72b6357882284c5955bdf939bf14269f5fb
# bad: [e1d0365cd2b073a859f59ad0a4584385a66dc611] source-hash-2eea96c702a44ab009743b0d22ef639127f0b57b
git bisect bad e1d0365cd2b073a859f59ad0a4584385a66dc611
# bad: [98a55bf95f3ec29298751fd8fba76dd2236dce43] source-hash-58dfc97ca697875c36b7ddf14f5505a93d7b9cf8
git bisect bad 98a55bf95f3ec29298751fd8fba76dd2236dce43
# bad: [92ca7e7dd4470107453ce3e99f3675387f91bf24] source-hash-ed5065d8b080bfaf51ea1232cebf3ff72af1e640
git bisect bad 92ca7e7dd4470107453ce3e99f3675387f91bf24
# good: [6b545103ded12a7a1e7b490734eb094344a0f3ca] source-hash-76702bc75d79dee09a01c57c68e49efa5664c355
git bisect good 6b545103ded12a7a1e7b490734eb094344a0f3ca
# good: [5023c3e436e8a445b700a81bd4a404673084678a] source-hash-5da974369d01760b336de34e68c03d7268d2d330
git bisect good 5023c3e436e8a445b700a81bd4a404673084678a
# bad: [77962b3d9a08d8c7177b2c67da6ed1c5bc26572c] source-hash-d1ba55a28cd40134356faf3e01971491086591d9
git bisect bad 77962b3d9a08d8c7177b2c67da6ed1c5bc26572c
# bad: [f1e56b0f09e0a75b8970a8b9892298f0ca210200] source-hash-eeeefd6fd87b3cff18ba9078869bdfcd0e351d6f
git bisect bad f1e56b0f09e0a75b8970a8b9892298f0ca210200
# first bad commit: [f1e56b0f09e0a75b8970a8b9892298f0ca210200] source-hash-eeeefd6fd87b3cff18ba9078869bdfcd0e351d6f

 f1e56b0f09e0a75b8970a8b9892298f0ca210200 is the first bad commit
commit f1e56b0f09e0a75b8970a8b9892298f0ca210200
Author: Bjoern Michaelsen <bjoern.michaelsen@canonical.com>
Date:   Sun May 11 02:22:13 2014 +0000

    source-hash-eeeefd6fd87b3cff18ba9078869bdfcd0e351d6f
    
    commit eeeefd6fd87b3cff18ba9078869bdfcd0e351d6f
    Author:     Marcos Paulo de Souza <marcos.souza.org@gmail.com>
    AuthorDate: Tue Jan 14 13:09:33 2014 -0200
    Commit:     Matúš Kukan <matus.kukan@collabora.com>
    CommitDate: Thu Jan 16 10:26:04 2014 +0100
    
        fdo#54938: Convert filter to cppu::supportsService
    
        final part
    
        Change-Id: If9387b4f7aa8ca694092f51eabeac096c71347eb

:100644 100644 3e197ed074c9d37c20514d7ad671bb9b14fc9e43 078d4fe451f07f1d344efdf613645d01e477cad9 M      ccache.log
:100644 100644 8b7fb2264ddbbe0b6ce62df55df3cc17d4094fcb a60d1828ec08b802f939c6ab9ced8d795a09f8ba M      commitmsg
:100644 100644 ec16c05dc4818fa880ea7db9663c1512dc9f2a7b 8e5a1ba748a7ca66a673cdbc5ca39f024406a1e8 M      make.log
:040000 040000 5a80d6cbdd314c2de7ed5961665c1af9f4a4b053 ed375e6fe30120b84a779265ef85d1f35bab1852 M      opt
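
For readers skimming the log above, a small sketch that summarizes a git bisect log of this shape: it collects the commits marked good and bad and picks out the "first bad commit" line. The function name and the returned dictionary layout are assumptions made for illustration; the hashes are those of the bibisect repository, not of the core source tree.

```python
import re

# Marker comment lines as they appear in the log above, e.g.
#   # bad: [<40-hex sha>] source-hash-...
_MARK = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\]")


def summarize_bisect_log(log_text):
    """Return the good and bad commits, in log order, plus the first bad commit if present."""
    good, bad, first_bad = [], [], None
    for line in log_text.splitlines():
        match = _MARK.match(line.strip())
        if not match:
            continue  # skip the 'git bisect ...' command lines and everything else
        kind, sha = match.groups()
        if kind == "good":
            good.append(sha)
        elif kind == "bad":
            bad.append(sha)
        else:
            first_bad = sha
    return {"good": good, "bad": bad, "first_bad": first_bad}
```

Applied to the full log in this comment, first_bad would be f1e56b0f09e0a75b8970a8b9892298f0ca210200, matching the "first bad commit" line.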
Comment 7 Matthew Francis 2015-01-11 04:37:33 UTC
The hang seems to have started with the commit below.

Adding Cc: to caolanm@redhat.com; Could you possibly have a look at this? Thanks


commit ba27366f3d6bc6b209ecd5c5cb79a9ee5315316a
Author: Caolán McNamara <caolanm@redhat.com>
Date:   Tue Jan 14 16:50:42 2014 +0000

    Resolves: #i17171# Writer paragraph cannot be longer than 65534 characters
    
    Change-Id: I2052ae96571cba8fe2191dff53b1c61c95c94c60
Comment 8 Commit Notification 2015-01-21 15:26:43 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=6acd5c45c764d81aea1539e66adbfadb51df0aa3

Resolves: fdo#87601 specific html doc hangs on load

It will be available in 4.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 9 Commit Notification 2015-01-22 14:13:51 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-4-3":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=56e9bf6c459a0cbd8b373047d960cd4d68a526e4&h=libreoffice-4-3

Resolves: fdo#87601 specific html doc hangs on load

It will be available in 4.3.7.
Comment 10 Commit Notification 2015-01-22 14:13:57 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=b27a4cc60f080e24f908e25f28d44c7de2269c29&h=libreoffice-4-4

Resolves: fdo#87601 specific html doc hangs on load

It will be available in 4.4.1.
Comment 11 Commit Notification 2015-01-22 15:33:45 UTC
Caolán McNamara committed a patch related to this issue.
It has been pushed to "libreoffice-4-4-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=629e451f7e1a6c65e61ee8ab305e464688ca5fbf&h=libreoffice-4-4-0

Resolves: fdo#87601 specific html doc hangs on load

It will be available in 4.4.0.
Comment 12 Robinson Tryon (qubit) 2015-12-17 05:57:44 UTC
Migrating Whiteboard tags to Keywords: (filter:html, bibisected)
[NinjaEdit]