Bug 50763 - HTML Import doesn't parse inline base64 images correctly
Summary: HTML Import doesn't parse inline base64 images correctly
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.5.4 release (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Christina Rossmanith
URL:
Whiteboard: target:4.3.0 target:4.2.2 target:4.4.0
Keywords: difficultyBeginner, easyHack, skillCpp
Duplicates: 77520 90574
Depends on:
Blocks:
 
Reported: 2012-06-06 01:01 UTC by Björn Michaelsen
Modified: 2016-02-18 16:37 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Description Björn Michaelsen 2012-06-06 01:01:47 UTC
Inlined HTML images:

 <img src="data:image/jpeg;base64,fdsiid..." .../>

are imported as text. This is an enhancement request to parse such data correctly.

Since the base64 data contains no whitespace, Writer has a hard time figuring out where to break lines and freezes even on small amounts of such data. This makes the issue more important than a random feature request, as it might be the root cause of a lot of "copy-paste from web, then freeze" bugs.

If a full implementation of base64 parsing is out of scope, a workaround should be implemented to at least skip such data, to prevent the Writer layout from going haywire. Consider splitting that task off as a simpler-to-fix bug.
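
To illustrate the "at least skip such data" workaround, here is a minimal standalone sketch of how a `src` value could be recognized as an inline data URI and split into its media type and payload. It is plain C++17 and does not use any LibreOffice classes; `DataUri` and `parseDataUri` are hypothetical names invented for this example, and the scheme check is case-sensitive for simplicity.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper type for this sketch, not part of LibreOffice.
struct DataUri {
    std::string mediaType;     // e.g. "image/jpeg" (may include parameters)
    bool isBase64 = false;     // true if the header ends in ";base64"
    std::string_view payload;  // view into the caller's string, not a copy
};

// Returns the parsed parts if `src` looks like "data:<mediatype>[;base64],<payload>",
// or std::nullopt for anything else (e.g. an ordinary http:// URL).
std::optional<DataUri> parseDataUri(std::string_view src)
{
    constexpr std::string_view scheme = "data:";
    if (src.substr(0, scheme.size()) != scheme)
        return std::nullopt;
    src.remove_prefix(scheme.size());

    // The first comma separates the header from the payload.
    const auto comma = src.find(',');
    if (comma == std::string_view::npos)
        return std::nullopt;

    std::string_view header = src.substr(0, comma);
    DataUri result;
    result.payload = src.substr(comma + 1);

    constexpr std::string_view marker = ";base64";
    if (header.size() >= marker.size()
        && header.substr(header.size() - marker.size()) == marker)
    {
        result.isBase64 = true;
        header.remove_suffix(marker.size());
    }
    result.mediaType = std::string(header);
    return result;
}
```

An importer that cannot decode the payload yet could use a check like this to drop the attribute value instead of passing it on as text.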
Comment 1 Björn Michaelsen 2012-06-06 01:03:14 UTC
@cbosdonnat, mst: Possible EasyHack?
Comment 2 Michael Stahl (allotropia) 2013-04-02 22:10:19 UTC
writer HTML filter is in sw/source/filter/html
with some base classes in svtools/source/svhtml/parhtml.cxx etc.

to parse the base64 itself it's best to use sax::Converter::decodeBase64
from sax/tools/converter.hxx.
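
For readers unfamiliar with the decoding step, the following is a rough standalone sketch of what base64 decoding involves. It is not the sax::Converter implementation and does not reproduce its signature; `decodeBase64Sketch` and `base64Value` are hypothetical names, and the sketch assumes the standard alphabet with optional trailing `=` padding and no embedded whitespace.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Maps one character of the standard base64 alphabet to its 6-bit value,
// or returns -1 if the character is not part of the alphabet.
static int base64Value(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Hypothetical decoder for illustration only. Returns the decoded bytes,
// or std::nullopt if the input contains a character outside the alphabet.
std::optional<std::vector<std::uint8_t>> decodeBase64Sketch(std::string_view in)
{
    std::vector<std::uint8_t> out;
    out.reserve(in.size() / 4 * 3);

    std::uint32_t buffer = 0;  // accumulates 6-bit groups
    int bits = 0;              // number of not-yet-emitted bits in buffer

    for (std::size_t i = 0; i < in.size(); ++i)
    {
        if (in[i] == '=')
        {
            // Padding: everything that follows must be padding as well.
            for (; i < in.size(); ++i)
                if (in[i] != '=')
                    return std::nullopt;
            break;
        }
        const int value = base64Value(in[i]);
        if (value < 0)
            return std::nullopt;

        buffer = (buffer << 6) | static_cast<std::uint32_t>(value);
        bits += 6;
        if (bits >= 8)
        {
            // A full byte is available: emit the top 8 pending bits.
            bits -= 8;
            out.push_back(static_cast<std::uint8_t>((buffer >> bits) & 0xFF));
        }
    }
    return out;
}
```

In the actual import code the existing converter should be preferred over a hand-written decoder; the sketch only shows why the payload has to be turned into raw bytes before it can be treated as image data.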
Comment 3 Paul Dicker 2013-08-24 13:45:01 UTC
I would like to give this a try.
One thing I can't figure out is, once I have the image data converted from base64, how do I add it as an image to the document?
Can someone give a hint?
Comment 4 Björn Michaelsen 2013-10-04 18:48:08 UTC
adding LibreOffice developer list as CC to unresolved EasyHacks for better visibility.

see e.g. http://nabble.documentfoundation.org/minutes-of-ESC-call-td4076214.html for details
Comment 5 Cédric Bosdonnat 2014-01-20 08:57:50 UTC Comment hidden (noise)
Comment 6 Thorsten Behrens (allotropia) 2014-02-07 13:06:34 UTC
For reference, there's a work-in-progress patch on gerrit now:

https://gerrit.libreoffice.org/#/c/7773/
Comment 7 Commit Notification 2014-02-14 09:40:05 UTC
Chr. Rossmanith committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=363f1c1462963f6f032de07649dc9c4d02b4e446

fdo#50763: handle inlined base64 images



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 8 Commit Notification 2014-02-14 09:43:27 UTC
Chr. Rossmanith committed a patch related to this issue.
It has been pushed to "libreoffice-4-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=047913ea8f0cb8b03f78be0780c5e828be9ef323&h=libreoffice-4-2

fdo#50763: handle inlined base64 images


It will be available in LibreOffice 4.2.2.

Comment 9 Andras Timar 2014-04-20 12:38:42 UTC
*** Bug 77520 has been marked as a duplicate of this bug. ***
Comment 10 Commit Notification 2014-09-18 13:39:37 UTC
Matuš Kukan committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=72703173066a2db5c977d422ace59d60b998bbfc

HTML import: fix importing of inlined images (related: fdo#50763)



Comment 11 Commit Notification 2014-09-18 13:55:23 UTC
Matuš Kukan committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1c1fe7afb77e0538cdc4081ee266a7bda80f7b05

HTML import test for image inlined in 'src' attribute (related: fdo#50763)



Comment 12 Buovjaga 2015-04-17 17:05:24 UTC
*** Bug 90574 has been marked as a duplicate of this bug. ***
Comment 13 Robinson Tryon (qubit) 2015-12-15 22:50:46 UTC
Migrating Whiteboard tags to Keywords: (EasyHack DifficultyBeginner SkillCpp )
[NinjaEdit]
Comment 14 Robinson Tryon (qubit) 2016-02-18 16:37:31 UTC
Remove LibreOffice Dev List from CC on EasyHacks
(curtailing excessive email to list)
[NinjaEdit]