Created attachment 60266 [details]
When you export a document that contains images to an HTML file, it embeds the images into the html. That is, it embeds the binary data.
I believe that images should instead be saved in a folder with the same name as the document, and the html should link to those files. Embedding the images into the html file is a bad idea for two reasons:
* It results in the browser having to load a much larger file
* It actually results in a larger overall size. In the attachment, the 4x4px gif (39 bytes), when embedded, results in an html file that is 86 bytes. When the image is linked instead, the combined size of the html file and gif is 61 bytes. 25 bytes are wasted, and the waste will only grow as the images used get larger.
Embedding the images in the html should probably be optional.
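The size argument above can be checked with a little arithmetic. The sketch below is in Python (my choice, nothing in the report uses it) and works on a zero-filled 39-byte placeholder, not the attached GIF; `base64_size` and `DATA_URI_PREFIX` are names made up for illustration. It only shows how much base64 plus the `data:` prefix adds. It does not reproduce the 86- and 61-byte figures, which also depend on the surrounding markup.

```python
import base64
import math


def base64_size(raw_bytes: int) -> int:
    """Length in characters of padded base64 text for raw_bytes of input."""
    return 4 * math.ceil(raw_bytes / 3)


# Placeholder payload with the same size as the 39-byte GIF in the report.
raw = bytes(39)
encoded = base64.b64encode(raw)

# Prefix a data URI needs in front of the base64 text.
DATA_URI_PREFIX = "data:image/gif;base64,"

embedded_size = len(DATA_URI_PREFIX) + len(encoded)

print(len(raw))       # 39 bytes on disk when the image is a separate file
print(len(encoded))   # 52 characters once base64-encoded
print(embedded_size)  # 74 characters for the whole data URI
```

Base64 turns every 3 bytes into 4 characters, so the encoded image is always about a third larger than the file it replaces, before counting the prefix.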
Created attachment 66396 [details]
resulting html file and original file
Please also see the attached html file generated with
/opt/libreoffice3.6/program/soffice --display :1020 --convert-to html --outdir ./ Untitled\ 2.odt
The embedded image is not shown by any browser.
I forgot to mention that this was tested in fedora with 3.6.1
Created attachment 90351 [details]
ODT and example X/HTML output under v3304 and v4132.
I think this bug can be RESOLVED as NOTABUG. At the very least this bug would need to be changed to an enhancement request as it is asking for a new option that has never existed. There are separate filters for converting a file opened in Writer to HTML in such a way as to embed or link any included graphics:
- File > Export > select File Type of "XHTML" will embed the graphics.
- File > Save As... > select File Type of "HTML Document" will link the graphics.
These two filters can be used via the command line as:
$ soffice --headless --convert-to html:"HTML (StarWriter)" file_to_convert.odt
$ soffice --headless --convert-to html:"XHTML Writer File" file_to_convert.odt
I have tested this functionality under Ubuntu 10.04 x86_64 running:
- v126.96.36.199 OOO330m19 Build: 6
- v188.8.131.52 Build ID: 70feb7d99726f064edab4605a8ab840c50ec57a
... indicating it has always been this way. The problem indicated by comment #2 is a completely separate issue that is raised (using the same attachment) in bug 54315.
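For scripted use, the two command lines quoted above can be wrapped as follows. This is only a sketch: the filter names and the `--headless` / `--convert-to` flags are taken from the commands in this comment, while `build_convert_command`, `convert`, and the constant names are hypothetical helpers. It assumes `soffice` is on the PATH, and which filter links or embeds is as described in this comment for the versions tested there.

```python
import subprocess

# Filter names exactly as quoted in the comment above.
HTML_FILTER = "HTML (StarWriter)"    # described above as linking graphics
XHTML_FILTER = "XHTML Writer File"   # described above as embedding graphics


def build_convert_command(document, filter_name, soffice="soffice"):
    """Return the argument list for one headless conversion.

    Passing a list to subprocess avoids shell quoting, so the filter
    name with its space and parentheses is passed as a single argument.
    """
    return [soffice, "--headless", "--convert-to",
            f"html:{filter_name}", document]


def convert(document, filter_name):
    """Run the conversion; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_convert_command(document, filter_name), check=True)


if __name__ == "__main__":
    # Show the commands without running them.
    for name in (HTML_FILTER, XHTML_FILTER):
        print(build_convert_command("file_to_convert.odt", name))
```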
Changed Version to Inherited From OOo as a result of comment #4.
(In reply to comment #4)
> - File > Export > select File Type of "XHTML" will embed the graphics.
> - File > Save As... > select File Type of "HTML Document" will link the graphics.
It would appear that since LO v4.2 the fix to bug 63211 has made this situation worse, as it is now impossible to create an HTML file from any of these sources:
- ODT with embedded graphic.
- ODT with linked graphic.
- HTML with embedded graphic.
- HTML with linked graphic.
... and produce HTML output (via either quoted method) with a linked graphic. All graphics are written out as base64 embedded. Unsure whether to mark this now as a regression, since the v4.2 change.
I agree with Antonio and Owen Genat. It is no longer possible to create simple html with linked pictures in 4.2.
Furthermore, and more importantly, html files with images saved in LO 4.2 Writer no longer open in LO. LO crashes if you try to open them.
That means you can open them in a browser (slowly), but if you want to change the code of the html file you must use another text editor.
In my opinion, it should be restored to how it was until 4.1 and as Owen Genat says:
- File > Export > select File Type of "XHTML" will embed the graphics.
- File > Save As... > select File Type of "HTML Document" will link the graphics.
It is a regression and users complain about it.
This one is a killer. It makes LOweb useless as a simple wysiwyg html editor. Any web page with images is corrupted, with the filesize expanded by nearly an order of magnitude. That often results in web pages that trigger the infamous 'read error' on the image and show pages of 'code', which text and html code editors often complain is a line too long for editing.
The fact that there is no obvious work around leaves me high and dry with no options. I finally got a client to upgrade - and boy did I get blown out of the water on this bug.
Please disregard my comment 4 and comment 5. I am setting the version back to the earliest 4.2 release due to comment 6 through comment 8. Summary amended for clarity. I am also setting the Severity to major, although it would appear, according to the Bug Triage flowchart (https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg), that it may be a blocker. I will leave it to QA to confirm this.
Pulled from my bug 79730 comment 3:
Yes, since 4.2, images added to files are saved as base64 encoded images embedded into html files unless you check the 'insert as link' checkbox in the file insert dialog, but for some reason this isn't happening.
Confirmed in Linux Mint in 4.2.4, 4.2.6 and 4.3 beta. It does work correctly with referenced images in .odt files but not .html files.
*** Bug 79730 has been marked as a duplicate of this bug. ***
(In reply to Regina Henschel from comment #8)
> It is a regression and users complain about it.
FWIW, a quick note for next time: if this is about a behaviour change in 4.2, it would likely have been better to open a clear new bug, not reuse an old one from 2012.
(This is an automated message.)
It seems that the commit that caused this regression was identified. (Or at least a commit is suspected as the offending one.)
Thus setting keyword "bisected".
I'm not sure if this is the correct place to comment, but if this is added as an option, it would be wonderful if it were possible to control it from the command line. I use libreoffice with --headless --convert-to to automate conversion of some files, so a fix that only involves the UI would not solve this problem.
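For automated conversions like this, one way to notice the behaviour in a pipeline is to scan the converted HTML for `data:` image sources. The following is a minimal Python sketch using only the standard library; `ImageSourceCollector` and `classify_images` are hypothetical names, and nothing here is part of LibreOffice.

```python
from html.parser import HTMLParser


class ImageSourceCollector(HTMLParser):
    """Collect the src attribute of every <img> tag in a document."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)


def classify_images(html_text):
    """Return (embedded_count, linked_count) for the images in html_text."""
    collector = ImageSourceCollector()
    collector.feed(html_text)
    collector.close()
    embedded = sum(
        1 for src in collector.sources
        if src.strip().lower().startswith("data:")
    )
    return embedded, len(collector.sources) - embedded


if __name__ == "__main__":
    sample = '<p><img src="data:image/png;base64,AAAA"><img src="pic.png"/></p>'
    print(classify_images(sample))  # (1, 1)
```

A conversion script could call `classify_images` on each output file and fail the run when the embedded count is not zero.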
Adding bibisected to whiteboard as bisected keyword is a subset of bibisected whiteboard.
*** Bug 80973 has been marked as a duplicate of this bug. ***
AFAICS, in current versions, at least 4.3.x it's not possible to use a filter which doesn't embed the raw images.
Quoting myself from #88038:
The data URI shouldn't be used per default.
- It's definitely not the standard in the web.
- Supporting it is not mandatory as per the HTML standards. Many browsers don't support it.
- It's really a crude hack anyway, as it blows up the data size extremely.
Moreover, it breaks any usage scenarios of e.g. using the --convert-to functionality within git for diffs of git managed opendocument files.
It's really disturbing that such functionalities are removed in the first place (I thought LibreOffice != GNOME and features would be added, not dropped)... and even more that this is left open for such a long time o.O
All the same. I can't use LibreOffice because of this bug. For now I start looking for any other Office which can store images separately from an HTML index file...
Returning version number back to its earliest affected version. With this issue, LO is unusable to save to html or create html in Writer Web.
*** Bug 80891 has been marked as a duplicate of this bug. ***
I was also able to successfully open the attached file (in 'Example files.zip', named 'embedded.html'), which was posted with the bug, in LO Writer 184.108.40.206
I think this bug can be closed?
This was caused somewhat unintentionally while resolving mail merge features of bug 63211 -- "MAILMERGE: looses embedded image when e-mailing as html"
The UX and design side has it under consideration as bug 88038 -- "WRITER WEB: handling of inserted images -- data URI embedding as base64 for email support, broke LO as HTML editor"
But the point remains that HTML editing in Writer Web is misbehaving because images are embedded into the HTML as base64 images by default when saved as HTML (an export filter), ignoring/overriding settings to link the image.
Good for email merge--bad for Writer as an HTML editor.
1. create File -> New -> HTML document to launch Writer Web session
2. type some text
3. insert an image
4. verify image is available as a link (Edit -> Links)
5. save close
7. image not available as a link (Edit -> Links), rather is embedded into the HTML as an <img> stanza with base64 encoding.
Presently, there is no way to do anything except embed. Specifically, you cannot link by reference.
@Ciorba Edmond, could you perhaps revisit your work on bug 63211 -- where the export to HTML filter actions are now always converting images, without regard for Writer's dual role as an HTML editor. In that mode, Writer does better with a document without embedding the images and so can honor the links.
Remains a nuisance into 5.0.0rc2 and current master.
In my own experience and as IT support, using Writer as an HTML editor is a much more common use case than mail merging HTML email. If there is not a fix planned for this, we should consider reverting.
Vasily has been working on the HTML exporter.
FWIW, some of us do want images embedded when we export to html.
Can this be optional?
(In reply to Michael Adams from comment #28)
Please do not adjust details of bug reports, this first occurs in 220.127.116.11--not the current 18.104.22.168 release.
In reply to Michael Adams:
If you want to RETAIN the embedded image in your document (as distinct from merely including the link to the image), then:
1. From the menu, select "Insert > Image > From File"
2. DON'T click the "link" tick-box at the bottom left of the "Insert Image" dialogue box.
The problem here is that (at least in my Version: 22.214.171.124) the embedded image is retained even if you click the "link" tick-box.
(In reply to Luke from comment #26)
> In my own experience and as IT support, using Writer as HTML editor is a
> much more common use case than to mail merging HTML email. If there is not a
> fix planned for this, we should consider reverting
I do like the export of images in HTML mail...
An option would be great, yes.
(In reply to Cor Nouws from comment #31)
> I do like the export of images in HTML mail...
> An option would be great, yes.
Well sure, but the real problem, since default embedding of base64 images into HTML to support mail merge was implemented, is that we have broken our ability to edit the resulting HTML in Writer Web mode. That default needs to be adjusted--an option on save, or detecting use for mail merge/mail send and only doing it then.
See the UX-advise bug 88038
(In reply to Leon Arundell from comment #30)
> In reply to Michael Adams:
> If you want to RETAIN the embedded image in your document (as distinct from
> merely including the link to the image), then:
> 1. From the menu, select "Insert > Image > From File"
> 2. DON'T click the "link" tick-box at the bottom left of the "Insert Image"
> dialogue box.
> The problem here is that (at least in my Version: 126.96.36.199) the embedded
> image is retained even if you click the "link" tick-box.
I'm not 100% sure those two things should be tied together. If you email a file or otherwise send it, you need to ensure the images are embedded. On the other end though, it'd be nice to be able to make an HTML file without embedded images without too much fuss.
(In reply to David N. Welton from comment #33)
> I'm not 100% sure those two things should be tied together.
They absolutely should NOT. Imagine if by changing the type of link, it changed the export format from doc to docx without notifying the user at save time. This is essentially what Leon is suggesting.
These are 2 different types of export formats, standard HTML and non-standard HTML with Data URIs, which is not supported by all browsers. Because Data URIs are non-standard and only useful for email (a very rare use case since all mail clients handle this), if we're going to support them it should be an option at export time. We should never use this format unless the user explicitly requests it, never implicitly tied to the document structure.
Until we have the resources to add an option for Data URIs, we should revert commit 5dd1b3da.
I'm not sure whether David N. Welton (comment 33) and Luke (comment 34) agree or disagree with me.
I argue that users should have the OPTION of saving an HTML file with images that are linked (and external to the html file), as distinct from embedded in the html file.
My current versions of LO Writer (188.8.131.52 for Ubuntu and 184.108.40.206 for Windows) do NOT provide this option, even though they offer a non-functional "Link" tick-box when inserting an image from a file.
I cannot use 220.127.116.11 to edit an html file with an image greater than about 50k, because once Writer has embedded the image and reopened the file, it displays the image as thousands of ASCII characters rather than as an image. Thankfully this problem has been fixed in 18.104.22.168.
We all agree with you. With 4.2, we can no longer properly edit HTML files. 4.2 generates non-standard HTML that many browsers do not support. It also changes the structure of the file without the user explicitly asking for it. Finally, it introduces bugs like the one you have encountered.
The issue you described is related but needs to be tracked separately. Have you filed a bug report? We should never silently corrupt documents on save. We either need to fix the size restriction or warn users that the images are too large to be embedded.
A Writer document saved as HTML loads fine in Mozilla Firefox 42.0, but the images do not appear in Internet Explorer 8.0 on a Windows XP machine.
Locale: de-DE (de_DE)
Should IE 8.0 and Windows XP no longer be supported?
Both Dreamweaver CS6 and IE Edge on Windows 10 cannot read Libreoffice 5.2 files saved as HTML, but have no issue with those saved by LO 4.1.
We should not be embedding images without the user explicitly requesting it.
We should not write non-standard HTML.
Should be reverted until this is resolved.
(In reply to Luke from comment #39)
> Should be reverted until this is resolved.
How would you revert it, please? If you read the patch carefully, you will realize that it is not easy. In the last 2.5 years there were so many changes, that we cannot revert to the old code, but it has to be rewritten.
Ideally we need to introduce a new html filter option. We have SkipImages and SkipHeaderFooter, probably we should have ImagesAsDataUris option which is not set by default, but the mail merge wizard sets it. Later a UI can be written for the option, if users find this useful. It would have been good to write the original patch like this in the first place, but the student who worked on this was not skilled enough and I did not foresee all of the consequences. Nevertheless mail merge is more important in the enterprise than html editing, so after all the current behaviour is better in my opinion, than it was before.
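To make the proposal concrete, here is a rough Python sketch of the behaviour such an option would select. It is not LibreOffice code: `HtmlExportOptions`, `images_as_data_uris`, and `img_tag` are hypothetical names that only mirror the `ImagesAsDataUris` idea above, with linking as the default and embedding turned on only when a caller (for example a mail merge step) asks for it.

```python
import base64
from dataclasses import dataclass


@dataclass
class HtmlExportOptions:
    # Hypothetical stand-in for the proposed ImagesAsDataUris filter option.
    # Off by default, so a plain "save as HTML" keeps linking images.
    images_as_data_uris: bool = False


def img_tag(file_name, data, mime, options):
    """Return an <img> tag that either links file_name or embeds data."""
    if options.images_as_data_uris:
        payload = base64.b64encode(data).decode("ascii")
        src = f"data:{mime};base64,{payload}"
    else:
        src = file_name
    return f'<img src="{src}" alt="">'


if __name__ == "__main__":
    gif = b"GIF89a"  # placeholder bytes, not a complete image
    print(img_tag("pic.gif", gif, "image/gif", HtmlExportOptions()))
    print(img_tag("pic.gif", gif, "image/gif",
                  HtmlExportOptions(images_as_data_uris=True)))
```

With this shape, the mail merge path would pass `images_as_data_uris=True` explicitly, and every other export path would be unaffected.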
Browsers not rendering data URIs sounds strange. Here with 22.214.171.124 Writer uses "image/*" as the mimetype; could it be that using * instead of an explicit format is confusing other programs? Can anyone with IE at hand double-check?
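Anyone who wants to test the "image/*" theory could rewrite the wildcard to an explicit type and retry in the affected browser. The Python sketch below parses a base64 data URI and guesses the real type from the first bytes of the payload. All names (`MAGIC_NUMBERS`, `parse_data_uri`, `sniff_mime`, `explicit_mime`) are made up for illustration, and only PNG, JPEG, and GIF signatures are checked.

```python
import base64

# Well-known file signatures for three common image formats.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def parse_data_uri(uri):
    """Split a base64 data URI into (declared_mime, payload_bytes)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data URI")
    header, separator, payload = uri[len("data:"):].partition(",")
    if not separator:
        raise ValueError("data URI has no comma separator")
    parts = header.split(";")
    if "base64" not in parts[1:]:
        raise ValueError("only base64 data URIs are handled in this sketch")
    # RFC 2397 defaults the media type to text/plain when it is omitted.
    declared = parts[0] or "text/plain"
    return declared, base64.b64decode(payload)


def sniff_mime(data):
    """Return a MIME type guessed from the leading bytes, or None."""
    for magic, mime in MAGIC_NUMBERS:
        if data.startswith(magic):
            return mime
    return None


def explicit_mime(uri):
    """Return the declared type, replacing a wildcard subtype if possible."""
    declared, data = parse_data_uri(uri)
    if declared.endswith("/*"):
        return sniff_mime(data) or declared
    return declared
```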
Migrating Whiteboard tags to Keywords: (bibisected)
I have registered for this forum to agree with Luke. I was using Libre Writer 4.1 and the Save As HTML function worked perfectly with images properly exported as links. I have updated to Libre Writer 5.0.3 and now the Save As HTML function does not export the images. This is a disaster for those of us who use Libre Writer as an HTML web editor. Apparently someone at the Document Foundation is not aware that in Save As HTML, images MUST be separate links. Please please fix this error as soon as possible. In the meantime, I either need to go back to Libre Writer 4.1 or use some other document processing program. The real question is how such a major error could have occurred without someone at the Document Foundation catching it???
(In reply to David Spring from comment #43)
> ... I was using LibreWriter 4.1 and the Save As HTML function worked
> perfectly with images properly exported as links. I have updated to
> Libre Writer 5.0.3 and now the Save As HTML function does not export
> the images.
No that is not correct. They are embedded as data URI into the HTML/XHTML by export filter with the "save as". The data URI is read correctly by all current web browsers and most applications. Unfortunately for the 5.0.x release the data URI were not being correctly read back in to LibreOffice. That has been fixed in master (5.2)--but needs to be back ported to 5.0 and 5.1 branches.
> This is a disaster for those of us who use Libre Writer as an HTML web editor.
And? Feel free to simply revert to LibreOffice 4.1.6, from archive here:
It will be sorted out eventually.
> Apparently someone at the Document Foundation is not aware that in
> Save As HTML, images MUST be separate links. Please please fix this error
> as soon as possible.
According to whom? Use of data URI (RFC 2397/3986) is perfectly acceptable HTML/XHTML format. Saying otherwise is just FUD.
However as noted, there are two valid concerns here. One is that linkage details, as recorded by user preference in the ODF document, are being lost during filter export to HTML/XHTML, and can not be restored on import back into LibreOffice for editing in Writer/Web--loss of data and a legitimate regression.
The other facet is that the use of URLs (links, both relative and absolute; RFC 1738/3986), regardless of preferences set per document, is not being honored during filter export to HTML/XHTML, so images are always embedded as data-URI--again a legitimate regression, but just for not honoring a user preference.
> ... The real question is how such a major error could have occurred without
> someone at the Document Foundation catching it???
Nope! That is not the way the project works. Rather see bug 88038, and follow enhancements for HTML5 and CSS3 suggested in bug 95861
(In reply to David Spring from comment #43)
> The real question is how such a major error
> could have occurred without someone at the Document Foundation catching it???
Just to clarify this. There is no one "at the Document Foundation" checking bugs. This is *entirely* dependent on volunteers. If you would like to be a volunteer and test pre-releases out so that we (collectively, as volunteers) catch bugs earlier please feel free to join the QA mailing list or the chat at: https://kiwiirc.com/client/irc.freenode.net/libreoffice-qa
I can confirm that Save As HTML is handled correctly in the latest version of Open Office. So this is strictly a problem with LibreOffice. In my opinion, this issue should be raised from Major to Critical. It is not a question of whether web browsers can handle embedded images. It is whether web authors can Save As a (properly formatted) HTML document to work with Sigil in the creation of Epub documents. For the time being, until the developers of LibreOffice can realize the magnitude of this error I will have to tell my students to use Open Office instead of LibreOffice. I am deeply concerned that some members of this forum do not realize what a major -and even critical - problem it is to not have a properly working "Save As HTML document" function.
I’ve updated the priority since saving as HTML is a basic feature of any office suite and the triaging documents suggest pairing critical with highest.
I understand your frustration and agree with you that this is a serious issue. Until this bug is fixed, the only way for users to generate proper web pages with their word processor is to use OpenOffice or one of the proprietary options like MS Word. Unfortunately, Joel is correct in that TDF does not employ any programmers and depends on the community of volunteers to fix issues like this. If you work in academia, maybe someone in the CS department would be interested in working on this. Another option is to start a crowd funding campaign at Freedom Sponsors or Bounty Source.
Would you be willing to bring this up at the next ESC? If not, I will try to.
(In reply to David Spring from comment #46)
> I can confirm that Save As HTML is handled correctly in the latest version
> of Open Office. So this is strictly a problem with LibreOffice.
> It is not a
> question of whether web browsers can handle embedded images. It is whether
> web authors can Save As a (properly formatted) HTML document to work with
> Sigil in the creation of Epub documents.
Sorry, but that is clearly *your* use case.
As an alternative for anyone similarly concerned, use an appropriate extension and export to ePub directly. The eLAIX project's extension ( http://elaix.org/features.html ) does quite nicely with complex documents on 5.0, 5.1 and 5.2/master--which I just verified. Also verified that Sigil has few concerns with resulting ePUB, which passes most of the FlightCrew validation tests.
> For the time being, until the
> developers of LibreOffice can realize the magnitude of this error I will
> have to tell my students to use Open Office instead of LibreOffice.
You are free to do so. Or work around as above.
> I am
> deeply concerned that some members of this forum do not realize what a major
> -and even critical - problem it is to not have a properly working "Save As
> HTML document" function.
This is not a "forum", this is Bugzilla issue tracker for LibreOffice QA and Development. Opinions, and senseless assertions "the sky is falling" , have little value here. Please refrain.
Fully support V Stuart Foote here and just for additional considerations:
10k open bugs - many have users screaming and yelling about how their bug deserves additional priority. We have an objective way of prioritizing and it's not *at all* intended to dictate to developers what they must fix.
Finally, regardless if it's critical or major - it won't necessarily change when/if it's fixed as that will *entirely* depend on a volunteer taking the time to fix it. There is no paid staff, no one dictates what gets fixed, there is no system where users can scream louder to get their pet bugs fixed faster (thankfully as that would destroy the project).
As always your options are:
1) Fix it yourself (that's what open source is all about);
2) Pay for a fix;
3) Find a friend, family member, colleague to fix it;
4) Wait patiently;
5) Use a different product.
You seem to have chosen #5, which is entirely fine with everyone :)
Here is the code for an Open Office 4 "Save As HTML" linked image. It is identical to the code used by Libre Writer before the change to embedded images with LibreOffice 4.2:
<p class="western" style="margin-top: 0.04in; margin-bottom: 0.06in"><img align="left" alt="" border="0" height="120" hspace="10" name="Picture 1" src="../Images/ShorelineScribes__6c902d9b.jpg" vspace="14" width="153"/>
Here is the HTML code for the exact same image in LibreOffice 5.0.3 "Save As HTML" image which has been exported not as a separate linked file but left inside of the document as an "embedded" image. Keep in mind that this is the code for a single image. Our books typically have more than 100 images. Does anyone on this forum think this is acceptable HTML code for a single image?
This is why I am forced to recommend Open Office until this HTML coding error in LibreOffice is fixed.
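As a stopgap for files that already contain embedded images, the data URIs can be written back out as separate files and the `src` attributes rewritten to links. The Python sketch below is an assumption-laden illustration, not a LibreOffice feature: `externalize_images` and `DATA_SRC` are made-up names, it only matches double-quoted `src="data:image/...;base64,..."` attributes, and it names the files `image1`, `image2`, and so on without recovering the original file names.

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<subtype>;base64,<payload>" with double quotes.
DATA_SRC = re.compile(
    r'src="data:image/([A-Za-z0-9.+*-]+);base64,([A-Za-z0-9+/=\s]+)"'
)


def externalize_images(html_text, image_dir, stem="image"):
    """Write each embedded image into image_dir and link it instead.

    Returns the rewritten HTML. Links are relative to the parent of
    image_dir, which is assumed to be where the HTML file is saved.
    """
    image_dir = Path(image_dir)
    image_dir.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match):
        nonlocal counter
        counter += 1
        subtype = match.group(1)
        # "svg+xml" becomes "svg"; a wildcard subtype gets a neutral suffix.
        extension = "bin" if subtype == "*" else subtype.split("+")[0]
        target = image_dir / f"{stem}{counter}.{extension}"
        target.write_bytes(base64.b64decode(match.group(2)))
        return f'src="{image_dir.name}/{target.name}"'

    return DATA_SRC.sub(replace, html_text)
```

For example, calling `externalize_images(html, "book_files")` on a page with one embedded GIF would create `book_files/image1.gif` and leave `src="book_files/image1.gif"` in the returned markup.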
(In reply to David Spring from comment #50)
Sigh... you really are missing the point.
No one disputes that there was a wrong turn 54 months ago in implementing an *unconditional* use of data URI (RFC 2397) especially as the Writer import filters were not adjusted to handle that format round trip--making the Writer/Web mode pretty much useless from that point. Import filtering has been restored, but the Writer/Web canvas still needs dev attention to render the markup for the <img src:data... > URIs.
But, in point of fact the XSLT based export of ODF documents to XHTML has received prolonged dev attention and provides much better fidelity to the original ODF document than the Writer/Web and HTML export filter has ever achieved.
You say your goal is generating ePUB, have you even tested the LibreOffice XSLT export to XHTML as source document for Sigil conversion to ePUB?
In that scenario LibreOffice Writer becomes the editing and layout environment--the XSLT conversion generates the fully styled XHTML. Yes the images are embedded base64 there also. But, you would not need normally to tweak the XHTML code directly--but could do so against the XHTML using your markup editor of choice.
LibreOffice's XSLT generated markup is XHTML tagged as <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1 plus MathML 2.0//EN" "http://www.w3.org/Math/DTD/mathml2/xhtml-math11-f.dtd"> DTD, as opposed to the Writer "save-as" filter which retains its mislabeled legacy <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> DTD. Writer/Web has been a poorly maintained module.
This all will get sorted out in some fashion--likely as Andras T. outlined in comment 40. Restoring correct handling of the per document embedding of images in the source ODF--and protecting that linkage and file naming during export to HTML. With this issue, and bug 88038 and bug 95861, as set at appropriate priorities this will be worked out in some fashion.
But should note it is entirely possible that in the end we strip out the Writer/Web module as unsupportable while bringing the HTML "save-as" export/import filter more in line with performance/fidelity of the XSLT conversion available in that export module. Ending the illusion of LibreOffice as an HTML code editor.
The swhtml filter may even be consolidated with XSLT processing which also could be improved upon. But those are implementation decisions for the dev(s) that elects to take on the task.
(In reply to V Stuart Foote from comment #51)
> ... a wrong turn 54 months ago ...
Sorry, it was just 30 months ago in July 2013, not sure what radix I was thinking in.
Thank you for your suggestions. I have tried dozens of options including Export to XHTML and Elaix. I have been building websites and creating Epubs and teaching courses in HTML since the early 1990s. Being able to use HTML to put content on the web is important. In my opinion, LibreWriter produced very clean HTML up until this change was made (especially compared to MS Weird). I would like to recommend LibreWriter to my students - but embedded images are not proper HTML. It leads to all kinds of problems not only with Epubs but with websites and I think it is important that websites and Epubs be interchangeable. Linking images is proper HTML and works well. Libre Writer Save as HTML worked well (not perfect but acceptable) up until someone decided that linking images was not needed. Whoever that person was, they were wrong. Thankfully, Open Office still works so I will just have to recommend that option until LibreOffice Save As HTML is restored.
As for Elaix, I use that tool all of the time and have attempted to use that tool to get around the problems created by Libre Writer. The person who claimed that it works well with Libre Office 5 is also wrong. I have spent several days this past week trying to get Elaix to work with Libre Office 5 and it is very unstable. It crashes with complex documents with more than 100 images. So does Writer 2 Epub. I am currently using Elaix with Open Office 4. It works for some functions but not others. Many functions I simply have to wait and do in Sigil. This really slows down the process of creating complex books. At some point, when I get more time, I will write up a more complete description of the problems of Elaix with Libre Office 5. But I want to make it clear that it is not accurate to say that Elaix works well with LibreOffice 5. It does not.
Finally, I do not mean to be complaining or yelling. I am just surprised at the lack of understanding of the need for and importance of linking and not embedding images in HTML.
Luke, Thank you for raising this problem to critical. HTML is extremely important to many people who use Libre Writer. HTML is the most basic language of the Internet. From the first days of the Internet, images have been a problem and linking images as a separate file has been an acceptable solution. Embedding images is not an acceptable solution and never has been and never will be. I understand that Open Office also has problems. But this is a critical issue for those of us who use HTML to create content for the Internet. As for Libre Writer simply abandoning the function of being an HTML editor, I think that would be an even worse mistake. It would only result in driving millions of people away from LibreOffice and into the hands of Open Office. I think the solution is to take a close look at how Libre Office handled this issue in the past and how Open Office is currently handling this issue in the present and then adopting similar code. I realize that there are all kinds of problems with XHTML and HTML5 and Mail Merge image insertion. But right now, LibreWriter Save As HTML is broken. So is Export As XHTML. I appreciate the efforts of Elaix, Writer 2 Epub and others. But they also are having trouble keeping up with LibreOffice 5. Perhaps the team should consider slowing down on the new features to make sure that former essential features are working again. May the force be with you.
Thanks for all the comments.
It is probably not too hard to add an option and give users the choice.
So people with some coding experience might feel invited to jump on this?
Do realize that there are always more experienced people to lend a hand when needed.
*** Bug 98386 has been marked as a duplicate of this bug. ***
We have some movement on this, yea!
Oliver S. is reworking code for the bug 63211 patch, to restore support for linked images in the HTML as user prefers, while keeping base-64 encoding for mail merge support.
Oliver Specht committed a patch related to this issue.
It has been pushed to "master":
tdf#63211: saving embedded images to HTML optional
It will be available in 5.2.0.
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.
Huge thanks, guys!