Bug 154434 - FILEOPEN HTML: Writer loses HTML layout
Summary: FILEOPEN HTML: Writer loses HTML layout
Status: RESOLVED WONTFIX
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.5.1.2 release
Hardware: All All
Importance: medium enhancement
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:html, needsUXEval
Depends on:
Blocks: Writer-Web-Layout HTML-Import
Reported: 2023-03-28 19:00 UTC by Shmuel (Seymour J.) Metz
Modified: 2024-08-28 16:45 UTC
CC List: 8 users

See Also:
Crash report or crash signature:


Attachments
File with styles for image and text placement (15.02 KB, text/html)
2023-03-28 19:00 UTC, Shmuel (Seymour J.) Metz
Layout in browser (154.20 KB, image/png)
2023-03-28 22:07 UTC, Shmuel (Seymour J.) Metz
Layout in Writer (243.77 KB, image/png)
2023-03-28 22:13 UTC, Shmuel (Seymour J.) Metz
Writer cannot even render the simplest of simple HTML... (7.84 KB, text/html)
2024-08-28 09:53 UTC, robert

Description Shmuel (Seymour J.) Metz 2023-03-28 19:00:50 UTC
Created attachment 186269
File with styles for image and text placement

When I open the attached file, Writer loses the formatting: the images are not properly placed or scaled, and the text is not centered. I haven't checked whether the hyperlinks are intact.
Comment 1 Shmuel (Seymour J.) Metz 2023-03-28 22:07:48 UTC
Created attachment 186271
Layout in browser

This is how the HTML file displays in Firefox. The images are in ../Images, a sibling of the directory containing the HTML file.
Comment 2 Shmuel (Seymour J.) Metz 2023-03-28 22:13:24 UTC
Created attachment 186272
Layout in Writer

This is how the HTML file appears when I open it in LibreOffice.
Comment 3 Dieter 2023-04-11 08:56:48 UTC
Thank you for reporting the bug. One question: is it an HTML document from a website (if so, can you paste the address here), or is it an ODT file saved as HTML (in that case, could you please attach the ODT file)?

=> NEEDINFO
Comment 4 Shmuel (Seymour J.) Metz 2023-04-11 10:21:50 UTC
This is a hand-crafted HTML file that I wanted to convert to LibreOffice Writer. The three attachments are:

1. The HTML file.
2. A screenshot of Firefox viewing the file via the file:// scheme
3. A screenshot of LibreOffice Writer after importing the HTML file
Comment 5 Dieter 2023-04-12 09:34:40 UTC
I confirm that LO opens the file incorrectly, but I'm not familiar enough with HTML to decide whether it is a bug in LO Writer or whether there is something wrong with the HTML code. I hope that somebody else can help.
Comment 6 Buovjaga 2023-04-18 07:08:43 UTC
LibreOffice is not a web browser. LibreOffice's document model can not be mapped to the layout capabilities of browsers.
Comment 7 Dieter 2024-04-14 12:40:43 UTC
(In reply to Buovjaga from comment #6)
> LibreOffice is not a web browser. LibreOffice's document model can not be
> mapped to the layout capabilities of browsers.

I agree, but what does it mean for this report?
Comment 8 Buovjaga 2024-04-14 17:04:03 UTC
(In reply to Dieter from comment #7)
> (In reply to Buovjaga from comment #6)
> > LibreOffice is not a web browser. LibreOffice's document model can not be
> > mapped to the layout capabilities of browsers.
> 
> I agree, but what does it mean for this report?

It means the request in this report can't happen unless we take on the ambition of doing web layout in Writer. I don't know how that would work, but everything is possible.
Comment 9 Armondo Lopez 2024-04-14 20:26:30 UTC
I can confirm that this behavior is still present in

Version: 24.8.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: a2265e8faa099d9652efd12392c2877c2df1d1eb
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

and

Version: 24.2.1.2 (X86_64) / LibreOffice Community
Build ID: db4def46b0453cc22e2d0305797cf981b68ef5ac
CPU threads: 8; OS: Windows 10.0 Build 19045; UI render: default; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded
Comment 10 Robert Großkopf 2024-04-15 07:18:01 UTC
The behavior is confirmed. LO isn't designed for creating web pages or browsing through web pages. So this might be an enhancement request.
Comment 11 Buovjaga 2024-04-15 11:36:03 UTC
(In reply to Robert Großkopf from comment #10)
> The behavior is confirmed. LO isn't designed for creating web pages or
> browsing through web pages. So this might be an enhancement request.

Then we should ask the design team whether they think LibreOffice should include a web browser competitive with the major players, and also adapt our document model completely to its requirements.
Comment 12 Heiko Tietze 2024-04-15 13:21:12 UTC
I think the key is the anchoring of images, which we do 'To Character' by default. At least after opening Writer with the document as a parameter (loading via Ctrl+O hangs indefinitely, and opening from the Start Center runs the document in Writer Web), manually switching the anchor away from 'As Character', which it always is (obviously ignoring Tools > Options > Writer > Formatting Aids > Image anchor), brings the text next to the image.

Ultimately we will not reach a pixel-perfect representation, since browsers with different engines work differently. How about using a table in the HTML source?

(In reply to Buovjaga from comment #11)
> Then we should ask design team, if they think LibreOffice should include a
> web browser competitive with the major players and also to adapt our
> document model completely according to the requirements.
We shouldn't include a complete browser. But since we pride ourselves on filtering data from almost any source, we need to support HTML too, at least its basic features.
Comment 13 V Stuart Foote 2024-04-15 14:18:02 UTC
Review the See Also list for discussion of what would be needed to move beyond LO's filter support for HTML 4.0 Transitional and CSS 1.1 styling and make LO relevant to current W3C/WHATWG web standards.

There are ongoing suggestions to remove the Writer Web module and instead improve the import filters for importing into Writer, Draw, Impress, or Calc as ODF-only documents, with corresponding export filter work to render ODF back to web content.

The utility of LibreOffice as an HTML4 editor continues to degrade. 

So there is nothing actionable here for the OP's HTML (and here we simply don't handle the class= positioning from the embedded CSS on filter import).
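
To illustrate what handling class= styling from embedded CSS involves, here is a minimal Python sketch; it is not LibreOffice's filter code, and parse_class_rules and resolve_classes are hypothetical names. It assumes single-class selectors only, with no comments, combinators, or !important, so every rule has the same specificity and stylesheet source order decides which declaration wins. The sample CSS is made up, not taken from the OP's file.

```python
import re

# Matches simple rules such as ".pic { float: left; width: 30%; }".
# Assumption: single-class selectors only; no comments, at-rules or !important.
RULE_RE = re.compile(r"\.([A-Za-z_][\w-]*)\s*\{([^}]*)\}")


def parse_class_rules(css: str) -> list[tuple[str, dict[str, str]]]:
    """Return (class name, declarations) pairs in stylesheet source order."""
    rules = []
    for name, body in RULE_RE.findall(css):
        decls = {}
        for decl in body.split(";"):
            if ":" in decl:
                prop, value = decl.split(":", 1)
                decls[prop.strip().lower()] = value.strip()
        rules.append((name, decls))
    return rules


def resolve_classes(class_attr: str, rules: list[tuple[str, dict[str, str]]]) -> dict[str, str]:
    """Compute the declarations that apply to an element with this class= value.

    All selectors here have equal specificity, so later rules override earlier
    ones, regardless of the order in which classes appear in class=.
    """
    classes = set(class_attr.split())
    resolved: dict[str, str] = {}
    for name, decls in rules:
        if name in classes:
            resolved.update(decls)
    return resolved


css = ".pic { float: left; width: 30%; } .center { text-align: center; } .pic { width: 40%; }"
print(resolve_classes("center pic", parse_class_rules(css)))
# → {'float': 'left', 'width': '40%', 'text-align': 'center'}
```

Resolving declarations is only the first step: a filter would still have to map properties such as float or position:absolute onto Writer's anchoring and wrap settings, which is where the two document models diverge.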

IMHO => NAB (not a bug), and => WF (WONTFIX) for any effort to address this single issue.

Devs, with UX advice and agreement, should decide what to do with the Writer Web module.
Comment 14 Buovjaga 2024-04-15 16:17:27 UTC
(In reply to Heiko Tietze from comment #12)
> I think key is the anchoring of images that we do 'To Character', by
> default. At least after opening Writer with the document as parameter
> (loading via Ctrl+O stalls infinitely; and opening from the start center
> runs the document in Writer Web) and manually switching the anchor from 'As
> Character' as it always is (obviously ignoring tools > options > writer >
> formatting aids > image anchor) brings text next to the image.
> 
> Ultimately we will not reach pixel-perfect representation, as browsers with
> different engines works differently. How about using a table in the HTML
> sources?

If you look at the source document, it uses position:absolute, floats, display:inline, percentage widths, max-width, margins. All of these are specified in the CSS standard to work and play together in a certain way.
Comment 15 Buovjaga 2024-04-16 08:56:22 UTC
(In reply to Buovjaga from comment #14)
> If you look at the source document, it uses position:absolute, floats,
> display:inline, percentage widths, max-width, margins. All of these are
> specified in the CSS standard to work and play together in a certain way.

Just to clarify as there was a misunderstanding in a chat channel, I think this report should be closed as wontfix due to being unrealistic.
Comment 16 jan d 2024-04-16 10:40:45 UTC
WONTFIX also makes sense to me.

> The utility of LibreOffice as an HTML4 editor continues to degrade. 

yes – 

> There are ongoing suggestions to remove the Writer Web module. 
> And to instead improve the import filters for importing to Writer,
> Draw, Impress, or Calc as ODF only documents.

Makes sense to me and seems to be easier to handle UX-wise, too, since it would be an import/export, just like the other formats.
Comment 17 Heiko Tietze 2024-04-16 13:14:03 UTC
All comments vote for WF. On the one hand, we want to support as many formats as possible and do have support for HTML; on the other, we surely cannot catch up with Internet browsers. We might be able to improve in some areas, but likely not for complex layouts. So the recommendation is to create a simpler document that LibreOffice can load rather than spending a lot of effort.
Comment 18 robert 2024-08-28 09:52:07 UTC
Writer cannot even import the plainest of plain HTML (see the attachment), which is rendered perfectly well even by Word XP, dating back to 2001. What's worse, it actually opens the link, which in this case is harmless but might, in other scenarios, pull in all kinds of nasty stuff.

If you want to compete with this other office suite, you need to at least render such simple HTML correctly!
Comment 19 robert 2024-08-28 09:53:22 UTC
Created attachment 196061
Writer cannot even render the simplest of simple HTML...
Comment 20 Heiko Tietze 2024-08-28 10:16:16 UTC
(In reply to robert from comment #18)
> Writer cannot even import the plainest of plain HTML...
Wouldn't call it simple. But let's check the details.

The first line ends with "<snip...> <b>To</b>                              " followed by "| +-- <...snip>". I see no line break such as <BR>, <P>, or #13. What exactly should the HTML filter accept as a line break (in other words, what is the standard here)?
Comment 21 Robert Großkopf 2024-08-28 10:47:21 UTC
(In reply to Heiko Tietze from comment #20)
> (In reply to robert from comment #18)
> > Writer cannot even import the plainest of plain HTML...
> Wouldn't call it simple. But let's check the details.
> 
> The first line ends with "<snip...> <b>To</b>                              "
> followed by "| +-- <...snip>". I see no line break such as <BR>, <P>, or
> #13. What exactly should the HTML filter accept as line break (or IOW what
> is the standard here)?

It seems the import filter doesn't know anything about the CSS "white-space" property. With the value "pre", all spaces are kept rather than collapsed to one, and line feeds written as &#10; or \n are accepted as line breaks.

The import filter also ignores display:none.
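
As a rough illustration of the white-space behaviour described above, here is a minimal Python sketch contrasting the default value normal with pre. It is a simplification (no line wrapping, no trimming of spaces at line edges, only two values), and apply_white_space is a hypothetical name; it assumes character references such as &#10; have already been decoded to a line feed, as an HTML parser does.

```python
import re

# Simplified model of the CSS white-space property for two values.
# Assumptions: no line wrapping, no removal of spaces at line start/end,
# and character references such as &#10; are already decoded to "\n".
COLLAPSIBLE_RE = re.compile(r"[ \t\r\n]+")


def apply_white_space(text: str, mode: str = "normal") -> list[str]:
    """Return the lines a renderer would produce for text under `mode`."""
    if mode == "pre":
        # Every space is kept and each line feed starts a new line.
        # HTML parsing normalizes CR LF and lone CR to LF first.
        normalized = text.replace("\r\n", "\n").replace("\r", "\n")
        return normalized.split("\n")
    if mode == "normal":
        # Runs of spaces, tabs and line feeds collapse to one space;
        # line feeds do not force a line break.
        return [COLLAPSIBLE_RE.sub(" ", text)]
    raise ValueError(f"unsupported white-space value: {mode!r}")


sample = "To    x\n| +-- y"
print(apply_white_space(sample, "normal"))  # → ['To x | +-- y']
print(apply_white_space(sample, "pre"))     # → ['To    x', '| +-- y']
```

Under normal, text-art like the attachment's "| +--" lines runs together on one line, which matches what is described for the import.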
Comment 22 V Stuart Foote 2024-08-28 16:45:57 UTC
It was confirmed, but this is a clear => WF, both for the OP and for the later comments, where attachment 196061 needs to be edited for line ends prior to import.
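
Editing a file like attachment 196061 for line ends could, for example, mean inserting an explicit <br> at each line feed in the body text. The Python sketch below does that with a simple tag/text split; add_br_at_line_ends is a hypothetical helper, not part of LibreOffice. It assumes an explicit <body> tag, no ">" inside attribute values or comments, and leaves <pre>, <script>, <style>, and <textarea> content alone. It only adds line breaks: runs of spaces are still collapsed under the default white-space: normal.

```python
import re

# Splitting on a capturing group keeps the tags: even indices are text,
# odd indices are tags. Assumptions: no ">" inside attribute values or
# comments, and an explicit <body> tag.
TOKEN_RE = re.compile(r"(<[^>]*>)")
TAG_NAME_RE = re.compile(r"<\s*(/?)\s*([A-Za-z][\w-]*)")
# Elements whose text content should be left untouched.
KEEP_AS_IS = {"pre", "script", "style", "textarea"}


def add_br_at_line_ends(doc: str) -> str:
    """Insert <br> before each line feed in body text (hypothetical helper)."""
    out = []
    in_body = False
    keep_depth = 0  # > 0 while inside one of the KEEP_AS_IS elements
    for i, token in enumerate(TOKEN_RE.split(doc)):
        if i % 2 == 1:
            match = TAG_NAME_RE.match(token)
            if match:  # doctypes and comments do not match and are skipped
                closing = match.group(1) == "/"
                name = match.group(2).lower()
                if name == "body":
                    in_body = not closing
                elif name in KEEP_AS_IS:
                    keep_depth = max(0, keep_depth - 1) if closing else keep_depth + 1
            out.append(token)
        elif in_body and keep_depth == 0:
            # Text between tags: turn each line end into an explicit break.
            text = token.replace("\r\n", "\n")
            out.append(text.replace("\n", "<br>\n"))
        else:
            out.append(token)
    return "".join(out)
```

A formatting newline directly after <body> also gets a <br>, so the result may start with an empty line; a real preprocessing step would need to decide how to treat such whitespace.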

Other than WF here, there are two paths forward. Either enhance the Writer Web module and its filters to parse current CSS and HTML5 and write out viable CSS/HTML5 content.

Or *dump* the Writer Web module and its HTML4 Transitional implementation completely, instead refactoring the filters to import web documents into the core modules, implicitly converting them to native ODF.

Export as/print to HTML/XHTML would then handle publication, just like PDF and EPUB.