Bug 160322 - import HTML ignores "hidden" attribute
Summary: import HTML ignores "hidden" attribute
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: filter:html
Depends on:
Blocks: HTML-Import
Reported: 2024-03-22 23:16 UTC by John Gregg
Modified: 2024-03-26 09:20 UTC
CC List: 5 users

See Also:
Crash report or crash signature:


Attachments
test HTML file with hidden text that should not show up. (254 bytes, text/html)
2024-03-22 23:18 UTC, John Gregg

Description John Gregg 2024-03-22 23:16:59 UTC
Description:
When I use LibreOffice Writer to open an HTML file in which some blocks (such as a paragraph, <p>) are marked "hidden" (<p hidden>), the hidden text shows up anyway. It should not be rendered, just as a browser does not show it when displaying the original HTML. If I then export the file as a docx/Word file, the text shows up there as well.

Steps to Reproduce:
1. Open an HTML file with hidden text in Writer (see above).
2. Note that the "hidden" text shows up.
3. Save As docx. Note that the hidden text appears there too.

Actual Results:
As described above, the hidden text shows up. To reproduce, use this as the content of your HTML file:

<!DOCTYPE html>
<html lang="en">
<head>
<title>test of hidden attribute</title>
<body>

<p>
This paragraph should show up plain as day.
</p>
<p hidden>
In contrast, you should not be able to see this paragraph, since it is
"hidden".
</p>
</body>
</html>
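
For comparison, a minimal Python sketch (standard library only, not LibreOffice code) of how a generic HTML parser reports the attribute for the file above. The names `HiddenParagraphFinder` and `find_paragraphs` are made up for this illustration. It only shows that `hidden` arrives as an ordinary boolean attribute on the second <p>, which an import filter could in principle test for.

```python
from html.parser import HTMLParser

SAMPLE = """<!DOCTYPE html>
<html lang="en">
<head>
<title>test of hidden attribute</title>
<body>

<p>
This paragraph should show up plain as day.
</p>
<p hidden>
In contrast, you should not be able to see this paragraph, since it is
"hidden".
</p>
</body>
</html>
"""


class HiddenParagraphFinder(HTMLParser):
    """Collect the text of each <p> and whether it carries `hidden`."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # list of (is_hidden, text) tuples
        self._current = None   # [is_hidden, [text chunks]] while inside a <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A boolean attribute such as `hidden` arrives as ("hidden", None).
            is_hidden = any(name == "hidden" for name, _ in attrs)
            self._current = [is_hidden, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._current is not None:
            is_hidden, chunks = self._current
            # Collapse line breaks and runs of spaces into single spaces.
            text = " ".join("".join(chunks).split())
            self.paragraphs.append((is_hidden, text))
            self._current = None


def find_paragraphs(html_text):
    finder = HiddenParagraphFinder()
    finder.feed(html_text)
    finder.close()
    return finder.paragraphs


if __name__ == "__main__":
    for is_hidden, text in find_paragraphs(SAMPLE):
        print("hidden " if is_hidden else "visible", "|", text)
```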


Expected Results:
The hidden text should be suppressed. Instead, as described above, the hidden text shows up: LibreOffice Writer simply ignores the "hidden" attribute in HTML entirely. This is clearly not expected or desired behavior.


Reproducible: Always


User Profile Reset: No

Additional Info:
Suppress the "hidden" text.
Comment 1 John Gregg 2024-03-22 23:18:23 UTC
Created attachment 193250 [details]
test HTML file with hidden text that should not show up.
Comment 2 V Stuart Foote 2024-03-23 15:49:44 UTC
Why? LibreOffice is an ODF editor. HTML/CSS2 era <p hidden> has no meaning here (i.e. there is no import filter to handle it when HTML is opened into Writer Web).

ODF 1.3 provides for hidden text that is annotated '<style:text-properties text:display="none"/>' [1]

It can be applied as direct formatting (DF) from Character Attributes.

Or defined for use within a Paragraph style from the style's Character Attributes.

DF spans in the save-as to HTML receive the css "style=display: none"; while paragraphs with defined display: none hidden styles are dropped on save-as filter export. 

At this point it is feasible, but it is not clear there is any real value in adding "hidden" to the HTML4 transitional era import filters, given that there is no project interest in advancing the HTML5/CSS3 support (import or export) of bug 95861.

=-ref-=
[1] https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part3-schema/OpenDocument-v1.3-os-part3-schema.html#property-text_display
Comment 3 V Stuart Foote 2024-03-23 16:12:04 UTC
(In reply to V Stuart Foote from comment #2)

> DF spans in the save-as to HTML receive the css "style=display: none"; while
> paragraphs with defined display: none hidden styles are dropped on save-as
> filter export. 
> 

And, not unexpectedly, this does not survive a round trip: when the .html saved from Writer with "display: none" spans is reopened in Writer Web or the Writer module, the hidden text markup is not parsed and the text is shown.
Comment 4 John Gregg 2024-03-23 17:41:02 UTC
The "hidden" attribute is valid HTML5, not a deprecated legacy feature from earlier versions of HTML. I am not concerned with how Writer exports to HTML, only with how it interprets HTML on input. That is, among Writer's many supported input file formats (raw text, Word doc/docx, etc.) it certainly knows how to import and render HTML5. It knows, for example, to italicize text in the <i> tag, or that <p> encloses a paragraph (however that ends up being represented in its native ODT/ODF format). It comes as an unpleasant surprise, then, that the "hidden" attribute is ignored. If there is some way of hiding text in the native ODT/ODF format, then on import from HTML, the "hidden" attribute should be converted to Writer's native way of suppressing text. Failing that, I think it would be better for Writer to throw the "hidden" text away entirely, simply deleting it on import from HTML, rather than its current behavior of displaying it as if there had been no "hidden" attribute at all.
Comment 5 Heiko Tietze 2024-03-25 10:59:13 UTC
If we continue to provide a filter for the HTML file format, we should at least support the most relevant attributes. Perhaps you can describe your use case, John, so we can better understand why you want to use LibreOffice to read and edit HTML files that include the not-so-common hidden attribute.
Comment 6 John Gregg 2024-03-25 17:04:01 UTC
If you are asking about the particulars of my use case, sure.
I have a document that was created as a flat HTML file. I need to share it with someone who wants a docx. I use LibreOffice Writer to import the HTML file, then Save As docx. It all looks fine, except for this one glitch. Given that Writer handles all other aspects of reading and rendering HTML properly, it is bad that it ignores the "hidden" attribute. The original HTML file had "commented out" certain parts, suppressing them with "hidden" (perhaps they were notes for ideas that had not been fully worked out yet). When they show up in the docx file that I share with my collaborator, it is a problem for both of us. As I said, as a first pass, it would be better if Writer just discarded the hidden parts entirely rather than showing them.
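
Until the import filter handles the attribute, the "discard the hidden parts" behavior can be approximated outside Writer by pre-processing the HTML. The following is only a sketch of such a workaround using Python's standard `html.parser`. It is not how LibreOffice works, and `HiddenStripper` and `strip_hidden` are hypothetical names. It assumes well-nested markup in which every non-void element inside a hidden block has an explicit end tag, and it treats any `hidden` attribute, whatever its value, as "hide".

```python
from html.parser import HTMLParser

# Elements that never take an end tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _is_hidden(attrs):
    return any(name == "hidden" for name, _ in attrs)


class HiddenStripper(HTMLParser):
    """Re-emit the input HTML, leaving out elements that carry `hidden`."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []
        self._skip_depth = 0  # > 0 while inside a hidden element

    def handle_starttag(self, tag, attrs):
        if self._skip_depth:
            if tag not in VOID_TAGS:
                self._skip_depth += 1
            return
        if _is_hidden(attrs):
            if tag not in VOID_TAGS:
                self._skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> never changes the depth.
        if self._skip_depth or _is_hidden(attrs):
            return
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._skip_depth:
            if tag not in VOID_TAGS:
                self._skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self._skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self._skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self._skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_hidden(html_text):
    stripper = HiddenStripper()
    stripper.feed(html_text)
    stripper.close()
    return "".join(stripper.out)
```

Running `strip_hidden` over the file's contents and saving the result under a new name gives an HTML file that can then be opened in Writer and saved as docx without the hidden parts.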