Bug 162131 - Gratuitous font tag wrapping language span when saving as HTML
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: filters and storage
Version (earliest affected): Inherited From OOo
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: (X)HTML-Export RTL CTL Language-Grouping
Reported: 2024-07-21 16:42 UTC by Eyal Rozenberg
Modified: 2024-09-12 05:34 UTC

See Also:
Crash report or crash signature:


Description Eyal Rozenberg 2024-07-21 16:42:51 UTC
Reproduction instructions:

1. Create a new Writer document.
2. Ensure your default direction is LTR and the default paragraph style direction is LTR.
3. (Maybe: have some sort of _IL locale and a Hebrew keyboard layout? Not sure.)
4. Enter the following text: אחת, שתיים
5. On the menus, choose File > Save As and pick HTML as the file type.

You get something like the following (dropping the meta tags for brevity):


<!DOCTYPE html>
<html>
<head>
  <style type="text/css">
    @page { size: 21cm 29.7cm; margin: 2cm }
    p { line-height: 120%; margin-bottom: 0.25cm; background: transparent }
    p.ctl { font-family: "Nachlieli CLM" }
    a:link { color: #000080; text-decoration: underline }
    a:visited { color: #800000; text-decoration: underline }
  </style>
</head>
<body lang="en-IL" link="#000080" vlink="#800000" dir="ltr">
<p class="western" style="line-height: 100%; margin-bottom: 0cm">
One, two</p>
<p class="western" style="line-height: 100%; margin-bottom: 0cm">
<font face="Nachlieli CLM"><span lang="he-IL">אחת שתיים</span></font>,
<font face="Nachlieli CLM"><span lang="he-IL">ושלוש</span></font>.</p>
</body>
</html>

Ignoring the questions of whether we should have these spans at all, and why there are two of them instead of one (see bug 93716), note that a font tag wraps each span tag. This is silly.

Alternative 1
==============

The minimum and simplest thing to do would be:

<span lang="he-IL" style="font-family: Nachlieli CLM">אחת שתיים</span>
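
To make the intended rewrite concrete, here is a rough Python sketch that post-processes the exported markup. It is only an illustration of the before/after shape, not how the export filter works or should be patched; the name fold_font_into_span and the regex are hypothetical, and the pattern assumes the font tag wraps exactly one span, as in the output above.

```python
import re

# Matches the exact shape seen in the export above:
# <font face="..."><span lang="...">text</span></font>
# Assumption: one span per font tag, no nested spans.
FONT_WRAPPED_SPAN = re.compile(
    r'<font face="([^"]+)"><span lang="([^"]+)">(.*?)</span></font>',
    re.DOTALL,
)


def fold_font_into_span(html: str) -> str:
    """Replace each font-wrapped language span with one styled span."""

    def repl(match: re.Match) -> str:
        face, lang, text = match.groups()
        return f'<span lang="{lang}" style="font-family: {face}">{text}</span>'

    return FONT_WRAPPED_SPAN.sub(repl, html)


sample = '<font face="Nachlieli CLM"><span lang="he-IL">אחת שתיים</span></font>,'
print(fold_font_into_span(sample))
```

Text outside the matched pair (the trailing comma here) is left untouched.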


Alternative 2
==============

Something better would be using the styles. A simple use of the styles could be: 

    span.ctl { font-family: "Nachlieli CLM" }

and then:

    <span lang="he-IL" class="ctl">אחת שתיים</span>


Alternative 3
==============

A more complex use of styles:

    span[lang=he-IL] { font-family: "Nachlieli CLM"; }

and then just:

    <span lang="he-IL">אחת שתיים</span>


Alternative 4
==============

But actually, consider this: the HTML document doesn't specify a Western-language-group font at all, and the document has no direct formatting (DF) setting the typeface. So why are we even setting "Nachlieli CLM" if we're not setting, for example, "Liberation Serif" for the LTR text? Why not just _drop the font tag altogether_, as well as the p.ctl style?
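
For comparison, a rough Python sketch of what Alternative 4 would leave behind, again as post-processing of the exported file rather than a change to the filter. The name drop_ctl_font and both regexes are hypothetical. It strips every font tag, which is only safe under the assumption stated above that no direct formatting sets a typeface.

```python
import re

# Any opening or closing font tag. Assumption: none of them carries
# formatting the author applied on purpose.
FONT_TAG = re.compile(r'</?font\b[^>]*>')

# The whole "p.ctl { ... }" line in the style block, including its newline.
P_CTL_RULE = re.compile(r'^[ \t]*p\.ctl\s*\{[^}]*\}[ \t]*\n', re.MULTILINE)


def drop_ctl_font(html: str) -> str:
    """Remove font tags and the p.ctl rule; keep the language spans."""
    html = FONT_TAG.sub('', html)
    return P_CTL_RULE.sub('', html)


exported = (
    '    p { line-height: 120% }\n'
    '    p.ctl { font-family: "Nachlieli CLM" }\n'
    '<font face="Nachlieli CLM"><span lang="he-IL">ושלוש</span></font>.</p>\n'
)
print(drop_ctl_font(exported))
```

The span and its lang attribute survive, so language tagging is unaffected; only the typeface hint goes away.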
Comment 1 Buovjaga 2024-09-12 05:34:48 UTC
Reproduced.

The insertion of the font element and its face attribute when saving as HTML happens here (I confirmed by commenting stuff out and testing with the built code):
https://git.libreoffice.org/core/+/4cb1849ebd38fde513e15c3087f74871dc5e5124/sw/source/filter/html/htmlatr.cxx#2781

Arch Linux 64-bit
Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 3d9b8701cb1751e4139ffa24f72bb836eb877fd1
CPU threads: 8; OS: Linux 6.10; UI render: default; VCL: kf6 (cairo+wayland)
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: CL threaded
Built on 12 September 2024