Bug 118902 - RTF import: wrong font for Hebrew text
Summary: RTF import: wrong font for Hebrew text
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0 all versions
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: bibisected, filter:rtf, regression
Depends on:
Blocks: RTF-Character Hebrew RTF-New-Import
Reported: 2018-07-23 06:04 UTC by Mike Kaganski
Modified: 2023-01-18 05:44 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
Wrong font for Hebrew text in RTF: minimal reproducer (193 bytes, application/msword)
2018-07-23 06:04 UTC, Mike Kaganski
comparison MSO 2010 and LibreOffice 6.2 (30.04 KB, image/png)
2018-07-23 07:58 UTC, Xisco Faulí

Description Mike Kaganski 2018-07-23 06:04:43 UTC
Created attachment 143703 [details]
Wrong font for Hebrew text in RTF: minimal reproducer

In the attachment, the only font defined in the RTF is Times New Roman. It is applied to all the text in the document, and Word shows all the text (both Russian and Hebrew) using TNR. But when the file is opened in LibreOffice, the Hebrew text is shown using the font defined in Options→LibreOffice Writer→Basic Fonts (CTL). Replacing \f1 in \pard with \loch\f1\hich\af1 fixes the problem, so apparently LibreOffice treats \fN as setting the font only for Western languages rather than for all language variants, even though the "Associated Character Properties" chapter in [1] says:

> Although RTF defines a broad variety of associated character properties,
> any implementation may choose not to implement a particular associated
> character property and share the property between the Latin and Arabic
> fonts

which implies that associated character properties are optional, and in the absence of those, standard properties (like \fN) should be used for all language variants.
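
The fallback this reading implies can be sketched in a few lines of C++. This is only an illustration under assumptions: RunFonts, Script and resolveFont are hypothetical names, not LibreOffice code, and an empty string stands for "the RTF never set this font".

```cpp
#include <cassert>
#include <string>

// Hypothetical model of the fonts one RTF character run has set.
// An empty string means "never set by the document".
struct RunFonts {
    std::string western;     // from a plain \fN
    std::string associated;  // from an associated property such as \afN
};

enum class Script { Western, Complex };

// Pick the font for a script: an explicit associated font wins for complex
// text, otherwise \fN applies to every script, otherwise the app default.
std::string resolveFont(const RunFonts& fonts, Script script,
                        const std::string& appDefault) {
    if (script == Script::Complex && !fonts.associated.empty())
        return fonts.associated;
    if (!fonts.western.empty())
        return fonts.western;
    return appDefault;
}

// The reproducer's situation: only \f1 (Times New Roman) is set.
void demoReproducer() {
    RunFonts run{"Times New Roman", ""};
    assert(resolveFont(run, Script::Complex, "CTL default") == "Times New Roman");
    assert(resolveFont(run, Script::Western, "Western default") == "Times New Roman");
}
```

With this rule the application's Basic Fonts (CTL) setting would only be used when the document sets no font at all, which matches what Word shows for the attachment.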

Tested with Version: 6.1.0.2 (x64)
Build ID: b3972dcf1284967612d5ee04fea9d15bcf0cc106
CPU threads: 4; OS: Windows 10.0; UI render: default; 
Locale: ru-RU (ru_RU); Calc: CL

Said to be OK in OOo and AOO; suspect regression.

[1] Rich Text Format (RTF) Specification Version 1.9.1 https://www.microsoft.com/en-us/download/details.aspx?id=10725
Comment 1 Xisco Faulí 2018-07-23 07:58:44 UTC
Created attachment 143709 [details]
comparison MSO 2010 and LibreOffice 6.2
Comment 2 Xisco Faulí 2018-07-23 08:02:30 UTC
Reproduced in

Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a)

but not in

LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4
Comment 3 Xisco Faulí 2018-07-23 08:02:46 UTC
*** Bug 118896 has been marked as a duplicate of this bug. ***
Comment 4 Xisco Faulí 2018-07-23 08:17:59 UTC
So, in 

LibreOffice 3.3.0 
OOO330m19 (Build:6)
tag libreoffice-3.3.0.4

the font was Times New Roman, then in

LibreOffice 3.5.0 
Build ID: d6cde02

the font was Sans, and later, in this range of commits ( https://cgit.freedesktop.org/libreoffice/core/log/?qt=range&q=e19f1afb2c253944968f85b963934a60b87f472a..3cf91a21fc5089fb7f051bf8a04d2049da88179f ), it changed to Lohit Devanagari
Comment 5 QA Administrators 2019-08-03 03:06:07 UTC Comment hidden (obsolete)
Comment 6 Mike Kaganski 2019-08-05 05:14:11 UTC
Still repro with Version: 6.3.0.3 (x64)
Build ID: c75130c129d9c5e43b76e4f26881b3db8bdb5c91
CPU threads: 12; OS: Windows 10.0; UI render: GL; VCL: win; 
Locale: en-US (ru_RU); UI-Language: en-US
Calc: CL
Comment 7 Eyal Rozenberg 2021-02-12 22:42:20 UTC
Bug still manifests with:

Version: 7.1.0.3 / LibreOffice Community
Build ID: f6099ecf3d29644b5008cc8f48f42f4a40986e4c
CPU threads: 4; OS: Linux 5.9; UI render: default; VCL: gtk3
Locale: he-IL (en_IL); UI: en-US
Comment 8 Aron Budea 2021-03-05 01:04:07 UTC
This is similar to bug 113084, perhaps this one identifies the range where the font actually changed (since in that bug report the text wasn't shown for several versions).

I'm somewhat suspicious about this commit in the range:
https://cgit.freedesktop.org/libreoffice/core/commit/?id=fab0a2c6068577081abdad90a3b1191b6fc5df29
author		Caolán McNamara <caolanm@redhat.com>	2012-08-31 13:17:55 +0100
committer	Caolán McNamara <caolanm@redhat.com>	2012-08-31 13:18:26 +0100

"workaround fdo#35118 in the absence of fdo#19869"

The bug in question was reported by Caolán to fontconfig, and the title is interesting:
"Prioritize fonts that support a territory-less language variant when no exact language match"
https://bugs.freedesktop.org/show_bug.cgi?id=35118
https://gitlab.freedesktop.org/fontconfig/fontconfig/-/issues/30

( The other referenced bug:
"fontconfig should change to BCP 47 language tags"
https://bugs.freedesktop.org/show_bug.cgi?id=19869 )
Comment 9 Caolán McNamara 2021-03-22 17:38:44 UTC
The code in https://cgit.freedesktop.org/libreoffice/core/commit/?id=fab0a2c6068577081abdad90a3b1191b6fc5df29 is used on fontconfig-using platforms only, so not Windows or Mac, and the problem was initially reported on Windows, so changing that code won't have an effect on that platform.

wrt the regression vs AOO/OOo: we replaced the RTF parser with a different one, so it is very plausible that this is the crucial difference and that there isn't a simple commit that, if reverted, makes it all work like in the past

Mike's initial comment is probably the way to go. Something like the following, at
writerfilter/source/dmapper/DomainMapper.cxx:356 in
case NS_ooxml::LN_CT_Fonts_ascii:
adding, under the line

m_pImpl->GetTopContext()->Insert(PROP_CHAR_FONT_NAME, uno::makeAny( sStringValue ));

the line

m_pImpl->GetTopContext()->Insert(PROP_CHAR_FONT_NAME_COMPLEX, uno::makeAny( sStringValue ));

might be the right thing to do, but I don't know the full consequences of making that change. Presumably LN_CT_Fonts_hAnsiTheme has a similar issue, and maybe others too.
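
A self-contained sketch of that mirroring idea, for illustration only. It does not use the real writerfilter classes: PropertyMap, applyFont and applyAssociatedFont are hypothetical stand-ins, and only the two property names are taken from the lines above. As one possible way to limit side effects (an assumption, not something verified against DomainMapper), the mirrored value never overwrites a complex font that was set explicitly.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins for the property IDs named in the comment above.
enum PropertyId { PROP_CHAR_FONT_NAME, PROP_CHAR_FONT_NAME_COMPLEX };

// Hypothetical, simplified property context: property ID -> font name.
using PropertyMap = std::map<PropertyId, std::string>;

// Plain font: set the Western slot, and mirror it into the complex slot
// only if nothing is stored there yet (emplace does not overwrite).
void applyFont(PropertyMap& ctx, const std::string& fontName) {
    ctx[PROP_CHAR_FONT_NAME] = fontName;
    ctx.emplace(PROP_CHAR_FONT_NAME_COMPLEX, fontName);
}

// Explicit associated font: always overrides the complex slot.
void applyAssociatedFont(PropertyMap& ctx, const std::string& fontName) {
    ctx[PROP_CHAR_FONT_NAME_COMPLEX] = fontName;
}

// Only a plain font is set, as in the reproducer: both slots get it.
void demoMirroring() {
    PropertyMap ctx;
    applyFont(ctx, "Times New Roman");
    assert(ctx.at(PROP_CHAR_FONT_NAME) == "Times New Roman");
    assert(ctx.at(PROP_CHAR_FONT_NAME_COMPLEX) == "Times New Roman");
}
```

Whether the real property context should mirror unconditionally, as in the two Insert lines above, or conditionally as in this sketch is exactly the open question about consequences.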
Comment 10 Miklos Vajna 2021-03-23 08:07:46 UTC
If this has been a problem since the new RTF import in 3.5, please add this as a dependency to bug 113083, so that when I or somebody else has time to look at such bugs, we have a good starter list. Thanks.