Bug 113134 - Hebrew Dagesh/Mapiq mis-rendered with Culmus fonts in special chars dialog and when inserted
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 5.4.2.2 release (earliest affected)
Hardware: All All
Importance: medium minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering Special-Character RTL-Hebrew Kerning
Reported: 2017-10-15 09:44 UTC by Eyal Rozenberg
Modified: 2024-02-01 14:13 UTC (History)
CC List: 11 users

See Also:
Crash report or crash signature:


Attachments
GNOME charmap utility displaying the Dagesh character of David CLM (65.08 KB, image/png)
2017-10-15 10:46 UTC, Eyal Rozenberg
Mapiq/Dagesh inserted from different sources (8.68 KB, application/vnd.oasis.opendocument.text)
2017-10-15 11:48 UTC, Eyal Rozenberg
Mapiq from different sources - screenshot (11.92 KB, image/png)
2017-10-15 11:48 UTC, Eyal Rozenberg
dagesh character found in master (69.88 KB, image/png)
2017-10-15 14:41 UTC, Yousuf Philips (jay) (retired)
double-mapiq after inserting a Dagesh/Mapiq and setting its font right (11.63 KB, image/png)
2017-10-15 15:11 UTC, Eyal Rozenberg
screencast of font change with mouse or keyboard selection (2.98 MB, video/webm)
2017-10-15 20:46 UTC, Yousuf Philips (jay) (retired)

Description Eyal Rozenberg 2017-10-15 09:44:22 UTC
The Hebrew language employs diacritics (Niqqud) - points and lines added over, under or within a letter to indicate aspects of its pronunciation. One of these symbols is the Dagesh - it converts a V into B, F into P and otherwise places an emphasis on a consonant.

Now, these diacritic symbols, including the Dagesh, have Unicode character codes (and are considered "modifier" characters if I'm not mistaken). Dagesh is at U+0x5BC (which is the same character for Mapiq, which is about the same thing as a Dagesh).

Now, if I choose to insert a special character in LibreOffice Writer, and choose the Hebrew font I'm using for my text (say, David CLM from the Culmus project), I see what is supposedly the entire set of Hebrew characters provided by this font. Yet, while some diacritic symbols can be chosen - such as Patah, Qamats and others - I don't see the Dagesh character. David CLM _does_ have a Dagesh, as I observe using an external character map utility.

It seems like the Special Character dialog is somehow filtering out more of the font's Hebrew characters than it should.
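For reference, the properties of the code point discussed above can be checked with Python's standard `unicodedata` module. This is only a small sketch for readers who want to verify what the character is; the helper name `describe` is made up for illustration, and the code point is written in the standard form U+05BC (the report writes it as U+0x5BC).

```python
import unicodedata

# The Dagesh/Mapiq code point the report refers to as U+0x5BC.
DAGESH = "\u05bc"


def describe(ch):
    """Return basic Unicode properties of a single character (illustrative helper)."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # General category; "Mn" means nonspacing (combining) mark.
        "category": unicodedata.category(ch),
        # Canonical combining class; non-zero for combining marks.
        "combining_class": unicodedata.combining(ch),
    }


info = describe(DAGESH)
print(info)
```

A non-zero combining class and the `Mn` category indicate a mark that is expected to attach to the preceding base letter rather than occupy its own position.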
Comment 1 Lior Kaplan 2017-10-15 09:52:22 UTC
Please upload a test document, so others could verify the report.
Comment 2 Eyal Rozenberg 2017-10-15 10:10:35 UTC Comment hidden (obsolete)
Comment 3 Dieter 2017-10-15 10:36:02 UTC Comment hidden (obsolete)
Comment 4 Eyal Rozenberg 2017-10-15 10:46:24 UTC Comment hidden (obsolete)
Comment 5 Eyal Rozenberg 2017-10-15 11:47:19 UTC
Using the master build which Dieter linked to, I noticed the following:

* The character U+0x5BC _is_ one of the characters presented in the dialog, but
* The Dagesh/Mapiq (U+0x5BC) character is _not_ shown as placed inside the base character, but rather outside and to the left of it (i.e. after it).
* If I choose to insert a Dagesh/Mapiq, the character at which it is inserted is not affected, and indeed we see the Dagesh/Mapiq on the outside and to the left.

That led me to check myself again with 5.4. Well, I _can_ find U+0x5BC - I didn't notice it before because it didn't look like a Dagesh, but rather like a Shuruq. Checking that, it seems 0x5BC is indeed used for Shuruq when composed with a vav (ו) character (too bad!). But that's not how it behaves with other letters.

But it gets more complicated, since it's still the case - with both LO 5.4 and LO 6.0 - that if I paste a U+0x5BC from an outside charmap I get a Dagesh/Mapiq composed as I expect, but if I insert it using the Special Characters dialog, it always behaves like a Shuruq.

Attachment to illustrate this forthcoming.
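As a side note on the vav case mentioned above: at the code point level, a vav carrying this mark is just vav followed by U+05BC, so nothing in the text itself distinguishes a Shuruq from a vav with Dagesh. A minimal sketch using the standard `unicodedata` module, which decomposes the precomposed presentation form U+FB35 (HEBREW LETTER VAV WITH DAGESH) to show the underlying sequence; the helper name `decompose` is illustrative only.

```python
import unicodedata

VAV = "\u05d5"
DAGESH = "\u05bc"
VAV_WITH_DAGESH = "\ufb35"  # precomposed presentation form of vav + dagesh


def decompose(text):
    """Return the code points of the canonical decomposition (NFD) of text."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", text)]


decomposed = decompose(VAV_WITH_DAGESH)
print(decomposed)
```

How the pair is drawn (dot inside the vav) is then up to the font and the shaping engine, which is where this bug appears to live.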
Comment 6 Eyal Rozenberg 2017-10-15 11:48:09 UTC
Created attachment 136983 [details]
Mapiq/Dagesh inserted from different sources
Comment 7 Eyal Rozenberg 2017-10-15 11:48:53 UTC
Created attachment 136984 [details]
Mapiq from different sources - screenshot
Comment 8 Lior Kaplan 2017-10-15 13:04:02 UTC
I've added U+0x5BC and it's indeed placed a bit after the letter rather than inside it. But it also gets created with a different font setting. Marking it and setting it to David CLM makes it fall into the right place.

Eyal - the two test documents you've added have two chars after the ה. Removing the 2nd and changing the font as mentioned above makes it look OK.
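One way to check whether a mark really appears twice after a letter is to look at the raw code points instead of the rendering. The sketch below is an illustration only: `repeated_marks` is a hypothetical helper, and the sample string (He followed by two U+05BC marks) is an assumption about what the test document contains, not something read from the attachment.

```python
import unicodedata


def repeated_marks(text):
    """Return (index, codepoint) for every combining mark that repeats the character before it."""
    hits = []
    for i in range(1, len(text)):
        ch = text[i]
        # A combining mark has a non-zero canonical combining class.
        if ch == text[i - 1] and unicodedata.combining(ch) != 0:
            hits.append((i, f"U+{ord(ch):04X}"))
    return hits


# Assumed content: He (U+05D4) followed by two Dagesh/Mapiq marks.
sample = "\u05d4\u05bc\u05bc"
print(repeated_marks(sample))
```

If the list is non-empty, the doubling is in the text itself and not only in how it is drawn.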
Comment 9 Lior Kaplan 2017-10-15 13:52:02 UTC
(In reply to Lior Kaplan from comment #8)
> I've added U+0x5BC and it's indeed placed a bit after the letter rather than
> inside it. But it also gets created with a different font setting. Marking it
> and setting it to David CLM makes it fall into the right place.

Both in 5.4.1 and in master build:

Version: 6.0.0.0.alpha0+
Build ID: 9685532bc859167c1aa856c6f6792559904b8fb9
CPU threads: 8; OS: Linux 4.13; UI render: default; VCL: gtk3; 
Locale: en-US (en_US.UTF-8); Calc: group

Eyal, please update the bug description.
Comment 10 Yousuf Philips (jay) (retired) 2017-10-15 14:41:39 UTC Comment hidden (obsolete)
Comment 11 Eyal Rozenberg 2017-10-15 15:06:39 UTC
(In reply to Yousuf Philips (jay) from comment #10)
> I found it in 5.4 and master, so this bug likely needs to be closed.

But the character rendered in your attachment (which corresponds to mine) is _not_ a Dagesh/Mapiq - which goes in the middle of the underlying character (or about the middle) - not to the side of it.

I am changing the bug name accordingly. It's not that it's missing - it's mis-rendered.
Comment 12 Eyal Rozenberg 2017-10-15 15:11:32 UTC
Created attachment 136993 [details]
double-mapiq after inserting a Dagesh/Mapiq and setting its font right

If I do the following:

1. Open the "mapiq from different sources" document
2. Select the Heh with the Mapiq (הּּ)
3. Set the font to David CLM

The Mapiq gets doubled. The same thing happens if I do:

1. Create new document
2. Set the font to David CLM
3. Write the letter He (ה)
4. On the menus, choose, Insert | Special Character
5. Select U+0x5BC (at this point you have the Mapiq rendered to the side of the He)
6. Select the Heh with the Mapiq (הּּ)
7. Set the font to David CLM

you again get the double-Mapiq - in the middle and to the side.
Comment 13 Eyal Rozenberg 2017-10-15 15:12:49 UTC
And I see the double-Mapiq with both 5.4.2.2 release and 6.0 master (from Yousuf's link).
Comment 14 Yousuf Philips (jay) (retired) 2017-10-15 20:25:30 UTC
(In reply to Eyal Rozenberg from comment #12)
> Created attachment 136993 [details]
> double-mapiq after inserting a Dagesh/Mapiq and setting its font right
> 
> If I do the following:
> 
> 1. Open the "mapiq from different sources" document
> 2. Select the Heh with the Mapiq (הּּ)
> 3. Set the font to David CLM
> 
> The Mapiq gets doubled.

It worked fine for me with these steps:

1. open attachment 136983 [details]
2. select after the Heh with the Mapiq
3. press delete
4. select the Heh with the Mapiq
5. set the font to David CLM

I opened the odt in word and it showed a square character next to the Mapiq, so i'm assuming two Mapiqs were pressed next to the Heh, which is why one appeared in the middle and one at the end in your steps.

> The same thing happens if I do:
> 
> 1. Create new document
> 2. Set the font to David CLM
> 3. Write the letter He (ה)
> 4. On the menus, choose, Insert | Special Character
> 5. Select U+0x5BC (at this point you have the Mapiq rendered to the side of
> the He)
> 6. Select the Heh with the Mapiq (הּּ)
> 7. Set the font to David CLM
> 
> you again get the double-Mapiq - in the middle and to the side.

I dont get double-Mapiq. Screencast coming.
Comment 15 Yousuf Philips (jay) (retired) 2017-10-15 20:46:24 UTC
Created attachment 137001 [details]
screencast of font change with mouse or keyboard selection

So inserting the Mapiq from the special character dialog does insert it in the wrong location, and selecting the Heh and the Mapiq results in the font name toolbar field going blank, which normally means that multiple fonts are in the selection, so this is quite strange.

But what i noticed when testing this bug and which can be seen in the screencast is that when you select the Heh by keyboard, it will also select the Mapiq, which will result in the character appearing correctly when changing the font, but if you select the Heh by mouse, it will not select the Mapiq, resulting in no change of the Mapiq when changing the font. This should be filed as a separate bug.
Comment 16 Eyal Rozenberg 2017-10-15 20:55:34 UTC
(In reply to Yousuf Philips (jay) from comment #15)
>  This should be filed as a separate bug.

Go for it :-)

Additionally, I'm starting to think maybe this bug and bug 113135 are more related than one would initially assume... perhaps even being dupes of each other.
Comment 17 Yousuf Philips (jay) (retired) 2017-10-18 18:58:12 UTC
Khaled, Maxim, Caolan: Any thoughts on why inserting this Hebrew diacritic isn't getting combined correctly with the Hebrew character before it, until both characters are selected and the font name is reapplied to both of them?
Comment 18 ⁨خالد حسني⁩ 2017-10-19 01:05:29 UTC
(In reply to Yousuf Philips (jay) from comment #17)
> Khaled, Maxim, Caolan: Any thoughts on why inserting this Hebrew diacritic
> isn't getting combined correctly with the Hebrew character before it, until
> both characters are selected and the font name is reapplied to both of them?

Looks like the special characters dialog is inserting the character with a different font. Checking the actual ODT XML should show if this is the case. That being said, I know nothing about how this dialog works.
Comment 19 Heiko Tietze 2017-10-19 05:53:07 UTC
Akshay may know.
Comment 20 ⁨خالد حسني⁩ 2017-10-19 11:47:52 UTC
Looking into the XML source, the mark character is inserted as <span> with a text style that points to the same font (though I don’t understand ODT’s XML that much), and I think we don’t join spans when shaping (long outstanding bug, might be the root for bug 61444).
Comment 21 ⁨خالد حسني⁩ 2017-10-19 11:51:41 UTC
This is likely to be breaking other things as well, like kerning and ligatures.
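To illustrate why shaping each span separately would hurt here, the following toy model treats a paragraph as a list of (font, text) runs. It is a sketch under loud assumptions: the run representation and both function names are hypothetical and do not reflect how LibreOffice actually segments or shapes text. It only shows that a run beginning with a combining mark has no base letter to attach to, and that merging adjacent runs with the same font removes that situation.

```python
import unicodedata


def orphaned_marks(runs):
    """Return indexes of runs whose text starts with a combining mark (no base letter in the run)."""
    return [
        i
        for i, (_font, text) in enumerate(runs)
        if text and unicodedata.combining(text[0]) != 0
    ]


def merge_same_font(runs):
    """Join adjacent runs that use the same font, so marks stay with their base letters."""
    merged = []
    for font, text in runs:
        if merged and merged[-1][0] == font:
            merged[-1] = (font, merged[-1][1] + text)
        else:
            merged.append((font, text))
    return merged


# Hypothetical paragraph: He in one run, the inserted Dagesh/Mapiq in its own run.
runs = [("David CLM", "\u05d4"), ("David CLM", "\u05bc")]
merged = merge_same_font(runs)
print(orphaned_marks(runs), orphaned_marks(merged))
```

The same reasoning would apply to kerning pairs and ligatures that straddle a span boundary.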
Comment 22 Shai Berger 2017-10-19 14:53:13 UTC
Eyal -- On Linux, you don't need to use the special character dialog for Hebrew diacritics (Niqqud); the default Hebrew key mapping has U+0x5BC on <right-alt>+S (that's ד for דגש) and the rest of them on similarly reasonable places. That is, of course, a workaround, there still appears to be a problem with the special-char dialog.
Comment 23 Yousuf Philips (jay) (retired) 2017-10-19 19:06:53 UTC
(In reply to Khaled Hosny from comment #20)
> Looking into the XML source, the mark character is inserted as <span> with a
> text style that points to the same font (though I don’t understand ODT’s XML
> that much), and I think we don’t join spans when shaping (long outstanding
> bug, might be the root for bug 61444).

Yes i checked the XML as well which looks like this

 <style:style style:name="P1" style:family="paragraph" ... >
   <style:paragraph-properties ... />
   <style:text-properties style:font-name="David CLM" />
 </style:style>
 <style:style style:name="T2" style:family="text">
   <style:text-properties ... style:font-name-complex="David CLM1" />
 </style:style>
 ...
 <text:p text:style-name="P1">
  ה
  <text:span text:style-name="T2">ּ</text:span>
 </text:p>

So the problem seems to be that the special character dialog is not taking into account that 'David CLM' is already set in P1's <style:text-properties> as style:font-name and as P1 and T2 dont have 'David CLM' in style:font-name-complex, they arent able to mix. Here is what the XML looks like when they are correctly joined.

 <style:style style:name="P1" style:family="paragraph" ...>
   <style:paragraph-properties ... />
   <style:text-properties style:font-name="David CLM" style:font-name-complex="David CLM" />
 </style:style>
 <style:style style:name="T3" style:family="text">
   <style:text-properties officeooo:rsid="0004f39f" />
 </style:style>
 ...
 <text:p text:style-name="P1">
  ה
  <text:span text:style-name="T3">ּ</text:span>
 </text:p>
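The kind of check described above can be scripted. The following is a minimal sketch, not a LibreOffice tool: it parses an ODF `content.xml` string with the standard `xml.etree.ElementTree` module and reports, for each `text:span`, the `style:font-name-complex` value of the style it points to. The function name `span_complex_fonts` and the inline sample document are made up for illustration; the sample mirrors the first excerpt above with the elided attributes left out.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
}


def q(prefix, local):
    """Build a namespace-qualified name in ElementTree's {uri}local form."""
    return "{%s}%s" % (NS[prefix], local)


def span_complex_fonts(content_xml):
    """Return (span style name, font-name-complex or None) for every text:span."""
    root = ET.fromstring(content_xml)
    complex_font = {}
    for style in root.iter(q("style", "style")):
        props = style.find("style:text-properties", NS)
        value = None if props is None else props.get(q("style", "font-name-complex"))
        complex_font[style.get(q("style", "name"))] = value
    result = []
    for span in root.iter(q("text", "span")):
        name = span.get(q("text", "style-name"))
        result.append((name, complex_font.get(name)))
    return result


# Illustrative sample modelled on the excerpt above (not the real attachment).
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph">
      <style:text-properties style:font-name="David CLM"/>
    </style:style>
    <style:style style:name="T2" style:family="text">
      <style:text-properties style:font-name-complex="David CLM1"/>
    </style:style>
  </office:automatic-styles>
  <office:body><office:text>
    <text:p text:style-name="P1">\u05d4<text:span text:style-name="T2">\u05bc</text:span></text:p>
  </office:text></office:body>
</office:document-content>"""

spans = span_complex_fonts(SAMPLE)
print(spans)
```

For a real document, the XML could be read with `zipfile.ZipFile(path).read("content.xml")` from the standard library, since an .odt file is a ZIP archive.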
Comment 24 Eyal Rozenberg 2017-10-19 20:02:46 UTC
(In reply to Yousuf Philips (jay) from comment #23)
>    <style:text-properties ... style:font-name-complex="David CLM1" />

Is there really a "David CLM1" attribute there or is it just a typo?
Comment 25 Yousuf Philips (jay) (retired) 2017-10-20 12:43:10 UTC
(In reply to Eyal Rozenberg from comment #24)
> Is there really a "David CLM1" attribute there or is it just a typo?

Not a typo, as it's the style:name value and not the font name. Below is pulled from attachment 136983 [details].

<style:font-face style:name="David CLM1" svg:font-family="'David CLM'" />
<style:font-face style:name="David CLM" svg:font-family="'David CLM'" style:font-pitch="variable" />
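To make the point about `style:name` versus the actual font family concrete, here is a small sketch that maps each `style:font-face` declaration to its `svg:font-family`. It uses only the standard `xml.etree.ElementTree` module; the function name `font_families` and the wrapper element with namespace declarations are added for illustration, while the two `style:font-face` lines are the ones quoted above.

```python
import xml.etree.ElementTree as ET

# Standard OpenDocument namespace URIs.
OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
SVG = "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"


def font_families(xml_text):
    """Map each style:font-face name to its svg:font-family, with quotes stripped."""
    root = ET.fromstring(xml_text)
    result = {}
    for face in root.iter("{%s}font-face" % STYLE):
        family = face.get("{%s}font-family" % SVG, "")
        result[face.get("{%s}name" % STYLE)] = family.strip("'\"")
    return result


# The wrapper element is an assumption added so the fragment parses on its own.
SAMPLE = f"""<office:font-face-decls xmlns:office="{OFFICE}"
    xmlns:style="{STYLE}" xmlns:svg="{SVG}">
  <style:font-face style:name="David CLM1" svg:font-family="'David CLM'"/>
  <style:font-face style:name="David CLM" svg:font-family="'David CLM'" style:font-pitch="variable"/>
</office:font-face-decls>"""

families = font_families(SAMPLE)
print(families)
```

Both declaration names resolve to the same family, which is consistent with the explanation that "David CLM1" is a declaration name and not a different font.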
Comment 26 QA Administrators 2018-10-21 02:50:36 UTC Comment hidden (obsolete)
Comment 27 Eyal Rozenberg 2018-10-21 13:52:28 UTC
Still seeing this with build:

Version: 6.2.0.0.alpha0+
Build ID: ad6adb1bfadf49af3187a0bb3ceffbf355e9eed1
CPU threads: 4; OS: Linux 4.9; UI render: default; VCL: gtk2; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF, Branch:master, Time: 2018-09-29_02:45:20
Locale: en-US (en_IL); Calc: threaded
Comment 28 QA Administrators 2019-10-22 02:31:28 UTC Comment hidden (obsolete)
Comment 29 Eyal Rozenberg 2019-10-24 20:54:39 UTC
Bug still manifests with:

Version: 6.3.2.2
Build ID: 1:6.3.2-1
CPU threads: 4; OS: Linux 5.2; UI render: default; VCL: gtk3; 
Locale: he-IL (en_IL); UI-Language: en-US

And please don't let us wait another year on this bug with no action :-(
Comment 30 Eyal Rozenberg 2021-02-12 22:21:50 UTC
(In reply to Eyal Rozenberg from comment #29)
> Please don't let us wait another year on this bug with no action :-(

Bug still manifests with:

Version: 7.1.0.3
Build ID: f6099ecf3d29644b5008cc8f48f42f4a40986e4c
CPU threads: 4; OS: Linux 5.9; UI render: default; VCL: gtk3
Locale: he-IL (en_IL); UI: en-US
Comment 31 Heiko Tietze 2021-04-28 06:23:08 UTC Comment hidden (off-topic)
Comment 32 Eyal Rozenberg 2021-04-28 11:36:02 UTC
(In reply to Heiko Tietze from comment #31)
Here's how it stands with the latest nightly:

* In the Insert Special Character dialog, with font David CLM, the Dagesh/Mapiq (U+0x5BC) character _is_ shown as placed inside the base character (dotted circle and a point inside of it). OK (buggy behavior gone)
* If I choose to insert a Dagesh/Mapiq, the character at which it is inserted is still not affected, and indeed we see the Dagesh/Mapiq on the outside and to the left. BUG
* If I insert a Dagesh/Mapiq from the clipboard, by pasting, it is inserted on the inside of the character. OK (but there is still a discrepancy from insertion using the Special Character dialog)
* If I insert a Dagesh/Mapiq using the keyboard, pressing RightAlt+ד in Hebrew layout, it is inserted on the inside of the character. OK (but there is still a discrepancy from insertion using the Special Character dialog)
* If I select the text including the Dagesh/Mapiq I inserted, and set the font to David CLM or anything else, the Dagesh/Mapiq is _not_ doubled, and in fact takes its appropriate position inside the character. OK (buggy behavior gone)

So, part of the issue is now gone, but something still tells LO that the inserted Dagesh/Mapiq is somehow not an integral part of the run of text with its preceding letter.
Comment 33 Heiko Tietze 2021-04-28 13:17:37 UTC Comment hidden (off-topic)
Comment 34 Eyal Rozenberg 2021-04-28 13:31:21 UTC Comment hidden (off-topic)
Comment 35 Heiko Tietze 2021-04-28 15:35:30 UTC Comment hidden (off-topic)