Bug 136368 - Unfortunately "t" and "y" incorrectly being interpreted as a "ty" when transliterating into Old Hungarian
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 7.0.0.3 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: László Németh
URL:
Whiteboard: target:7.2.0 target:7.1.1 target:7.4....
Keywords:
Depends on:
Blocks:
 
Reported: 2020-09-01 13:02 UTC by Óvári
Modified: 2022-01-03 14:59 UTC

See Also:
Crash report or crash signature:


Attachments
Image showing characters (24.80 KB, image/png)
2020-09-01 13:07 UTC, Óvári
An example of an Old Hungarian number. (23.14 KB, image/jpeg)
2022-01-03 14:56 UTC, Kovács "kiwi" Viktor

Description Óvári 2020-09-01 13:02:18 UTC
Description:
Unfortunately, "t" and "y" are incorrectly interpreted as the digraph "ty" when transliterating into Old Hungarian.

Steps to Reproduce:
Thank you for the "Transliteration to Old Hungarian" feature which arrived in LibreOffice 7.0
https://wiki.documentfoundation.org/ReleaseNotes/7.0#Transliteration_to_Old_Hungarian

Using the template at https://wiki.documentfoundation.org/images/8/8f/Sz%C3%A9kely_%C3%ADr%C3%A1s_sablondokumentum_be%C3%A1gyazott_Noto_bet%C5%B1k%C3%A9szlettel.ott there are some errors.

We have tried it out with the following words:

1. Q=KV pl. Aquincum 𐲀𐳓𐳮𐳐𐳙𐳄𐳪𐳘
2. X=KSZ pl. taxi 𐳦𐳀𐳓𐳥𐳐
3. Y=I pl. Vörösmarty 𐲮𐳞𐳢𐳞𐳤𐳘𐳀𐳢𐳨
4. W=V, pl. Wesselényi 𐲮𐳉𐳤𐳤𐳉𐳖𐳋𐳚𐳐
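The Q, X and W substitutions in the list above can be sketched as a small pre-processing step. This is only an illustration in Python, not the rule set used by libnumbertext; the function name `normalize_foreign_letters` is hypothetical. The "qu" to "kv" rule is inferred from the Aquincum output above, where the "u" after "q" does not appear. The Y=I rule is deliberately left out, because deciding whether a "y" is a separate letter or part of a digraph is exactly what this report is about.

```python
def normalize_foreign_letters(word: str) -> str:
    """Rewrite q, x and w to their Hungarian phonetic spelling.

    Hypothetical helper for illustration only. The result is lowercased
    to keep the sketch short; "y" is intentionally not touched.
    """
    text = word.lower()
    # "qu" is handled first so that "Aquincum" gives "akvincum",
    # matching the transliteration shown in item 1.
    text = text.replace("qu", "kv")
    for source, target in (("q", "kv"), ("x", "ksz"), ("w", "v")):
        text = text.replace(source, target)
    return text


print(normalize_foreign_letters("Aquincum"))
print(normalize_foreign_letters("taxi"))
print(normalize_foreign_letters("Wesselényi"))
```

The "ny" in Wesselényi stays as it is, so a later step can still treat it as a single Hungarian letter.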

The 3rd output seems incorrect as 𐲮𐳞𐳢𐳞𐳤𐳘𐳀𐳢𐳨 should be 𐲮𐳞𐳢𐳞𐳤𐳘𐳀𐳢𐳦𐳐?

This bug report will not display the Old Hungarian characters on your computer if a font that contains them is not installed, so an image will be attached to this issue.

Thank you once again

Actual Results:
Vörösmarty is being transliterated to 𐲮𐳞𐳢𐳞𐳤𐳘𐳀𐳢𐳨

Expected Results:
Vörösmarty should be transliterated to 𐲮𐳞𐳢𐳞𐳤𐳘𐳀𐳢𐳦𐳐


Reproducible: Always


User Profile Reset: No



Additional Info:
Response from Németh László

This is wrong, indeed. I started to add some exceptions to the Numbertext dictionary, see

https://github.com/Numbertext/libnumbertext/blob/master/data/hu_Hung.sor

but the real solution will be the planned replacement of the pattern-based
hyphenation with a combined dictionary- and pattern-based one.

The recent Hungarian spelling dictionary already has the required data:

$ hunspell -d hu_HU -m
Vörösmarty
Vörösmarty  st:Vörösmarty po:noun_prs ts:NOM hy:Vö-rös|mar-t.y

Here the dot between t.y means that this is not the Hungarian letter "ty", but a "t" and "y" (spelled out as "i").
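As a rough illustration of how the `hy:` field could be used, the Python sketch below extracts it from an analysis line and reports the letter pairs that a dot marks as separate letters. It assumes the field always looks like the single example above; the function name `broken_digraphs` and the digraph set are hypothetical and not part of hunspell or libnumbertext.

```python
import re

# Two-letter Hungarian consonant digraphs (the trigraph "dzs" is ignored here).
DIGRAPHS = {"cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs"}


def broken_digraphs(analysis_line: str) -> list[str]:
    """Return letter pairs that the hy: field marks as NOT being a digraph."""
    match = re.search(r"\bhy:(\S+)", analysis_line)
    if match is None:
        return []  # no hyphenation data on this line
    hyphenation = match.group(1)
    # A dot between two letters means they are spelled as separate letters.
    pairs = re.findall(r"(\w)\.(\w)", hyphenation)
    return [a + b for a, b in pairs if (a + b).lower() in DIGRAPHS]


line = "Vörösmarty  st:Vörösmarty po:noun_prs ts:NOM hy:Vö-rös|mar-t.y"
print(broken_digraphs(line))
```

For the Vörösmarty line this reports the pair "ty", which tells the transliteration to write "t" followed by "i" rather than the single letter "ty".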
Comment 1 Óvári 2020-09-01 13:07:39 UTC
Created attachment 164965
Image showing characters
Comment 2 Kovács "kiwi" Viktor 2020-09-12 07:41:08 UTC
@László Németh
I think this one is for you.
Comment 3 Kovács "kiwi" Viktor 2020-09-12 10:26:41 UTC
@László Németh
@Óvári
Sorry, I didn't notice that it was already assigned. I only looked at the CC list.
Comment 4 Commit Notification 2021-02-01 08:50:37 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/e6165b7cac5d91458d61da3de35486cde3004897

tdf#136368 bump to libnumbertext 1.0.7

It will be available in 7.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 5 Óvári 2021-02-01 09:38:22 UTC
Are there any other "t" + "y" words that need fixing, or should this issue be closed? Thank you
Comment 6 Commit Notification 2021-02-01 11:52:07 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-7-1":

https://git.libreoffice.org/core/commit/130445d231dc0c8af9148edd234f16424d0a16aa

tdf#136368 bump to libnumbertext 1.0.7

It will be available in 7.1.1.
Comment 7 László Németh 2021-02-02 11:52:42 UTC
(In reply to Óvári from comment #5)
> Are there any other "t" + "y" words that need fixing, or should this issue be
> closed? Thank you

Common and less common exceptions are handled correctly now:

ty: Feszty, Haraszty, Huszty, Mindszenty, Noszty, Pesty, Vörösmarty, city, zloty...

ly: Áprily, Dolly, Ély, Hollywood, grizzly, Kéthly, Reguly, Thököly...

ny: Hatvany, Sony, penny


So we can close the issue. Thanks for reporting the problem!
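The exception handling described above can be sketched as follows. This is a simplified Python illustration, not the actual libnumbertext rules: the names `letters` and `EXCEPTIONS` are hypothetical, only a handful of the listed words are included, only a word-final consonant + "y" is split, and the trigraph "dzs" and doubled digraphs are ignored. Exceptions with the pair inside the word, such as Hollywood, are not covered.

```python
DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")

# A few of the exceptions listed above. In these words the final
# consonant + "y" is two letters, and the "y" is pronounced "i".
EXCEPTIONS = {
    "feszty", "mindszenty", "vörösmarty",  # ty
    "reguly", "thököly",                   # ly
    "hatvany", "sony", "penny",            # ny
}


def letters(word: str) -> list[str]:
    """Split a word into Hungarian letters, honouring the exception list."""
    lower = word.lower()
    split_final = lower in EXCEPTIONS
    result = []
    i = 0
    while i < len(lower):
        pair = lower[i:i + 2]
        is_last_pair = i + 2 == len(lower)
        if pair in DIGRAPHS and not (split_final and is_last_pair):
            result.append(pair)  # ordinary digraph, one Hungarian letter
            i += 2
        else:
            # A lone "y" is spelled out as "i".
            result.append("i" if lower[i] == "y" else lower[i])
            i += 1
    return result


print(letters("Vörösmarty"))
print(letters("kutya"))
```

With this approach "Vörösmarty" ends in the two letters "t" and "i", while an ordinary word such as "kutya" keeps "ty" as one letter. A dictionary-based solution would replace the hard-coded set with data from the spelling dictionary.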
Comment 8 Óvári 2021-02-02 20:51:04 UTC
Thanks for fixing.

1. Is "Batthyány" on the list?
https://en.wikipedia.org/wiki/Batthy%C3%A1ny

2. Do you know of an Internet list of these exceptions?

Thank you once again.
Comment 9 László Németh 2021-02-05 08:32:47 UTC
I have reopened the issue. I am going to fix it at the dictionary level to support future extensions and more exceptions.

@Óvári: thanks for the report!
Comment 10 László Németh 2021-02-05 09:05:23 UTC
@Óvári: You are right, “Batthyány” is still missing. I am going to add it soon.

I have already added the following ~100 exceptions to the Hungarian spelling dictionary, which will be the base of further refinement of the transliteration:

Adriany, Áprily, Árokháty, assembly, Balatony, Belatiny, Blattny, Boháty, Boroviczény, Bölöny, Brooklyn, Champs-Élysées, city, Csernátony, Delly, Dicenty, Dolák-Saly, Dolly, Duronelly, ecstasy, Édeskuty, Élysée-palota, Fabiny, Feszty, Finály, Folly, grizzly, Haraszty, Hatvany, Hefty, Huszty, Illy, intercity, Istvány, jolly, Jóny, Kacziány, Kamilly, Kelety, Kereszty, Kerny, Kertbeny, Kéthly, Kétly, Kismarty, Kmety, Kmetty, Kukorelly, Lacsny, Lyme-kór, lymphocyta, Lyon, Mindszenty, Noszty, Novotny, Olty, Patrubány, Peéty, penny, Peregriny, Perity, Pesty, Pewny, Plymouth, Povolny, Purgly, Reguly, Rezsny, Rosty, royalty, Saágy, Saly, Schrotty, Schwotty, Serly, Sexty, Sony, Spergely, Splény, Stáhly, Szentkuty, Szentmártony, Szily, Szombaty, Sztevanovity, Thaly, Thököly, Veszely, Vizkelety, Vízkelety, Vlaszaty, Volny, Vörösmarty, Wény, Wessely, Weszely, Wolny, złoty, Zsivny
Comment 11 Óvári 2021-02-05 22:01:26 UTC
@Németh: Does "Horty" work correctly?
https://hu.wikipedia.org/wiki/Soltszentimre

Thank you once again
Comment 12 Kovács "kiwi" Viktor 2021-02-06 11:24:21 UTC
(In reply to Óvári from comment #11)
> @Németh: Does "Horty" work correctly?
> https://hu.wikipedia.org/wiki/Soltszentimre
> 
> Thank you once again

The Wikipedia editors made a mistake:
Horthy Miklós is the correct form.
Comment 13 Commit Notification 2022-01-03 08:05:01 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/98fd4fcdc61202846e0957cb6aaed9e4a2d2c520

tdf#136368 bump to libnumbertext 1.0.8

It will be available in 7.4.0.
Comment 14 László Németh 2022-01-03 08:11:07 UTC
Fixed in master, and in 7.3 soon with libnumbertext 1.0.8. Changes:

- fix transliteration of old Hungarian family names, bug report by Zoltán Óvári

- fix transliteration of numbers 100–199, 1000–1999, 1000000–1999999 and 1000000000–1999999999 (bad ordering)

- fix conversion of single letters "í", "Í" and "NY";

- fix unnecessary conversion of words ending with "q", e.g. "IQ";

- fix unnecessary conversion of words not ending with unknown letters
Comment 15 Commit Notification 2022-01-03 12:51:34 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/118b6dddad55e00b1ae596db344c6672a1d4d4c3

tdf#136368 bump to libnumbertext 1.0.8

It will be available in 7.3.0.2.
Comment 16 Kovács "kiwi" Viktor 2022-01-03 14:44:07 UTC
(In reply to László Németh from comment #10)
> @Óvári: You are right, “Batthyány” is still missing. I am going to add it
> soon.
> 
> I have already added the following ~100 exceptions to the Hungarian spelling
> dictionary, which will be the base of further refinement of the
> transliteration:
> 
> Adriany, Áprily, Árokháty, assembly, Balatony, Belatiny, Blattny, Boháty,
> Boroviczény, Bölöny, Brooklyn, Champs-Élysées, city, Csernátony, Delly,
> Dicenty, Dolák-Saly, Dolly, Duronelly, ecstasy, Édeskuty, Élysée-palota,
> Fabiny, Feszty, Finály, Folly, grizzly, Haraszty, Hatvany, Hefty, Huszty,
> Illy, intercity, Istvány, jolly, Jóny, Kacziány, Kamilly, Kelety, Kereszty,
> Kerny, Kertbeny, Kéthly, Kétly, Kismarty, Kmety, Kmetty, Kukorelly, Lacsny,
> Lyme-kór, lymphocyta, Lyon, Mindszenty, Noszty, Novotny, Olty, Patrubány,
> Peéty, penny, Peregriny, Perity, Pesty, Pewny, Plymouth, Povolny, Purgly,
> Reguly, Rezsny, Rosty, royalty, Saágy, Saly, Schrotty, Schwotty, Serly,
> Sexty, Sony, Spergely, Splény, Stáhly, Szentkuty, Szentmártony, Szily,
> Szombaty, Sztevanovity, Thaly, Thököly, Veszely, Vizkelety, Vízkelety,
> Vlaszaty, Volny, Vörösmarty, Wény, Wessely, Weszely, Wolny, złoty, Zsivny

@Németh: Several of these names and words still appear incorrectly.
Remark: Sztevanovity is Sz-t-e-v-a-n-o-v-i-ty, so here the final "ty" really is the digraph. It is the name of a musician whose paternal line has Serbian ancestors. Strictly speaking, Jelačić should be written as "Jelacsity"; calling him "Jelasics" is a Hungarianization.
Comment 17 Kovács "kiwi" Viktor 2022-01-03 14:56:57 UTC
Created attachment 177286
An example of an Old Hungarian number.

@László Német: This number can be seen on page 2 of the e-book "Arany János összes költeménye" (All the Poems of János Arany), which can be downloaded from:

https://mek.oszk.hu/05600/05694/05694.pdf
Comment 18 Kovács "kiwi" Viktor 2022-01-03 14:59:55 UTC
@László Németh: Sorry about misspelling your surname in the previous comment.