Bug 147546 - Transliteration with punctuation marks
Summary: Transliteration with punctuation marks
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.3.0.3 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:7.4.0 target:7.3.3
Keywords: bibisectRequest, regression
Depends on:
Blocks:
 
Reported: 2022-02-20 02:31 UTC by Óvári
Modified: 2022-04-19 10:31 UTC

See Also:
Crash report or crash signature:


Attachments
LibreOffice 7.3.1.1 transliteration (97.96 KB, image/png)
2022-02-20 02:32 UTC, Óvári
Description Óvári 2022-02-20 02:31:03 UTC
Description:
The “Old Hungarian template document with embedded Noto Old Hungarian font”[1] (from LibreOffice 7.0 Release Notes[2]) no longer transliterates punctuation marks (i.e. commas, question marks) as shown in the “Screencast demonstration”[3] with Linux Mint 20.3 Cinnamon and LibreOffice[4].

[1] Old Hungarian template document with embedded Noto Old Hungarian font 
https://wiki.documentfoundation.org/images/8/8f/Sz%C3%A9kely_%C3%ADr%C3%A1s_sablondokumentum_be%C3%A1gyazott_Noto_bet%C5%B1k%C3%A9szlettel.ott

[2] Transliteration to Old Hungarian, LibreOffice 7.0 Release Notes
https://wiki.documentfoundation.org/ReleaseNotes/7.0#Transliteration_to_Old_Hungarian

[3] Screencast demonstration
https://www.youtube.com/watch?v=w8BRhN5MZbs

[4] LibreOffice
Version: 7.3.1.1 / LibreOffice Community
Build ID: 349cd3ad57dce98d6b54b76f8e9f456ac7d7edb7
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Steps to Reproduce:
```
Székely írás 2020-ban? Egyszerű, nagyszerű!
Meggyőző (g-gy), meggyízű (gy-gy).
Vége
```

Are you able to reproduce this? If so, could it please be fixed?

Should the following be added to this issue:
1. Add a unit test
2. Regression label

Thank you

Actual Results:
```
𐲥𐳋𐳓𐳉𐳗 𐳑𐳢𐳁𐳤 𐳺𐳺𐳿𐳼𐳼-𐳂𐳀𐳙? 𐲉𐳎𐳥𐳉𐳢𐳬, 𐳙𐳀𐳎𐳥𐳉𐳢𐳬!
𐲘𐳉𐳍𐳎𐳟𐳯𐳟 (g-gy), 𐳘𐳉𐳎𐳎𐳑𐳯𐳬 (gy-gy).
𐲮𐳋𐳍𐳉 
```

Will add a screenshot of the text above for people who don't have the font installed.

Expected Results:
[3] Screencast demonstration
https://www.youtube.com/watch?v=w8BRhN5MZbs


Reproducible: Always


User Profile Reset: No



Additional Info:
Thank you
Comment 1 Óvári 2022-02-20 02:32:27 UTC
Created attachment 178399 [details]
LibreOffice 7.3.1.1 transliteration

Actual result with LibreOffice 7.3.1.1

```
𐲥𐳋𐳓𐳉𐳗 𐳑𐳢𐳁𐳤 𐳺𐳺𐳿𐳼𐳼-𐳂𐳀𐳙? 𐲉𐳎𐳥𐳉𐳢𐳬, 𐳙𐳀𐳎𐳥𐳉𐳢𐳬!
𐲘𐳉𐳍𐳎𐳟𐳯𐳟 (g-gy), 𐳘𐳉𐳎𐳎𐳑𐳯𐳬 (gy-gy).
𐲮𐳋𐳍𐳉 
```
Comment 2 Óvári 2022-02-20 02:51:35 UTC
Regressions:
1. `?` is not transliterated
2. `,` is not transliterated
3. text in `(…)` is not transliterated

Not sure if these are transliterated or not:
4. `!`
5. `.`
6. `(` and `)`
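
The leftovers listed above can be spotted mechanically. The following is only a minimal sketch, not part of LibreOffice or libnumbertext: the helper names `latin_runs` and `ascii_marks` are made up for illustration, and the two sample strings are the actual results pasted in this report.

```python
import unicodedata

def latin_runs(text):
    """Return the runs of Latin letters that are still present in text."""
    runs, current = [], []
    for ch in text:
        if ch.isalpha() and unicodedata.name(ch, "").startswith("LATIN"):
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))
    return runs

def ascii_marks(text, marks="?,"):
    """Return the ASCII punctuation marks from `marks` found in text."""
    return [ch for ch in text if ch in marks]

# Actual results copied from this report (LibreOffice 7.3.1.1).
line1 = "𐲥𐳋𐳓𐳉𐳗 𐳑𐳢𐳁𐳤 𐳺𐳺𐳿𐳼𐳼-𐳂𐳀𐳙? 𐲉𐳎𐳥𐳉𐳢𐳬, 𐳙𐳀𐳎𐳥𐳉𐳢𐳬!"
line2 = "𐲘𐳉𐳍𐳎𐳟𐳯𐳟 (g-gy), 𐳘𐳉𐳎𐳎𐳑𐳯𐳬 (gy-gy)."

print(ascii_marks(line1))  # ['?', ',']
print(latin_runs(line2))   # ['g', 'gy', 'gy', 'gy']
```

Which marks count as "not transliterated" is an assumption here (only `?` and `,`, matching items 1 and 2); items 4 to 6 are left out because the report itself is unsure about them.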

Thank you
Comment 3 László Németh 2022-03-02 10:03:37 UTC
@Óvári: Thanks for the bug report. Unfortunately, this is a regression caused by an overly strict limitation of the transliteration and by the lack of unit tests. I'm going to fix it soon, together with the appropriate unit tests.
Comment 4 László Németh 2022-03-11 13:41:24 UTC
Regression from commit 98fd4fcdc61202846e0957cb6aaed9e4a2d2c520
"tdf#136368 bump to libnumbertext 1.0.8".

The proposed fix is there in the Numbertext repository:

https://github.com/Numbertext/libnumbertext/commit/ecc9dd482b9cc00f8b576008d8323560e3c70d80
Comment 5 Commit Notification 2022-04-04 01:04:11 UTC
László Németh committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d925d1ca9e03863650dd3e450331598624f21724

tdf#147546 bump libnumbertext to 1.0.10

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 6 Óvári 2022-04-04 06:42:50 UTC
Thank you for the fix for the quotation mark:
1. „…”

Does this also handle the other quotation marks:
2. »…«
3. ’…’

„Quote »inside ’inside of inside’ inside« quote”

https://en.wikipedia.org/wiki/Quotation_mark#Hungarian

Thank you
Comment 7 László Németh 2022-04-04 08:04:37 UTC
(In reply to Óvári from comment #6)
> Thank you for the fix for the quotation mark:
> 1. „…”

Thanks for your bug report!

> 
> Does this also handle the other quotation marks:
> 2. »…«
> 3. ’…’
> 
> „Quote »inside ’inside of inside’ inside« quote”
> 
> https://en.wikipedia.org/wiki/Quotation_mark#Hungarian
> 
> Thank you

2nd level quotation marks are mirrored, so we don't need to transliterate them (otherwise we would get the French-like «text»).
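
One possible reading of "mirrored" here is the Unicode Bidi_Mirrored property, which lets right-to-left text display a character as its mirror image without replacing it. That reading is an assumption, not something stated in this comment. Under it, a short check with Python's standard `unicodedata` module shows how the quotation marks from comment #6 differ (the helper name `is_bidi_mirrored` is made up for illustration):

```python
import unicodedata

# Quotation marks from comment #6, plus parentheses for comparison.
MARKS = ["\u201e", "\u201d", "\u00bb", "\u00ab", "\u2019", "(", ")"]

def is_bidi_mirrored(ch: str) -> bool:
    """True if Unicode assigns the Bidi_Mirrored property to ch."""
    return bool(unicodedata.mirrored(ch))

for ch in MARKS:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: mirrored={is_bidi_mirrored(ch)}")
```

The guillemets » and « and the parentheses carry the property, while „, ” and ’ do not, which would fit with only the first-level marks needing an explicit replacement.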

3rd level quotation is used only in scientific articles, so it is not common. But transliteration of the apostrophe (the same character) seems to be useful. In fact, the apostrophe is more frequent in foreign brand names, e.g. McDonald’s. In old Hungarian literature it marked abbreviated words, e.g. „Pista bá’”, but it seems this is no longer in use:

https://e-nyelv.hu/2015-03-15/pista-ba/

It's probably worth filing a new issue for the apostrophe (which would solve the 3rd level quotation marks, too).

Thanks,
László
Comment 8 Commit Notification 2022-04-04 17:08:03 UTC
László Németh committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/09dfe214a30f58ddcd7a857db8f5eee68d4cef2a

tdf#147546 bump libnumbertext to 1.0.10

It will be available in 7.3.3.

Comment 9 Óvári 2022-04-19 06:27:45 UTC
Tested the “Old Hungarian template document with embedded Noto Old Hungarian font”[1] with characters in parentheses, i.e. (…). Unfortunately, LibreOffice does not transliterate the characters in the parentheses. Tested the following:

```
(g-gy)
(gy-gy)
```

1. Can characters in the parentheses please be transliterated?
2. Does a unit test need to be added for text in parentheses?
3. Should a unit test with the following string be added:
```
Székely írás 2020-ban? Egyszerű, nagyszerű!
Meggyőző (g-gy), meggyízű (gy-gy).
Vége
```
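
A rough idea of what such a test could look like, as a sketch only: `transliterate` is a hypothetical hook that does not exist under this name in LibreOffice or libnumbertext and would have to be wired to the real code, and `survivors` is a made-up helper. The test only checks that no letter of the input survives unchanged; it does not check that the output is correct Old Hungarian.

```python
import unittest

SAMPLE_LINES = [
    "Székely írás 2020-ban? Egyszerű, nagyszerű!",
    "Meggyőző (g-gy), meggyízű (gy-gy).",
    "Vége",
]

def transliterate(text):
    # Hypothetical hook: replace with a call into the code under test.
    raise NotImplementedError

def survivors(source, output):
    """Letters of `source` that appear unchanged in `output`."""
    return sorted({ch for ch in source if ch.isalpha() and ch in output})

class OldHungarianTransliterationTest(unittest.TestCase):
    def test_no_source_letter_survives(self):
        for line in SAMPLE_LINES:
            with self.subTest(line=line):
                try:
                    output = transliterate(line)
                except NotImplementedError:
                    self.skipTest("transliterate() hook is not wired up")
                self.assertEqual(survivors(line, output), [])
```

Saved to a file, this can be run with `python -m unittest`; with the hook left as it is, the test is reported as skipped rather than passed.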

What do you think?

Thank you

Version: 7.3.3.1 / LibreOffice Community
Build ID: 1688991ca59a3ca1c74bc2176b274fba1b034928
CPU threads: 2; OS: Linux 5.4; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

Linux Mint 20.3 Cinnamon

[1] Old Hungarian template document with embedded Noto Old Hungarian font 
https://wiki.documentfoundation.org/images/8/8f/Sz%C3%A9kely_%C3%ADr%C3%A1s_sablondokumentum_be%C3%A1gyazott_Noto_bet%C5%B1k%C3%A9szlettel.ott
Comment 10 Óvári 2022-04-19 10:31:59 UTC
Closing as new issue created: Bug 148672