Bug 114184 - Replace non-single-quote apostrophes with Geresh when appropriate
Summary: Replace non-single-quote apostrophes with Geresh when appropriate
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 3.3.0 release (earliest affected)
Hardware: All All
Importance: low normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: RTL-Hebrew 114637
 
Reported: 2017-11-30 19:03 UTC by Eyal Rozenberg
Modified: 2023-11-24 19:14 UTC (History)

See Also:
Crash report or crash signature:


Attachments
Document facilitating reproduction and identification of the bug (13.45 KB, application/vnd.oasis.opendocument.text)
2018-12-22 09:23 UTC, Eyal Rozenberg

Description Eyal Rozenberg 2017-11-30 19:03:52 UTC
Description:
In Hebrew, the Geresh character (׳) marks abbreviations, whereas English uses the period (.).

Thus, just as 123 Oak Street may be abbreviated in English as 123 Oak St., in Hebrew
רחוב האלון 123 becomes רח׳ האלון 123. The Geresh character is _not_ supposed to be used as a single quotation mark (well, if you do your typesetting right, anyway). There's a separate character for that, which is, or is based on, Mercha.

Both Geresh and Mercha have doubled forms, namely Gershaim and Merchaot. Gershaim is used to denote acronyms, e.g. דרישת שלום is written ד״ש.

In practice, however, Hebrew keyboards lack keys for Geresh and Gershaim (and for Mercha and Merchaot), so people type an apostrophe (') instead of Geresh/Mercha and a double quote (") instead of Gershaim/Merchaot/bottom Merchaot: רח' האלון 123 and ד"ש.
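
To illustrate the acronym case, here is a minimal Python sketch (not LibreOffice code) that swaps a typed double quote for Gershayim (U+05F4). It assumes the mark sits between Hebrew letters with exactly one letter following it, as in ד"ש; the function name and the regular expression are hypothetical and only meant to show the idea.

```python
import re

GERSHAYIM = "\u05F4"  # HEBREW PUNCTUATION GERSHAYIM

# Assumption: an acronym mark is an ASCII double quote that follows a Hebrew
# letter (U+05D0..U+05EA) and precedes exactly one final Hebrew letter.
_ACRONYM_QUOTE = re.compile(
    r'(?<=[\u05D0-\u05EA])"(?=[\u05D0-\u05EA](?![\u05D0-\u05EA]))'
)

def double_quote_to_gershayim(text: str) -> str:
    """Replace acronym-style double quotes with Gershayim; leave others alone."""
    return _ACRONYM_QUOTE.sub(GERSHAYIM, text)

if __name__ == "__main__":
    print(double_quote_to_gershayim('ד"ש'))
```

A double quote that wraps a whole word is left untouched, because the opening quote has no Hebrew letter before it and the closing quote has none after it.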

Now, the auto-correction feature sometimes (rarely? perhaps only when importing Word documents or documents from a previous LO version?) mistakes a single apostrophe at the end of an abbreviated word, before a space, a period, a comma and so on, for a single quotation mark. It then replaces the apostrophe with, well, I'm not sure exactly what; probably a stylized quotation mark from the non-CTL font. It should never do that.

Also, even when it makes no such replacement, it should actually replace the apostrophe with a Geresh character (or do so on condition that the Hebrew font has a glyph for it).
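
The requested behavior can be sketched in Python as follows. This is only an illustration under stated assumptions, not how LibreOffice's AutoCorrect is implemented: the function name, the set of trailing punctuation, and the rule "a Hebrew word that does not begin right after a quote mark" are all hypothetical choices.

```python
import re

GERESH = "\u05F3"  # HEBREW PUNCTUATION GERESH

# Assumptions: a whole run of Hebrew letters (U+05D0..U+05EA) that is not
# directly preceded by a quote mark, followed by an ASCII apostrophe or
# U+2019, followed by end of text, whitespace, or common punctuation.
_ABBREVIATION = re.compile(
    r"(?<![\u05D0-\u05EA'\u2018\u2019])"   # word start, no opening quote
    r"([\u05D0-\u05EA]+)"                  # the abbreviated Hebrew word
    r"['\u2019]"                           # typed or auto-corrected mark
    r"(?=$|[\s.,;:!?)\]])"                 # what may follow an abbreviation
)

def apostrophe_to_geresh(text: str) -> str:
    """Replace an abbreviation-final apostrophe with Geresh."""
    return _ABBREVIATION.sub(lambda m: m.group(1) + GERESH, text)

if __name__ == "__main__":
    print(apostrophe_to_geresh("רח' האלון 123"))
```

A single Hebrew word wrapped in single quotes is left alone, since it begins right after a quote mark. A multi-word quotation would still have its closing mark converted, so a real fix would need to track unmatched opening quotes as well.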

Steps to Reproduce:
1. Set your CTL font to a Hebrew font with a clear difference between a single (opening) quotation mark and a Geresh.
2. Enable single-quote autocorrect.
3. Switch the paragraph direction to right-to-left.
4. Switch the input language to Hebrew.
5. Type in רח' האלון 123.

Actual Results:  
The apostrophe either stays as-is (U+0027) or is replaced with a single opening quotation mark (U+2019).

Expected Results:
The apostrophe is replaced with a Geresh (U+05F3).
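
For anyone checking which character actually ended up in a document, a small Python helper like the one below prints the code point and official Unicode name of a character. The helper name is hypothetical; `unicodedata.name` is part of the Python standard library. Note that Unicode's own name for U+2019 is RIGHT SINGLE QUOTATION MARK.

```python
import unicodedata

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"

if __name__ == "__main__":
    # The three characters involved in this report.
    for ch in ("\u0027", "\u2019", "\u05F3"):
        print(describe(ch))
```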


Reproducible: Always


User Profile Reset: No



Additional Info:
See:
https://en.wikipedia.org/wiki/Gershayim
https://en.wikipedia.org/wiki/Hebrew_punctuation
https://en.wikipedia.org/wiki/Geresh
https://he.wikipedia.org/wiki/%D7%9E%D7%A8%D7%9B%D7%90%D7%95%D7%AA



User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Comment 1 Lior Kaplan 2017-11-30 22:24:23 UTC
While this bug is a real one, I'm setting the priority to low as this doesn't prevent anyone from using LibO. Also, Hebrew has more important usability and language support issues.
Comment 2 QA Administrators 2018-12-22 03:53:00 UTC Comment hidden (obsolete)
Comment 3 Eyal Rozenberg 2018-12-22 09:23:47 UTC
Created attachment 147775
Document facilitating reproduction and identification of the bug

The attached document has some fonts and directions set manually, and a region for you to reproduce the bug by typing רח' האלון 123 while single-quote autocorrect is on. It also has the ideally-expected and actually-expected results below the region intended for typing.
Comment 4 Eyal Rozenberg 2018-12-22 09:24:31 UTC
This issue still manifests with:

Version: 6.2.0.0.beta1
Build ID: d1b41307be3f8c19fe6f1938cf056e7ff1eb1d18
CPU threads: 4; OS: Linux 4.9; UI render: default; VCL: gtk3; 
Locale: en-US (en_IL); UI-Language: en-US
Calc: threaded

and will continue to manifest forever unless something is done about it, because it's a mis-feature rather than a bug.
Comment 5 QA Administrators 2020-12-22 03:43:02 UTC Comment hidden (obsolete)
Comment 6 Eyal Rozenberg 2020-12-22 22:34:43 UTC
Still a perfectly relevant bug with version 7.0.3.1.
Comment 7 QA Administrators 2022-12-23 03:36:37 UTC Comment hidden (obsolete)
Comment 8 Eyal Rozenberg 2022-12-23 09:55:59 UTC
Bug still manifests with:

Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: ad387d5b984c6666906505d25685065f710ed55d
CPU threads: 4; OS: Linux 6.0; UI render: default; VCL: gtk3
Locale: en-IL (en_IL); UI: en-US

(and seeing an opening single quotation mark)
Comment 9 László Németh 2023-11-24 19:14:04 UTC
It is likely possible to fix this using regex-like .* patterns in AutoCorrect replacements; see
http://libreoffice.hu/pattern-matching-in-autocorrect/

and e.g. its usage for Greek to replace the word-ending sigma:

https://bugs.documentfoundation.org/show_bug.cgi?id=116387