Bug 107184 - Furigana (ruby) should give option to treat entire selection as one base text
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.3.3.2 release
Hardware: All All
Importance: medium enhancement
Assignee: Jonathan Clark
URL: https://ask.libreoffice.org/t/asiatic...
Whiteboard: target:25.2.0
Keywords:
Duplicates: 154724 157366
Depends on:
Blocks: Ruby CJK-Japanese
Reported: 2017-04-15 15:19 UTC by y3kcjd5
Modified: 2024-10-09 14:29 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
Demonstration document with instructions for reproducing the problem(s) (7.26 MB, application/vnd.oasis.opendocument.text)
2017-05-16 10:52 UTC, y3kcjd5
Rendering in the pxrubrica way (10.97 KB, image/png)
2023-11-27 11:52 UTC, Ben.Engbers@Be-Logical.nl
Screen dump (48.02 KB, image/png)
2024-10-09 14:27 UTC, Ben.Engbers@Be-Logical.nl

Description y3kcjd5 2017-04-15 15:19:53 UTC
Description:
As far as I can tell, the furigana menu (Alt,O,I,Enter) always automatically selects and divvies up the characters for which furigana are produced. This can be controlled to some extent by selecting characters, but causes a lot of problems if the automatic behavior isn't what's desired:
a. AFAIK it's impossible to insert a set of new characters (with furigana) in existing text; even if the cursor is placed between characters (instead of selecting some), the system automatically enters surrounding text into the furigana menu for editing.
b. It is impossible to control how a selection is split up into editing entries. Sometimes Japanese text will contain somewhat unorthodox combinations of, say, 3~4 kanji that the system decides to split up; if this happens, trying to put furigana over all of them together as one set is a massive pain (and involves much repeated pressing of buttons and copious use of Backspace).

Steps to Reproduce:
1. Produce a document with aforementioned uncommon combinations of characters; something like "叩五月雨" or "二連想" are good examples
2. Try to accomplish a. above without deleting desired characters or b. above without having to get rid of tons of extraneous output

Actual Results:  
For a., existing text gets deleted. For b., creating the furigana fails the first few times, producing lots of extraneous kanji instead.

Expected Results:
For a., just don't automatically select surrounding text, and for b., always treat contiguous selections as complete editing entries (Note: it is almost never the case that multiple strings requiring furigana appear immediately adjacent to each other). That way I could actually use Ctrl-select to fix multiple entries at once!
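
The requested behavior can be illustrated with a small sketch. This is not Writer code: the function name `entries_from_selections`, the sample string, and the character offsets are all made up for illustration. It only shows the idea that each contiguous selection maps to exactly one base-text entry, and that an empty selection (a bare cursor) produces no entry at all.

```python
from typing import List, Tuple


def entries_from_selections(text: str, selections: List[Tuple[int, int]]) -> List[str]:
    """Return one base-text entry per contiguous selection, with no further splitting.

    Hypothetical helper: `selections` holds (start, end) character offsets,
    end exclusive, as a stand-in for a Ctrl-select multi-selection.
    """
    entries = []
    for start, end in sorted(selections):
        if start < end:  # an empty selection (bare cursor) yields no entry
            entries.append(text[start:end])
    return entries


# Made-up sample text containing the two combinations from the report.
text = "叩五月雨で二連想"
print(entries_from_selections(text, [(1, 4), (5, 8)]))  # ['五月雨', '二連想']
print(entries_from_selections(text, [(2, 2)]))          # []
```

With this model, surrounding characters such as "で" never enter the dialog unless they are part of a selection.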


Reproducible: Always

User Profile Reset: No

Additional Info:
I'm making a couple other requests/reports regarding the furigana system; hopefully they can get cleaned up together.


User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:51.0) Gecko/20100101 Firefox/51.0
Comment 1 JO3EMC 2017-05-05 12:03:37 UTC
The "Ruby" or "Asian Phonetic Guide" function in LibreOffice does seem to behave a little strangely.
It may have gotten worse than in previous versions.
I also see some problems in this area.

But this report is too complex for me to interpret.

First.
This report includes multiple problems, doesn't it?
In your words, "a." and "b.".
It may be better to split them into separate reports, I think.
For easier handling.

Second.
"Steps to Reproduce" seems to lack explanation.
Right now, I can't reproduce your problem.
The steps should be simple and subdivided.

If you'd like, we can communicate in Japanese by e-mail etc.
You can also join the mailing list of the Japanese community.
<discuss@ja.libreoffice.org>
Comment 2 y3kcjd5 2017-05-16 10:52:43 UTC
Created attachment 133357 [details]
Demonstration document with instructions for reproducing the problem(s)

This report concerns only one problem: the furigana menu automatically selects and divides text into the base text fields; it shouldn't. However, it manifests as different symptoms in different situations (A and B).

I've created and attached a demo document with step-by-step instructions and screenshots to better describe the symptoms.

I'm not familiar with mailing lists; is there some way to sign up without actually sending an email to it?
Comment 3 JO3EMC 2017-06-04 15:40:56 UTC
Sorry for the late reply.

I saw your demo document.
OK. Generally I understood. I would like to confirm this report.
Actually, I am also suffering from these problems.

However, as I said before:
This report seems to include multiple phenomena that are strongly related, but whose causes are not necessarily identical, I think.
I think that the contents of the demo document can be broken down into at least 4 events.

(1) Undesirable division of Base text, when the function of "Asian Phonetic Guide" is called with some characters selected in advance.

(2) When we modify the division of the base text in the "Asian Phonetic Guide" dialog box, the original string grows illegally.
    (It seems to be reproduced when there are multiple lines of Base text.)

(3) When we call the function of "Asian Phonetic Guide" with no character selection, the next character after cursor position is set to Base text, unexpectedly.

(4) When we move the Base text of one line to another line in "Asian Phonetic Guide" dialog box, that character disappears in the body.
    (Not reproduced in my environment. Blank lines are locked, and I can't input any characters.)

As you say, (1) and (3) may be the same problem, and likewise (2) and (4).
But I do not know for certain.


I don't know of a way to sign up for the mailing lists without sending an e-mail.
Is there any problem with sending an e-mail?
Since it is a "mailing list", I think it is natural that registration is handled by e-mail.
The way to register is explained on the following page.
https://wiki.documentfoundation.org/JA/WhatsMailingList
There are several lists; bug fixes, for example, are handled in "discuss".

Or for the time being, you can send me a mail directly.
The address is linked to the handle name.

Version: 5.3.3.2
Build ID: 3d9a8b4b4e538a85e0782bd6c2d430bafe583448
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; Layout Engine: new; 
Locale: ja-JP (ja_JP); Calc: group
Comment 4 QA Administrators 2019-11-16 03:41:31 UTC Comment hidden (obsolete)
Comment 5 y3kcjd5 2020-03-10 02:46:53 UTC
Reproduced in version 6.4.1.2 (x64), Build ID: 4d224e95b98b138af42a64d84056446d09082932.

The demo document is slightly outdated; in example B, the "で" can no longer be moved out of the way as shown and must instead be replaced, because all entry fields unoccupied when the furigana window is opened are no longer editable. However, the behavior being shown is still present, and the end result (incorrect deletion of "で" due to its erroneous presence in the furigana window) is the same.
Comment 6 QA Administrators 2022-08-29 06:42:42 UTC Comment hidden (obsolete)
Comment 7 Ben.Engbers@Be-Logical.nl 2023-11-27 11:51:51 UTC
After I closed my bug (https://bugs.documentfoundation.org/show_bug.cgi?id=157366) I encountered the same problem in LaTeX.
In the TeX Live package pxrubrica this was solved in a nice way. Essentially, placing a `|` determines how the ruby blocks are placed above the kanji.
The word 自動販売機 is split into 2 kanji blocks in the Asian Phonetic Guide.
The first block contains the kanji 自動 with hiragana じどう. The second block contains the kanji 販売機 with hiragana はんばいき.
In the example, this corresponds to the top view....
In pxrubrica, by placing a | you can change the mode of rendering. 
\ruby[j]{自動販売機}{じ|どう|はん|ばい|き} results in the bottom rendering.

Would it be an option to implement something similar in the Phonetic Guide as well? Using a | would then split the ruby block.
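
To make the suggestion concrete, here is a minimal sketch of how a `|`-delimited reading could be paired with the base text character by character. It is neither pxrubrica's implementation nor anything in Writer; `pair_ruby` and its `delimiter` parameter are hypothetical names, and the rule "one reading part per base character" is an assumption made for this example.

```python
from typing import List, Tuple


def pair_ruby(base: str, reading: str, delimiter: str = "|") -> List[Tuple[str, str]]:
    """Pair base text with a reading, splitting the reading on `delimiter`.

    Hypothetical helper. Without a delimiter, the whole base text gets the
    whole reading as one block. With delimiters, each base character gets one
    part of the reading (assumes one code point per base character).
    """
    parts = reading.split(delimiter)
    if len(parts) == 1:
        return [(base, reading)]
    if len(parts) != len(base):
        raise ValueError("number of reading parts must match number of base characters")
    return list(zip(base, parts))


print(pair_ruby("自動販売機", "じ|どう|はん|ばい|き"))
# [('自', 'じ'), ('動', 'どう'), ('販', 'はん'), ('売', 'ばい'), ('機', 'き')]
print(pair_ruby("自動", "じどう"))
# [('自動', 'じどう')]
```

A delimiter-based scheme like this would also need an escape rule for the case where `|` is part of the reading itself, which the sketch does not handle.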
Comment 8 Ben.Engbers@Be-Logical.nl 2023-11-27 11:52:37 UTC
Created attachment 191060 [details]
Rendering in the pxrubrica way
Comment 9 Stéphane Guillou (stragu) 2024-05-09 01:40:35 UTC
*** Bug 157366 has been marked as a duplicate of this bug. ***
Comment 10 Stéphane Guillou (stragu) 2024-05-09 01:48:32 UTC
*** Bug 154724 has been marked as a duplicate of this bug. ***
Comment 11 Seán Ó Séaghdha 2024-05-09 02:58:40 UTC
While the info on LaTeX is interesting, it’s not really useful for a GUI program like LO.  What happens when the delimiter is part of the text?

The simplest solution would be for LO to do NO second-guessing about what is a word or what isn’t a word.  You can’t know all possible user contexts.  The selection should be accepted as a word or sequence that you want to add one set of ruby text to.

As it is, the dialogue is very useful if you’re adding phonetics to continuous text in a language that uses spaces (or maybe where the syllable is predictable), but this feature is most used for Japanese where it’s used more for the occasional unusual or lesser-known pronunciation, not for long stretches of text.

The educational use for Chinese might still do that though.

Maybe it actually needs two modes?
Comment 12 Jonathan Clark 2024-09-09 17:26:46 UTC
From reading the above, and following the instructions in the example document, I believe most of the identified issues were due to bug 141466. Editing base text is much less painful now than it used to be. Given that, I think it would be helpful to update this bug to reflect the current state of Writer.


Currently, Writer automatically splits Japanese text into multiple base text runs using a dictionary. This can be helpful in some cases, but can require excessive base text editing in situations where the user generally wants to treat entire selections as a single run of base text. Writer should handle this use case better.

Note that this is closely related to the mono case (bug 156543, bug 113189). It's possible that both cases can be addressed simultaneously.
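
As a rough illustration of the two behaviors described above, the following sketch contrasts a dictionary-driven split with treating the selection as a single run. It is a toy model, not Writer's code: the greedy longest-match strategy, the two-word dictionary, and the `group` flag are assumptions chosen for the example, and Writer's real dictionary and segmentation logic are not reproduced here.

```python
from typing import List, Set


def split_base_text(selection: str, dictionary: Set[str], group: bool = False) -> List[str]:
    """Split a selection into base-text runs.

    Hypothetical helper. With group=True the whole selection is one run.
    Otherwise a greedy longest-match against `dictionary` is used, falling
    back to single characters when no dictionary word matches.
    """
    if group:
        return [selection] if selection else []
    max_len = max((len(word) for word in dictionary), default=1)
    runs = []
    i = 0
    while i < len(selection):
        # Try the longest candidate first; a single character always matches.
        for length in range(min(max_len, len(selection) - i), 0, -1):
            candidate = selection[i:i + length]
            if length == 1 or candidate in dictionary:
                runs.append(candidate)
                i += length
                break
    return runs


toy_dictionary = {"自動", "販売機"}  # made-up dictionary for the example
print(split_base_text("自動販売機", toy_dictionary))              # ['自動', '販売機']
print(split_base_text("自動販売機", toy_dictionary, group=True))  # ['自動販売機']
```

The point of the `group` flag is that the user, not the dictionary, decides where a run ends.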
Comment 13 Seán Ó Séaghdha 2024-09-10 06:05:17 UTC
A mode (mono) switch would probably go a long way towards a solution.

The problem is how the non-mono mode would work.  What’s really needed is a dialog that allows the user to select a whole sentence and arrange it as necessary, grouping or ungrouping characters/words as they see fit, in a clear and straightforward way.

Asian language users certainly *can* do everything now, but going in and out of the dialog many times is very tedious and slow.

The way I use it is undoubtedly a minority use: adding phonetics to non-Asian languages.  This can be extremely painful at times due to LO’s assumptions about words.  For example it’s common practice to hyphenate Gaulish or Ogham words to highlight their components, but even if you use a non-breaking hyphen, LO will assume that all the parts are separate words, e.g. CUNA‑MAGLI → CUNA | ‑ | MAGLI.  Attempting to edit the text in the dialog corrupts the document text, so there are only non-obvious solutions (select a single word & edit it later, select nothing and type the word in the dialog).  See bug 154724 for a worse case with a Proto-Indo-European word like *h₁ek̑u̯os.

It would all be a lot more understandable if LO didn’t second guess what you were trying to do.  Even its second guessing is wrong though, as a non-breaking hyphen should not break a word.
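
The hyphen example can be reproduced with a small regex-based tokenizer. This is only a sketch of the principle, not of LO's word-boundary logic: `split_words` and its `keep_nb_hyphen` flag are hypothetical, and the patterns are a simplification that treats runs of word characters as words and any other non-space character as its own token.

```python
import re
from typing import List

NON_BREAKING_HYPHEN = "\u2011"  # U+2011 NON-BREAKING HYPHEN


def split_words(text: str, keep_nb_hyphen: bool = True) -> List[str]:
    """Split text into word tokens.

    Hypothetical helper. With keep_nb_hyphen=True, U+2011 counts as part of
    a word; with False, it is split off as a separate token.
    """
    if keep_nb_hyphen:
        pattern = rf"[\w{NON_BREAKING_HYPHEN}]+|[^\w\s]"
    else:
        pattern = r"\w+|[^\w\s]"
    return re.findall(pattern, text)


word = "CUNA" + NON_BREAKING_HYPHEN + "MAGLI"
print(split_words(word, keep_nb_hyphen=False))  # three tokens: CUNA, the hyphen, MAGLI
print(split_words(word, keep_nb_hyphen=True))   # one token: the whole word
```

The first call mirrors the split reported above; the second shows the behavior one would expect if a non-breaking hyphen did not break a word.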
Comment 14 Jonathan Clark 2024-09-10 12:34:59 UTC
(In reply to Seán Ó Séaghdha from comment #13)
> For example it’s common practice to hyphenate Gaulish or Ogham words to
> highlight their components, but even if you use a non-breaking hyphen,
> LO will assume that all the parts are separate words, e.g.
> CUNA‑MAGLI → CUNA | ‑ | MAGLI. Attempting to edit the text in the dialog
> corrupts the document text, so there are only non-obvious solutions

Fortunately, document text corruption shouldn't happen anymore now that bug 141466 is fixed. It will be safe to edit base text in versions from 24.8.1 onward.

If most of the remaining inconvenience can be resolved by adding a mode switch/button to treat the entire selection as a single run, I'm inclined to start with that change. We can revisit base text grouping in the future, if needed.
Comment 15 Commit Notification 2024-09-12 03:05:00 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/3d9b8701cb1751e4139ffa24f72bb836eb877fd1

tdf#107184 sw: Added base text group feature to Asian Phonetic Guide

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Ben.Engbers@Be-Logical.nl 2024-09-16 12:32:27 UTC
At first glance, this looks like a good improvement. Once I have more experience with it, I will report back if necessary.

Ben
Comment 17 Ben.Engbers@Be-Logical.nl 2024-10-09 14:27:40 UTC
Created attachment 196987 [details]
Screen dump
Comment 18 Ben.Engbers@Be-Logical.nl 2024-10-09 14:29:05 UTC
I have used the improved version for some time now. In furigana, the placement of the hiragana characters compared to the Kanji has improved considerably. However, I still see some areas for improvement.
1) A sequence of kanji is usually divided well into separate pieces that also take into account the difference between onyomi and kunyomi. However, a word like '卯の花' is not divided into 3 pieces (as it should be).
2) Splitting a sequence of kanji goes reasonably well until one of the yakumono punctuation marks is found. After such a punctuation mark, the sequence is not split further.

Ben