166537 – Asian Phonetic Guide (Ruby) Group alters base text (moves spaces)

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	25.2.2.2 release
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	Ruby

Reported:	2025-05-12 05:56 UTC by Seán Ó Séaghdha
Modified:	2025-12-17 06:10 UTC
CC List:	3 users

See Also:	169791
Crash report or crash signature:

Attachments
Shows examples of text before and after adding Ruby text (71.75 KB, image/png) 2025-05-12 05:56 UTC, Seán Ó Séaghdha
Before adding ruby (120.94 KB, image/jpeg) 2025-09-11 13:20 UTC, Seán Ó Séaghdha
Using Group moves spaces to the end of the string (145.44 KB, image/jpeg) 2025-09-11 13:20 UTC, Seán Ó Séaghdha
Resulting mangled text (133.59 KB, image/jpeg) 2025-09-11 13:21 UTC, Seán Ó Séaghdha

Description Seán Ó Séaghdha 2025-05-12 05:56:18 UTC

Created attachment 200751
Shows examples of text before and after adding Ruby text

The new Group feature in the Asian Phonetic Guide dialog is destructive when the base text contains spaces. When you use the Group button and add phonetic text, all the spaces in the base text are moved to the end of the selected text.
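
The symptom can be modelled in a few lines of Python. This is only an illustration of the observed result, not LibreOffice code: the function name `simulate_group_bug` and the sample string are hypothetical, and nothing here is meant to describe how the dialog is actually implemented.

```python
def simulate_group_bug(base: str) -> str:
    """Model the reported symptom: every space ends up after the other text.

    Hypothetical illustration only; it reproduces the observed output,
    not the real code path in Writer.
    """
    kept = "".join(ch for ch in base if ch != " ")  # text with spaces dropped
    moved = " " * base.count(" ")                   # the same spaces, appended
    return kept + moved


if __name__ == "__main__":
    before = "Dia duit a chara"  # arbitrary sample selection containing spaces
    after = simulate_group_bug(before)
    print(repr(before), "->", repr(after))
```

The length of the text is unchanged, which matches the report: no characters are lost, but the word boundaries are destroyed.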

Comment 1 Seán Ó Séaghdha 2025-05-30 04:09:42 UTC

This is also true for typographic apostrophes (’, U+2019). They don’t appear in the dialog and they are shifted to the end of the text when you use grouping.

Every verb in Irish that starts with a vowel will have a past tense starting in d’ and these are usually transcribed phonetically as one sound, so this bug makes doing phonetic mark-up extremely tedious.

This doesn’t happen with the straight apostrophe (', U+0027), though: it appears in the dialog and isn’t shifted. These symbols are semantically identical and should not be treated differently.
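
For reference, the two characters are distinct code points with different Unicode general categories, which can be checked with Python's standard `unicodedata` module. Whether the dialog's filtering actually depends on these categories is an assumption; the sketch only shows that the two characters are easy for a classifier to tell apart. The helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


if __name__ == "__main__":
    # Straight apostrophe and typographic apostrophe, as discussed above.
    for ch in ("'", "\u2019"):
        print(describe(ch))
```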

Comment 2 Seán Ó Séaghdha 2025-09-11 13:20:15 UTC

Created attachment 202795
Before adding ruby

Comment 3 Seán Ó Séaghdha 2025-09-11 13:20:52 UTC

Created attachment 202796
Using Group moves spaces to the end of the string

Comment 4 Seán Ó Séaghdha 2025-09-11 13:21:19 UTC

Created attachment 202797
Resulting mangled text

Comment 5 Seán Ó Séaghdha 2025-09-11 13:22:13 UTC

This is still a problem in version 25.8.0.4.

Comment 6 Takenori Yasuda 2025-12-02 14:16:46 UTC

Reproduced.
Version: 26.2.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 620(Build:0)
CPU threads: 8; OS: Windows 11 X86_64 (build 26100); UI render: Skia/Raster; VCL: win
Locale: ja-JP (ja_JP); UI: ja-JP
Calc: CL threaded Jumbo

Comment 7 Jonathan Clark 2025-12-09 14:16:13 UTC

There are two aspects to this report:

The first aspect is buggy Group behavior when a selection spans characters that are ignored, like spaces and punctuation. This could be fixed by not grouping non-contiguous base text, at least in the CJ case.

The second aspect is better behavior when annotating non-CJ text.

There were no tests, and the code didn't work properly when I started, so it's made for a bit of guesswork about how the APG was originally meant to behave. Currently, the code allows for large selections, and only presents the word-like, non-punctuation parts for annotation. In order to make APG work better for western text, we would have to change that behavior to include punctuation and spaces for annotation.
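
As a rough sketch of the first aspect, the following Python models "present only the word-like parts" and "do not group non-contiguous base text". It is not the Writer implementation: the names `annotatable_runs` and `can_group` are hypothetical, and using `str.isalnum()` as the test for "word-like" is an assumption made for illustration.

```python
from typing import List, Tuple


def annotatable_runs(selection: str) -> List[Tuple[int, str]]:
    """Return (start_index, text) for each run of word-like characters.

    Assumption: "word-like" is approximated by str.isalnum(), so spaces
    and punctuation separate runs and are not offered for annotation.
    """
    runs = []
    start = None
    for i, ch in enumerate(selection):
        if ch.isalnum():
            if start is None:
                start = i  # a new run begins here
        elif start is not None:
            runs.append((start, selection[start:i]))
            start = None
    if start is not None:  # the selection ended inside a run
        runs.append((start, selection[start:]))
    return runs


def can_group(selection: str) -> bool:
    """Allow Group only when the annotatable text is one contiguous run."""
    return len(annotatable_runs(selection)) == 1


if __name__ == "__main__":
    for sample in ("東京", "Dia duit", "d\u2019\u00f3l"):
        print(repr(sample), annotatable_runs(sample), can_group(sample))
```

Under this rule a selection such as "東京" can still be grouped, while a selection that spans a space or an ignored apostrophe is rejected instead of being rearranged.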

I would be interested to hear opinions from other users about this. When I use the APG, I find it most natural to select a single word or compound word at a time for annotation, so I wouldn't consider this change an issue. However, it might have a large impact on others.

Comment 8 Takenori Yasuda 2025-12-10 07:32:47 UTC

As a Japanese individual, I would like to share my perspective.

The usage pattern described in this report is unexpected for me.

When we apply ruby annotations in Japanese, we always select a single unit such as one character, one word, or a compound word. We never annotate across spaces.
This applies equally when adding readings such as pinyin, zhuyin, or phonetic guides to foreign-language text. For languages that use spaced writing, it is even more natural to select and annotate one word at a time.

For this reason, I find the reported use case genuinely hard to understand.


However, I do believe that apostrophes should be treated as part of the base text for ruby annotation.

In Japan, it is common to add phonetic readings in kana above English words for beginners. Under the current behavior, it becomes impossible to annotate words such as "I'm" with "アイム". The same is true for "don't" and "o'clock", etc.
If contractions cannot be annotated, it becomes a serious inconvenience when preparing language-learning materials.

Therefore, I think that proper handling of apostrophes has clear value.
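
One way to picture the requested behaviour is a word pattern that accepts an apostrophe between letters. The Python sketch below is only an illustration under that assumption: the pattern, the name `base_units`, and the choice to accept both U+0027 and U+2019 are hypothetical, not taken from the LibreOffice source.

```python
import re
from typing import List

# Letters, optionally joined by a straight (U+0027) or typographic (U+2019)
# apostrophe. [^\W\d_] matches any Unicode letter.
WORD_WITH_APOSTROPHE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")


def base_units(text: str) -> List[str]:
    """Split text into annotatable units, keeping contractions whole."""
    return WORD_WITH_APOSTROPHE.findall(text)


if __name__ == "__main__":
    print(base_units("I'm sure you don't leave at five o'clock"))
```

With such a rule, "I'm" stays a single base unit and could receive "アイム" as its reading.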

Comment 9 Seán Ó Séaghdha 2025-12-10 11:22:29 UTC

It’s not unusual for phonetic (IPA) markup to cross word boundaries, since spaces don’t have any direct correlation with speech.

One of my examples also uses Ruby text for translation, which cannot be applied at the word level.

I think the main issue is that if the system suggests that you can do these things, regardless of whether the feature is intended to work that way, it should at least not be destructive of the underlying text.

I get the impression spaces are discarded before the dialog is displayed, but leaving that aside, what problem would retaining spaces in “Grouped” text cause?

Would Japanese or Chinese text selected for Ruby be likely to contain spaces?

Comment 10 Takenori Yasuda 2025-12-11 03:00:36 UTC

(In reply to Seán Ó Séaghdha from comment #9)
> Would Japanese or Chinese text selected for Ruby be likely to contain spaces?
No. Japanese and Chinese normally do not use spaced writing, so ordinary text contains no spaces.


> what problem would retaining spaces in “Grouped” text cause?
For the usage described in the report, I honestly cannot predict the impact. It is simply outside the Japanese ruby workflow, so I cannot infer how such usage would behave.

For the intended usage, layout adjustment may visually widen spacing, but no actual whitespace characters are inserted.

Comment 11 Takenori Yasuda 2025-12-12 10:42:36 UTC

Here is a brief explanation of what "grouping" means in Japanese ruby annotation.

For example, take the Japanese word "東京" (Tokyo).
If we annotate it with ruby, it becomes:
- 東 (とう)
- 京 (きょう)

Tokyo consists of two kanji characters, "東" and "京", each with its own individual reading.
This type of annotation—assigning a reading to each character—is called "per-character ruby" (often called "mono-ruby"). This corresponds to "Mono" in LibreOffice Writer.

Now consider the name "飛鳥" (Asuka).
Its reading is added like this:
- 飛鳥 (あすか)

The reading "あすか" belongs to the word as a whole. Neither "飛" nor "鳥" individually maps to any part of that reading. So you cannot split "飛鳥" into separate ruby annotations.
This is what we call "group ruby", where the ruby applies to the entire word as one block. This corresponds to "Group" in LibreOffice Writer.

Note:
Even when a word could take per-character ruby, it is still possible to apply group ruby instead, depending on the desired layout or reading style.
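
The difference between the two styles can also be shown with HTML ruby markup, generated here by a short Python sketch. HTML is used only because it is compact and widely known; it is not the format Writer uses internally, and the function names `mono_ruby` and `group_ruby` are hypothetical.

```python
from html import escape
from typing import Sequence


def mono_ruby(base: str, readings: Sequence[str]) -> str:
    """Per-character ruby: one reading for each base character."""
    if len(base) != len(readings):
        raise ValueError("mono ruby needs exactly one reading per base character")
    pairs = "".join(
        f"{escape(ch)}<rt>{escape(rt)}</rt>" for ch, rt in zip(base, readings)
    )
    return f"<ruby>{pairs}</ruby>"


def group_ruby(base: str, reading: str) -> str:
    """Group ruby: one reading attached to the whole base text as a block."""
    return f"<ruby>{escape(base)}<rt>{escape(reading)}</rt></ruby>"


if __name__ == "__main__":
    print(mono_ruby("東京", ["とう", "きょう"]))
    print(group_ruby("飛鳥", "あすか"))
```

In the group form the base text is passed through as one unit, so nothing in it is reordered.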

Comment 12 Takenori Yasuda 2025-12-17 06:10:03 UTC

Here is the documentation written by W3C regarding Japanese and Chinese ruby text.

Requirements for Japanese Text Layout (日本語組版処理の要件)
- 3.3 Ruby and Emphasis dots (https://www.w3.org/TR/jlreq/#ruby_and_emphasis_dots)

Requirements for Chinese Text Layout (中文排版需求)
- 5.5 Inline notes & annotations (https://www.w3.org/TR/clreq/#h_inline_notes)