Bug 139210 - Diacritical mark handling in Writer (may apply to other components)
Summary: Diacritical mark handling in Writer (may apply to other components)
Status: RESOLVED NOTABUG
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 7.0.4.2 release
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Font-Rendering
Reported: 2020-12-24 15:13 UTC by ajlittoz
Modified: 2022-05-01 15:15 UTC
CC List: 2 users

See Also:
Crash report or crash signature:


Attachments
Examples of combined letters (8.84 KB, application/vnd.oasis.opendocument.text)
2020-12-24 15:13 UTC, ajlittoz
LO master/7.2 hybrid PDF of Writer doc showing differences between U+0305 and U+035e, "connect left or right" or default (60.76 KB, application/pdf)
2021-01-05 15:11 UTC, V Stuart Foote

Description ajlittoz 2020-12-24 15:13:35 UTC
Created attachment 168467
Examples of combined letters

Not sure whether this is a bug; clarification from developers is needed.

Base glyphs in Unicode may be modified with "combining characters", commonly called diacritics (accents, lines, …).

I expect most of them to be centred over the letter they modify. Some, depending on the conventions of the language they belong to, may be offset slightly to the right or left.

However, when combining lines are used, the diacritics are offset far to the right, partly covering the next character. Some combining lines have the property "connects on left and right", or connect on only one side. The offset prevents a proper connection with the preceding or following mark.

Since these combined characters do not correspond to "language characters" (they are fancy decorated letters, probably not used in any language in the world), I don't know whether this "decoration" is an abuse of the feature or a real bug.

The bug may also lie in the font renderer.

Attached file contains examples of combined characters. Those with acute accent or macron behave as expected, but those with combining overline don't.
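
One possible reason for the difference (an assumption, not confirmed anywhere in this report) is that "a" plus an acute accent or a macron has a precomposed Unicode equivalent, so a shaper may be able to use a single precomposed glyph from the font, whereas no precomposed character exists for a letter plus U+0305 COMBINING OVERLINE, which must always be positioned as a separate mark. The sketch below checks this with Python's standard `unicodedata` module; `MARKS` and `nfc_length` are illustrative names only.

```python
import unicodedata

# The three kinds of combining mark mentioned in the report.
MARKS = {
    "acute": "\u0301",     # COMBINING ACUTE ACCENT
    "macron": "\u0304",    # COMBINING MACRON
    "overline": "\u0305",  # COMBINING OVERLINE
}

def nfc_length(base: str, mark: str) -> int:
    """Length of base+mark after NFC normalization; 1 means a precomposed character exists."""
    return len(unicodedata.normalize("NFC", base + mark))

for label, mark in MARKS.items():
    # Acute and macron compose with "a" into one code point; the overline stays as two.
    print(f"{label:9} {unicodedata.name(mark):24} NFC length = {nfc_length('a', mark)}")
```

Whether Writer's shaping actually takes the precomposed path depends on the font containing that glyph, so this only narrows down where the difference could come from.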

Thanks for clarifying the issue.

Report filed as a follow-on to https://ask.libreoffice.org/en/question/284076/hide-dotted-circle-when-using-a-combining-character/
Comment 1 V Stuart Foote 2021-01-04 17:08:54 UTC
Very font dependent.

Try a font other than the Times New Roman you've set your Default Paragraph Style to.

Libertinus Serif [1] or Noto Serif [2] for example?

=-ref-=

[1] https://github.com/alerque/libertinus
[2] https://fonts.google.com/specimen/Noto+Serif
Comment 2 ajlittoz 2021-01-05 09:53:52 UTC
There may be some misunderstanding on my side.

When Unicode describes a combining mark as "connects on left and right", does it mean that the mark is a connector between the marked base character and the next one? If so, should the connecting line be drawn centred on the "vertical separation" between the glyphs? This would also mean that, in the presence of successive characters carrying the "left & right connecting" overline, a single continuous line is drawn over the whole sequence.

With this reading, the present rendering is the right one.

This may also imply that a mark which "connects on right" only should leave a visible break (an offset at the left) when several characters marked this way are rendered with the mark over the separation, while still allowing continuous rendering on the right if the next character carries a "connects on right and left" or "connects on left" mark.
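
To make the two encodings discussed here concrete: a per-character overline is encoded by attaching U+0305 after every base letter, and it is then up to the font to make the segments meet, whereas a double diacritic such as U+035E COMBINING DOUBLE MACRON is a single mark encoded after the first of the two base characters it spans. The helper names `overline_each` and `double_macron` below are hypothetical; only the code points and the standard `unicodedata` module are real.

```python
import unicodedata

OVERLINE = "\u0305"       # COMBINING OVERLINE, annotated "connects on left and right"
DOUBLE_MACRON = "\u035E"  # COMBINING DOUBLE MACRON, one mark spanning two bases

def overline_each(text: str) -> str:
    """Attach U+0305 to every base character; a font that honours the
    'connects' annotation should draw an unbroken line over the run."""
    return "".join(ch + OVERLINE for ch in text)

def double_macron(first: str, second: str) -> str:
    """A double diacritic is encoded after the first of the two bases it spans."""
    return first + DOUBLE_MACRON + second

print(overline_each("abc"))
print(double_macron("a", "b"))
```

Under this reading, the "connects" annotation is a hint to font designers about how the mark glyph should be drawn, not an instruction that changes how the characters are encoded.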

Section 7.9, "Combining Marks", of the Unicode Standard (p. 331), specifically its subsection "Underlining and Overlining", is not of much help. Nor is the annex on character properties.
Comment 3 V Stuart Foote 2021-01-05 15:11:31 UTC
Created attachment 168705
LO master/7.2 hybrid PDF of Writer doc showing differences between U+0305 and U+035e, "connect left or right" or default

Attaching a sample document (LO hybrid PDF with the ODT embedded) showing the combining behavior of a "connects on left and right" mark like U+0305 vs the default U+035e.

Study the subtle differences between the <Alt>+X conversions of the U+0305 "connects on left and right" spans, vs the default of U+035e.  
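
The <Alt>+X toggle exposes the code points behind each rendered span. The same inspection can be done outside Writer; the sketch below, with a hypothetical `describe` helper built on Python's `unicodedata`, lists each code point with its name and canonical combining class (0 for a base character, non-zero for a combining mark), which makes it easy to check which mark each sample actually uses.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """One row per code point: U+ notation (as <Alt>+X would show it),
    canonical combining class, and Unicode character name."""
    rows = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        ccc = unicodedata.combining(ch)
        rows.append(f"U+{ord(ch):04X}  ccc={ccc:<3} {name}")
    return rows

# A per-letter overline followed by a double macron spanning y and z.
for row in describe("x\u0305y\u035Ez"):
    print(row)
```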

I did not test bidirectional behavior for RTL scripts, but I assume the marks reverse appropriately.

Also note how the Libertinus metrics handle this Unicode behavior correctly, while Liberation Serif and Noto Sans do not, illustrating how rendering of combining diacritics depends on the font.
Comment 4 V Stuart Foote 2021-01-05 15:19:23 UTC
(In reply to V Stuart Foote from comment #3)
> Created attachment 168705 [details]

Sorry, I should have noted that this is on a Windows build:
Version: 7.2.0.0.alpha0+ (x64)
Build ID: 90668f3473f4e52cec823ad39c6fcb44ba7c089b
CPU threads: 4; OS: Windows 10.0 Build 19042; UI render: Skia/Vulkan; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: threaded

Font handling differs between operating systems and desktop environments on Linux and macOS installs, so results can differ even though all builds use HarfBuzz for shaping.
Comment 5 QA Administrators 2021-07-05 03:48:20 UTC Comment hidden (obsolete)
Comment 6 ajlittoz 2021-07-05 08:21:07 UTC
Sorry for having overlooked the latest comment(s).

It does indeed appear to be highly font-dependent.

Although some marks have the property "connects on left and right", the vertical position of the mark depends on the glyph to which it is applied (height of the ascender or descender). Consequently, a "continuous" overline or underline cannot be guaranteed. This adds to the "jitter" in the horizontal position.
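
A toy model of this vertical jitter, assuming the font raises the overline to a per-glyph "top" anchor (as OpenType mark-to-base positioning allows) rather than to one fixed height. The anchor values in `TOP_ANCHOR_Y` are invented for illustration and do not come from any real font; the functions are hypothetical helpers.

```python
# Hypothetical anchor heights in font units; not taken from any real font.
TOP_ANCHOR_Y = {
    "a": 520,  # x-height letter
    "c": 520,
    "h": 760,  # letter with an ascender
    "l": 760,
}

def overline_heights(word: str) -> list[int]:
    """Height at which a per-letter overline would sit above each base glyph."""
    return [TOP_ANCHOR_Y[ch] for ch in word]

def is_level(word: str) -> bool:
    """Overline segments can only join into one line if they all sit at the same height."""
    return len(set(overline_heights(word))) == 1

print(overline_heights("ach"), is_level("ach"))  # [520, 520, 760] False
print(overline_heights("ac"), is_level("ac"))    # [520, 520] True
```

A font that instead places U+0305 at a single height for every base would keep the segments level, which is one more way the result ends up depending on the font.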

Also, my computer has been updated (OS, DE, LO) since I filed this report, and the behaviour is no longer exactly as described above. LO is now 7.1.4.2 (HarfBuzz version unknown) and the rendering seems more regular.

As I commented in the AskLO question, the original goal looks to me like an abuse of Unicode features. We may be in an undefined area of Unicode, because this "decoration" attempt does not correspond to any typographical usage in any language, except possibly in mathematics, for which dedicated formula editors exist.

A remark about the PDF attachment (comparison chart between several fonts): in my opinion some sequences are wrong, because they start with a combining mark that is not preceded by a non-combining character. This is obvious in the Liberation Serif block, where a dummy glyph (a dotted circle) is added as a placeholder for the "normal" character.
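
A sequence that begins with a combining mark has no base character for the mark to attach to, which is why a placeholder such as the dotted circle (U+25CC) can appear. The Unicode Standard suggests showing a mark in apparent isolation by applying it to U+00A0 NO-BREAK SPACE. The sketch below detects such leading marks and prepends that base; `is_defective` and `with_base` are hypothetical helpers, and whether a given renderer then suppresses the dotted circle is an assumption that depends on the font and shaper.

```python
import unicodedata

NBSP = "\u00A0"  # NO-BREAK SPACE, suggested base for displaying a mark on its own

def is_defective(seq: str) -> bool:
    """True if the sequence starts with a combining mark (general category M*),
    i.e. the mark has no preceding base character."""
    return bool(seq) and unicodedata.category(seq[0]).startswith("M")

def with_base(seq: str, base: str = NBSP) -> str:
    """Give a leading combining mark an explicit base character."""
    return base + seq if is_defective(seq) else seq

print(is_defective("\u0305a"), is_defective("a\u0305"))        # True False
print([f"U+{ord(c):04X}" for c in with_base("\u0305")])         # ['U+00A0', 'U+0305']
```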

I have changed the status to UNCONFIRMED, but I don't object to closing it with an appropriate status.
Comment 7 ajlittoz 2022-05-01 14:46:39 UTC
@Buovjaga

Could you please give the reasons why you consider this NOTABUG?

Also, you changed the status to RESOLVED, which to me implies that some action has been taken, whereas CLOSED would rather apply to dropping the report without any action.

I am not familiar with LO bug handling, so thanks for clarifying a bit.
Comment 8 V Stuart Foote 2022-05-01 15:15:01 UTC
(In reply to ajlittoz from comment #7)
> @Buovjaga
> 
> Could you please give the reasons why you consider this NOTABUG?
> 
>  Also you changed status to RESOLVED which, for me, implies some action has
> been undertaken, while CLOSED would rather qualify for dropping the report
> without any action.
> 
> I am not familiar with LO bug handling, so thanks for clarifying a bit.

Resolved, because it is not a bug in LibreOffice's handling. We cannot control incorrectly stated font metrics, and the positioning of combining diacritics is a font metric that is either correct or incorrect.
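
To illustrate why the position is "a font metric": in OpenType mark-to-base attachment, the shaper moves the mark so that the mark glyph's anchor point coincides with the base glyph's anchor point, so a misplaced anchor in the font shifts the mark no matter how correctly the application applies it. The sketch below models only that alignment step; `Anchor`, `mark_offset`, and all coordinate values are hypothetical, chosen to reproduce the "pushed to the right" symptom from the original report.

```python
from dataclasses import dataclass

@dataclass
class Anchor:
    x: int  # font units
    y: int

def mark_offset(base_anchor: Anchor, mark_anchor: Anchor) -> tuple[int, int]:
    """Mark-to-base attachment: shift the mark so its anchor lands on the base's anchor."""
    return (base_anchor.x - mark_anchor.x, base_anchor.y - mark_anchor.y)

# Hypothetical base glyph 500 units wide, top anchor centred over it.
base_top = Anchor(250, 520)
good_mark = Anchor(250, 0)  # overline glyph drawn over x = 0..500, anchor at its centre
bad_mark = Anchor(50, 0)    # same outline, but the anchor was placed near its left end

print(mark_offset(base_top, good_mark))  # (0, 520): overline centred on the base
print(mark_offset(base_top, bad_mark))   # (200, 520): overline spills onto the next letter
```

The computation is identical in both cases; only the anchor data supplied by the font differs, which is the distinction drawn in this comment.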

Attachment 168705 (a hybrid PDF) demonstrates this when opened in a PDF viewer. Note that you would need the specific Libertinus and Noto fonts to open its embedded ODF in Writer.

Again, any misbehavior lies with the font. So this could be resolved NOT OUR BUG, but it is equally correct to resolve it NOT A BUG: LibreOffice's handling is correct when the font provides correct metrics.