Bug 152024 - Diacritics are cut off in top and bottom of paragraph
Summary: Diacritics are cut off in top and bottom of paragraph
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.1.4.2 release (earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Jonathan Clark
URL:
Whiteboard: target:24.8.0 inReleaseNotes:24.8
Keywords: bibisected, regression
Duplicates: 99578 105346 106777 114386 119608 126169 135425 141901
Depends on:
Blocks: Font-Rendering Diacritics CTL
Reported: 2022-11-13 18:33 UTC by Hossein
Modified: 2024-08-12 20:22 UTC

See Also:
Crash report or crash signature:


Attachments
Diacritics cut off (20.73 KB, application/vnd.oasis.opendocument.text)
2022-11-13 18:33 UTC, Hossein
Diacritics is not cut off (PDF output) (7.49 KB, application/pdf)
2022-11-13 18:34 UTC, Hossein
Diacritics cut off (screenshot from the display) (19.12 KB, image/png)
2022-11-13 18:37 UTC, Hossein
Diacritics are cut off also on the top (18.69 KB, application/vnd.oasis.opendocument.text)
2024-06-08 16:27 UTC, Hossein
Document showing diacritics on the top and bottom (12.08 KB, application/vnd.oasis.opendocument.text)
2024-06-08 16:53 UTC, Hossein
Extra ascent stress test (19.24 KB, application/vnd.oasis.opendocument.text)
2024-06-10 13:06 UTC, Jonathan Clark
Diacritic doc overflow in 24.2.4.2 (31.77 KB, image/png)
2024-06-11 13:31 UTC, Jonathan Clark

Description Hossein 2022-11-13 18:33:47 UTC
Created attachment 183563
Diacritics cut off

Description:
In Arabic-script languages such as Arabic and Persian, several diacritics are used; some of them are called harakat. These diacritics are placed above or below the characters depending on their type.

On the last line of a paragraph set in Tahoma at a font size above 150, the diacritics below the characters are cut off. This is incorrect rendering.

The problem is limited to the on-screen display; the PDF output is correct, with nothing cut off.

Steps to Reproduce:


Actual Results:
On the first page of the attachment, there are 3 paragraphs, each a single line, containing text with font sizes 150, 151, and 152 respectively. In this case, the diacritics in the last 2 lines are cut off.

On the second page of the attachment, there is only 1 paragraph, which is wrapped into 3 lines. The font sizes are the same as on the first page. On this page, only the last line, with font size 152, has cut-off diacritics.

Expected Results:
No cut-off in the diacritics.

Reproducible: Always


User Profile Reset: No


Additional Info:
Version: 7.5.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 75b569890a6630bb2a5b727c8567f7ea59ccb62e
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

For more information about diacritics, one may refer to:
* Diacritic
  https://en.wikipedia.org/wiki/Diacritic#Arabic
* Arabic Diacritic
  https://en.wikipedia.org/wiki/Arabic_diacritics
Comment 1 Hossein 2022-11-13 18:34:43 UTC
Created attachment 183564
Diacritics is not cut off (PDF output)

The PDF output is OK.
Comment 2 Hossein 2022-11-13 18:37:16 UTC
Created attachment 183565
Diacritics cut off (screenshot from the display)

The display is wrong, and the diacritics are cut off.
Comment 3 raal 2022-11-13 19:52:01 UTC
Confirmed with Version: 7.5.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: cfc8a8f5d841b3f84d207196153be67da7f60652
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: cs-CZ (cs_CZ.UTF-8); UI: en-US
Calc: threaded

and Version 4.1.0.0.alpha0+ (Build ID: efca6f15609322f62a35619619a6d5fe5c9bd5a) - in 4.1 it's different.
Comment 4 Hossein 2022-11-24 08:29:56 UTC
Not reproducible with LO 3.5:
LibreOffice 3.5.0rc3 
Build ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735

It is important to add that sometimes (but not always) the cut-off goes away when the tatweel character (ـ) is removed from the text.
Comment 5 Stéphane Guillou (stragu) 2023-10-13 08:16:24 UTC
For me, checking with the libreoffice-64-releases bibisect repository, the situation really changed at libreoffice-4.1.4.2:

- OOo 3.3 > libreoffice-4.1.4.1: display is sometimes wrong for the first paragraph on fileopen, but zoom changes refresh and correct the view. However, scrolling up and down makes the diacritics appear cut off.
- since libreoffice-4.1.4.2 and still current in recent trunk build: all paragraphs displayed wrong from fileopen; a zoom does not refresh the view anymore.

So I see a temporary display issue inherited from OOo, which has been permanent since 4.1.4.2.
Not sure what is relevant in these changes:
https://wiki.documentfoundation.org/Releases/4.1.4/RC2#List_of_fixed_bugs

(In reply to raal from comment #3)
> and Version 4.1.0.0.alpha0+ (Build ID:
> efca6f15609322f62a35619619a6d5fe5c9bd5a) - in 4.1 it's different.

I assume that's the difference you saw, raal?

Khaled, maybe you have an idea?
Comment 6 ⁨خالد حسني⁩ 2023-10-17 13:43:19 UTC
Sounds like a clipping issue. We have lots of these, and there are many similar bug reports affecting Latin diacritics and other writing systems. It appears that we use the font’s ascender and descender to calculate the clipping box, but they are meant to control line spacing, and fonts often have glyphs that go beyond them. I never figured out where this clipping code is, so this is just speculative.
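
To illustrate the suspected mechanism, here is a minimal sketch in Python. It is not LibreOffice code: the `Rect` type, the function names and all the numbers are made up for illustration. It builds a clip box from the font's ascent and descent only, then measures how far a glyph's ink bounds stick out of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; y grows downward, as on screen."""
    left: float
    top: float
    right: float
    bottom: float


def line_clip_box(baseline_y, ascent, descent, left, right):
    """Clip box derived purely from font metrics (hypothetical helper)."""
    return Rect(left, baseline_y - ascent, right, baseline_y + descent)


def overflow(clip, ink):
    """Return (above, below): how far the ink bounds exceed the clip box."""
    above = max(0.0, clip.top - ink.top)
    below = max(0.0, ink.bottom - clip.bottom)
    return above, below


# Made-up numbers: a large font whose below-base mark extends past the descent.
clip = line_clip_box(baseline_y=200.0, ascent=150.0, descent=40.0,
                     left=0.0, right=500.0)
mark_ink = Rect(left=10.0, top=60.0, right=80.0, bottom=255.0)
above, below = overflow(clip, mark_ink)
```

With these assumed values the mark fits at the top but extends below the clip box, which is the part that would be cut off on screen.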
Comment 7 ⁨خالد حسني⁩ 2023-11-26 18:47:48 UTC Comment hidden (obsolete)
Comment 8 Hossein 2024-05-27 14:04:29 UTC
*** Bug 135425 has been marked as a duplicate of this bug. ***
Comment 9 Hossein 2024-06-01 10:55:41 UTC
*** Bug 119608 has been marked as a duplicate of this bug. ***
Comment 10 Hossein 2024-06-01 10:56:10 UTC
*** Bug 99578 has been marked as a duplicate of this bug. ***
Comment 11 Stéphane Guillou (stragu) 2024-06-03 04:10:21 UTC
(In reply to Stéphane Guillou (stragu) from comment #5)
> - since libreoffice-4.1.4.2 and still current in recent trunk build: all
> paragraphs displayed wrong from fileopen; a zoom does not refresh the view
> anymore.
This step, where the issue is more widespread and permanently visible, I bibisected with linux-42max to this mostly-skipped range:

cafcf33915fe String to OUString + prefix names of data members
cb34d10ec875 String to OUString + prefix for data member
a21b793383d8 import/export WEBSERVICE and FILTERXML from/to .xlsx
efdefd379406 make this less error prone
318ca03903c9 Help button added to the aboutconfigvalue dialog
5b827d6c75cc String to OUString, data member prefix, some formatting
bb2811e89150 make ESC dtrt for toolboxes inside containers inside dialogs
bfb2266c9241 convert env format page to .ui
084ac2bfcb37 corrected SC_OPCODE_STOP_2_PAR and SC_OPCODE_LAST_OPCODE_ID
6050cf2e100e String to OUString + prefix for data members
64b927ba873d Remove ambiguity.
2ec19ed78289 fix warning: multi-line comment [-Werror=comment]

Or: https://git.libreoffice.org/core/+log/96336caac7880de89c2c46523c2857de2c41f318..cafcf33915fe30af693a94f4224fedd5cb9f9a55
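
For readers unfamiliar with bibisecting, the idea can be sketched in Python as a binary search over prebuilt binaries. This is only an illustration, not the actual bibisect tooling: `bisect_builds` and `is_bad` are hypothetical names, and the sketch assumes the oldest build is good, the newest is bad, and the bug never disappears again once it has appeared.

```python
def bisect_builds(builds, is_bad):
    """Binary search over builds ordered oldest to newest.

    Assumes builds[0] is known good and builds[-1] is known bad.
    Returns (last_good, first_bad). When binaries exist only for some
    source commits, every commit between the two is a candidate.
    """
    lo, hi = 0, len(builds) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # bug present: the culprit is at mid or earlier
        else:
            lo = mid  # bug absent: the culprit is after mid
    return builds[lo], builds[hi]


# Hypothetical usage: ten numbered builds, the bug appears at build 6.
last_good, first_bad = bisect_builds(list(range(10)), lambda b: b >= 6)
```

When the repository has no binary for many of the source commits in between, the search ends with a range of candidate commits like the one listed above rather than a single commit.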

Jonathan, you might be interested.
Comment 12 Jonathan Clark 2024-06-03 14:13:29 UTC
(In reply to Stéphane Guillou (stragu) from comment #11)
> Jonathan, you might be interested.

Thanks for the bibisect. I manually reviewed the listed commits, but none of them look like a plausible source.

The way this bug is triggered is a bit particular:

The root cause is Writer reusing the frame area as the paint area (clipping area) for text frames (see SwFrame::GetPaintArea, SwTextFrame::FormatAdjust). The frame area is based on line height, so anything drawn below the font's descent is excluded. This happens consistently for all paragraphs, but it's hard to notice because clipping is only enabled conditionally.
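
As a rough illustration of the difference between clipping to the frame area and clipping to an area that also covers the glyph ink, here is a minimal Python sketch. It is not the Writer implementation and not necessarily what a fix would do: `Rect`, `paint_area` and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle; y grows downward, as on screen."""
    left: float
    top: float
    right: float
    bottom: float

    def union(self, other):
        """Smallest rectangle containing both self and other."""
        return Rect(min(self.left, other.left), min(self.top, other.top),
                    max(self.right, other.right), max(self.bottom, other.bottom))


def paint_area(frame_area, glyph_ink_bounds, include_ink=True):
    """Hypothetical paint (clip) area for one text frame.

    With include_ink=False the frame area is reused as-is, so anything
    drawn outside the line-height-based box would be clipped. With
    include_ink=True the area grows to cover every glyph's ink bounds.
    """
    area = frame_area
    if include_ink:
        for ink in glyph_ink_bounds:
            area = area.union(ink)
    return area


# Made-up numbers: one mark hangs below the frame, another rises above it.
frame = Rect(0.0, 50.0, 500.0, 240.0)
inks = [Rect(10.0, 60.0, 80.0, 255.0), Rect(100.0, 30.0, 160.0, 200.0)]
clipped = paint_area(frame, inks, include_ink=False)
extended = paint_area(frame, inks, include_ink=True)
```

In this sketch `clipped` equals the frame area, while `extended` grows at the top and bottom to cover both marks.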
Comment 13 Hossein 2024-06-08 16:19:22 UTC
@Jonathan:
Thanks for the explanation. In fact, diacritics are cut off at the boundaries of the paragraph, which can be either the bottom or even the top of the paragraph. I have adjusted the title to better reflect this. I don't know if this may happen on the left/right side of the paragraph.
As an example, characters like أ can show the issue. I will attach an example.
Comment 14 Hossein 2024-06-08 16:27:10 UTC
Created attachment 194608
Diacritics are cut off also on the top

This is essentially the same file, but with an extra character أ, which shows the diacritics are also cut off on the top. On the first line, hamza above alif (أ) is cut off. If you resize the text, or in other lines, you may see it correctly.
There are several other examples of diacritics that are placed above the characters. They could be useful for reproducing the cut-off at the top.
Comment 15 Hossein 2024-06-08 16:53:47 UTC
Created attachment 194609
Document showing diacritics on the top and bottom

This is a simplified example that shows diacritics on the top and bottom. In this example, with a huge font size, diacritics are not cut off immediately when loading with LibreOffice 24.2, but they are cut off if you try editing. After a refresh, they seem to be fine again.

Version: 24.2.2.2 (X86_64) / LibreOffice Community
Build ID: d56cc158d8a96260b836f100ef4b4ef25d6f1a01
CPU threads: 12; OS: Linux 6.2; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded
Comment 16 Jonathan Clark 2024-06-10 13:06:15 UTC
Created attachment 194628
Extra ascent stress test

Extreme case for diacritics above the top line of a paragraph.
Comment 17 Jonathan Clark 2024-06-10 13:16:04 UTC
(In reply to Hossein from comment #13)
> @Jonathan:
> Thanks for the explanation. In fact, diacritics are cut off at the
> boundaries of the paragraph, which can be either bottom, or even top of the
> paragraph. I have adjusted the title to better reflect this.

I wasn't able to reproduce top clipping with your example file, but I was able to force this to happen using Thai diacritics (the "extra ascent stress test" attachment).

The problem can be seen in this file by zooming in and scrolling left to right. The Thai text also does not repaint correctly if you edit the first line of the file.

> I don't know if this may happen in left/right side of paragraph.

Writer uses the width of the document for the paragraph paint area, so this is not a concern.
Comment 18 Commit Notification 2024-06-10 15:12:00 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/976b16b1c6ad6e6eaded7a9fb24388c4512e21e2

tdf#152024 Diacritics cut off at top and bottom of paragraph

It will be available in 24.8.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 19 Jonathan Clark 2024-06-10 15:21:36 UTC
The information has become a bit scattered, so I will collect it here:

Steps to verify:

- Open https://bugs.documentfoundation.org/attachment.cgi?id=183563
- Zoom and scroll through the first page. Without the patch, the diacritics below the characters have their bottom corners cut off. After the patch, the diacritics are rendered correctly.

- Open https://bugs.documentfoundation.org/attachment.cgi?id=194609
- Start at the end of the line, and use backspace to delete all characters. Without the patch, the diacritic lingers for a moment on screen when it is deleted. After the patch, the text is updated correctly during editing.

- Open https://bugs.documentfoundation.org/attachment.cgi?id=194628
- Zoom in and scroll left to right. Without the patch, the diacritics are cut off and disappear once the screen has scrolled to the right. With the patch, the diacritics are rendered correctly.
- Edit the text on the first line. Without the patch, a horizontal line of the Thai text is erased and does not reappear. After the patch, the text is *either* drawn over the Thai diacritics, or the Thai text heals after a moment.
Comment 20 Hossein 2024-06-11 12:58:35 UTC
@Jonathan,
Thanks for fixing the diacritics problem. Now I see the display is similar to what is exported to PDF, which is supposed to be what is printed.

One thing to mention is that now I see some artifacts, including diacritics out of the defined rectangle area for the page, which go further until the horizontal ruler. Also, a horizontal line near the top border of the page rectangle.

In attachment 194628, the first issue is visible. I think that is a problem for two reasons:

1) The page rectangle is usually considered the rendering area. In Writer, you cannot have text boxes or shapes outside this area. In Impress/PowerPoint, you can.

2) Things outside this rectangle are not visible in the output. Try exporting to PDF.

What do you think?
Comment 21 Jonathan Clark 2024-06-11 13:31:02 UTC
Created attachment 194658
Diacritic doc overflow in 24.2.4.2

Screenshot illustrating that the document overflow issue existed in previously released versions (24.2.4.2).
Comment 22 Jonathan Clark 2024-06-11 13:47:03 UTC
(In reply to Hossein from comment #20)
> One thing to mention is that now I see some artifacts, including diacritics
> out of the defined rectangle area for the page, which go further until the
> horizontal ruler. Also, a horizontal line near the top border of the page
> rectangle.
> 
> What do you think?

I attached a screenshot from 24.2.4.2 showing that this overflow already existed in Writer prior to this change (i.e. it's a separate issue).

For that separate issue:

In principle, overflowing the top of the document looks bad, and we should try to maintain the same appearance between authoring and printing where possible.

In practice, I think this will only happen if the font is buggy or if the user gives input that is specifically crafted to cause this (such as https://bugs.documentfoundation.org/attachment.cgi?id=194628, or "zalgo text"). In either case, hiding the overflowing glyphs from the user might do more harm than good. It won't be as clear to a font designer that they made a mistake, or to a user that they're deleting diacritics, for example.

I can't come up with a compelling argument either way. However, if other people think paragraphs should be clipped to the page area, that bug probably shouldn't be treated as a high priority.
Comment 23 Jonathan Clark 2024-06-14 02:03:53 UTC
*** Bug 105346 has been marked as a duplicate of this bug. ***
Comment 24 Jonathan Clark 2024-06-14 02:04:43 UTC
*** Bug 106777 has been marked as a duplicate of this bug. ***
Comment 25 Jonathan Clark 2024-06-14 02:16:02 UTC
*** Bug 114386 has been marked as a duplicate of this bug. ***
Comment 26 Jonathan Clark 2024-07-23 11:04:06 UTC
*** Bug 126169 has been marked as a duplicate of this bug. ***
Comment 27 Volga 2024-07-24 16:05:22 UTC Comment hidden (obsolete)
Comment 28 Jonathan Clark 2024-08-12 20:22:28 UTC
*** Bug 141901 has been marked as a duplicate of this bug. ***