Bug 92064 - LO unusable with Tibetan super long paragraphs
Summary: LO unusable with Tibetan super long paragraphs
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.4.4.1 rc
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Jonathan Clark
URL:
Whiteboard: target:25.2.0 target:24.8.2
Keywords:
Depends on:
Blocks: China-Minority-Scripts
Reported: 2015-06-14 11:26 UTC by Elie Roux
Modified: 2024-09-04 16:17 UTC (History)
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
Tibetan text (with long paragraph) (187.64 KB, application/x-xz)
2015-06-14 11:26 UTC, Elie Roux

Description Elie Roux 2015-06-14 11:26:30 UTC
Created attachment 116527
Tibetan text (with long paragraph)

Dear All,

In Tibetan, the notion of a paragraph doesn't exist, so texts (even ones hundreds of pages long) are usually a single paragraph with no line breaks. MS Word apparently handles this without performance issues, but LibreOffice has huge performance problems when opening or editing this kind of file. See, for instance, the attached (xz-compressed) file: I couldn't really open it; the CPU heats up a lot, LO stops responding, and I have to kill it by hand. This file comes from http://www.dharmadownload.net (the second text from http://www.dharmadownload.net/pages/english/Sungbum/006_mdzod%20bdun/pages/01_mdzod%20bdun%20-%20yid%20zhin%20mdzod.html), and it is a normal Tibetan text, not a long-paragraph torture test made to test performance.

This bug makes LO unusable in production for Tibetan, which is a pity, as many spell-checking and grammar-checking tools for Tibetan are in development for LO, and they would make it much better than Word.

This might be related to https://bugs.documentfoundation.org/show_bug.cgi?id=89666 or https://bugs.documentfoundation.org/show_bug.cgi?id=39372 but it still happens on 4.4.4~rc1 (Debian/sid), so the patches don't seem to be enough.

Thank you very much!
Comment 1 Julien Nabet 2015-06-14 12:40:44 UTC
Elie: I updated tdf#89666 because 4.5.0 won't exist. There have been some patches on 5.0 branch.
Since Michael didn't put tdf#89666 as FIXED, I suppose he thinks there's still work to do.

Anyway, I could reproduce this on pc Debian x86-64 with master sources updated today.
I noticed this on console:
warn:legacy.osl:9871:1:oox/source/helper/graphichelper.cxx:117: GraphicHelper::GraphicHelper - cannot get target frame
Comment 2 Elie Roux 2015-06-14 12:43:37 UTC
Thanks for your consideration! It would be a huge help to the Tibetan community if this could work!
Comment 3 Julien Nabet 2015-09-02 12:35:21 UTC
Version corresponds to "earliest affected" as indicated.

I'll give a try with master sources (future 5.1.0).
Comment 4 Julien Nabet 2015-09-02 18:13:22 UTC
With master sources updated today, I still get the hang when opening.
After 2 minutes, I've got this:
 warn:legacy.osl:11092:1:oox/source/helper/graphichelper.cxx:117: GraphicHelper::GraphicHelper - cannot get target frame
W: Unknown node under /registry/extlang: deprecated
W: Unknown node under /registry/grandfathered: comments
W: Unknown node under /registry/grandfathered: comments

(the last 3 lines aren't specific to this bug)

Miklos: since the xz file contains an rtf, thought you might be interested in this one.
Comment 5 Elie Roux 2015-09-02 18:40:50 UTC
I have the same problem with a docx file... I can convert it into .odt and put it here if you want.
Comment 6 Julien Nabet 2015-09-02 18:57:51 UTC
(In reply to Elie Roux from comment #5)
> I have the same problem with a docx file... I can convert it into .odt and
> put it here if you want.
I suppose it may help to have different formats, so go ahead! :-)
Comment 7 QA Administrators 2016-09-20 10:28:43 UTC Comment hidden (obsolete)
Comment 8 Elie Roux 2016-09-20 16:05:03 UTC
Although there's definitely an improvement compared to version 5.0, my LO 5.2.0.4 under Debian/Sid is still very slow at opening the file indicated in the initial report, and adding or removing a character takes at least 10 s, so I think it's safe to say that LO is still unusable for long Tibetan texts.
Comment 9 Xisco Faulí 2017-09-29 08:51:28 UTC Comment hidden (obsolete)
Comment 10 Elie Roux 2017-09-29 09:22:35 UTC
bug still present, Debian 9, LO 5.4.1.2
Comment 11 Buovjaga 2019-04-24 19:08:00 UTC
Takes 3 minutes to open, unusable perf after that.

Arch Linux 64-bit
Version: 6.3.0.0.alpha0+
Build ID: cfbb223d5666cb803539ac98918ff39b27efc6e7
CPU threads: 8; OS: Linux 5.0; UI render: default; VCL: gtk3; 
Locale: fi-FI (fi_FI.UTF-8); UI-Language: en-US
Calc: threaded
Built on 24 April 2019
Comment 12 QA Administrators 2021-06-17 03:50:19 UTC Comment hidden (obsolete)
Comment 13 Elie Roux 2021-06-17 12:06:41 UTC
still present with LO 7.1.4
Comment 14 QA Administrators 2023-06-18 03:14:38 UTC Comment hidden (obsolete)
Comment 15 Commit Notification 2024-07-13 02:22:40 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/4c8f88bef948b18f3d810c29a7f83496367758a9

tdf#92064 sw: Improve Tibetan layout performance

It will be available in 25.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Jonathan Clark 2024-07-13 02:39:00 UTC
The above change significantly improves the situation, but further work is needed.

To evaluate the performance impact of this change, I used headless mode to convert https://bugs.documentfoundation.org/attachment.cgi?id=116527 to a PDF. With the change, conversion completed on my machine in 1m49.5s. Without the change, I terminated the attempt without completion after 45 minutes. These results suggest that the fix reduces conversion time by roughly 96% or more.
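
The arithmetic behind that percentage can be checked with a small sketch. It is only an illustration: the helper names are hypothetical, and the 45-minute figure is a lower bound because that run was aborted, so the true reduction is at least what is computed here.

```python
def to_seconds(minutes: float, seconds: float = 0.0) -> float:
    """Convert a minutes/seconds pair into plain seconds."""
    return minutes * 60.0 + seconds


def reduction_percent(before_s: float, after_s: float) -> float:
    """Percentage by which the run time shrank, relative to the old time."""
    return (1.0 - after_s / before_s) * 100.0


def speedup_factor(before_s: float, after_s: float) -> float:
    """How many times faster the new run is than the old one."""
    return before_s / after_s


# Timings quoted in this comment. The "before" run was killed after
# 45 minutes, so it is only a lower bound on the real duration.
before = to_seconds(45)
after = to_seconds(1, 49.5)

print(f"reduction: at least {reduction_percent(before, after):.1f}%")  # 95.9%
print(f"speedup:   at least {speedup_factor(before, after):.1f}x")     # 24.7x
```

Expressed as a ratio, the patched build is at least about 25 times faster on this document.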

Despite this fix, performance in the GUI is poor. There are still long pauses for layout after opening the attachment. Once fully loaded, scrolling through the document is choppy, with excessive time spent shaping and rendering text.

Since this bug requires additional work, I am tentatively resetting its status to new.
Comment 17 Elie Roux 2024-07-13 04:22:56 UTC
Thanks a lot, Jonathan, for your initial work on this. I really appreciate it!
Comment 18 Commit Notification 2024-09-03 19:32:32 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/6594b279a926e497261a4e802a5e74d2f3b97369

tdf#92064 sw: Improve large paragraph layout performance

It will be available in 25.2.0.

Comment 19 Jonathan Clark 2024-09-03 19:36:57 UTC
Repeating the experiment mentioned in comment 16, this latest patch reduces conversion time from 1m43.619s to 13.046s (a further 87% reduction).

In my subjective opinion, LO is no longer unusable with Tibetan. Runtime performance is still imperfect, but it feels like it is within the ballpark of other CTL languages. Based on this, I am marking this bug fixed.
Comment 20 Elie Roux 2024-09-03 22:09:51 UTC
This sounds amazing, thanks a lot for your work on this. It has real potential to change the future of open source adoption for low-resource languages!
Comment 21 Xisco Faulí 2024-09-04 09:32:20 UTC
Using time ./instdir/program/soffice --headless --convert-to "pdf" /home/xisco/Descargas/01_2_V1_yid\ bzhin\ mdzod_drelpa.rtf --outdir /home/xisco/Descargas/

it takes

real	0m13,733s
user	0m11,913s
sys	0m1,715s

with

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 6594b279a926e497261a4e802a5e74d2f3b97369
CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded

while it takes

real	2m58,510s
user	2m56,773s
sys	0m1,708s


with

Version: 25.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 7d3251adf2e95768c9169b92c8b3366c95f71bfa
CPU threads: 8; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: es-ES (es_ES.UTF-8); UI: en-US
Calc: threaded
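
For anyone repeating the comparison, here is a minimal sketch for turning the two `time` outputs above into a single figure. It assumes the shell's `XmY,ZZZs` format shown in this comment (with either a comma or a dot as the decimal separator); the function name and the regular expression are illustrative and not part of any LibreOffice tooling.

```python
import re

# Matches lines such as "real\t2m58,510s" or "user 0m11.913s".
_TIME_LINE = re.compile(r"^(real|user|sys)\s+(\d+)m(\d+)[.,](\d+)s$")


def parse_time_output(text: str) -> dict[str, float]:
    """Return {'real': seconds, 'user': seconds, 'sys': seconds}."""
    result: dict[str, float] = {}
    for line in text.splitlines():
        match = _TIME_LINE.match(line.strip())
        if match:
            label, minutes, secs, frac = match.groups()
            result[label] = int(minutes) * 60 + float(f"{secs}.{frac}")
    return result


# Outputs copied from this comment: patched build first, older build second.
patched = parse_time_output("real\t0m13,733s\nuser\t0m11,913s\nsys\t0m1,715s")
older = parse_time_output("real\t2m58,510s\nuser\t2m56,773s\nsys\t0m1,708s")

reduction = (1.0 - patched["real"] / older["real"]) * 100.0
print(f"wall-clock reduction: {reduction:.1f}%")                    # 92.3%
print(f"speedup: {older['real'] / patched['real']:.1f}x")           # 13.0x
```

Only the `real` (wall-clock) line is compared here. The `user` times tell the same story, while `sys` is essentially unchanged between the two builds.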
Comment 22 Commit Notification 2024-09-04 16:17:33 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/ef40759390de4eba93d0a1e9369fc8ba5c1ea534

tdf#92064 sw: Improve Tibetan layout performance

It will be available in 24.8.2.

Comment 23 Commit Notification 2024-09-04 16:17:36 UTC
Jonathan Clark committed a patch related to this issue.
It has been pushed to "libreoffice-24-8":

https://git.libreoffice.org/core/commit/b0908a76d02e7babf23c4287f57f3d6e368e26e8

tdf#92064 sw: Improve large paragraph layout performance

It will be available in 24.8.2.
