Bug 130314 - Writer busy-locks in layouting CJK fonts (was: hangs while converting to pdf) - see comment #9 / #8
Summary: Writer busy-locks in layouting CJK fonts (was: hangs while converting to pdf)...
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 6.2.0.3 release
Hardware: All All
Importance: high critical
Assignee: Mark Hung
URL:
Whiteboard: target:7.1.0 target:7.0.4
Keywords: bibisected, bisected, regression
Depends on:
Blocks:
 
Reported: 2020-01-31 10:49 UTC by jinhongxin
Modified: 2020-10-26 18:30 UTC
CC List: 9 users

See Also:
Crash report or crash signature:


Attachments
perf flamegraph (10.50 KB, application/x-bzip)
2020-01-31 19:32 UTC, Julien Nabet
Example file (1.42 MB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2020-06-07 20:05 UTC, Telesto

Description jinhongxin 2020-01-31 10:49:56 UTC
Description:
Running LibreOffice 6.4.0.3 from the shell on CentOS 7.2 to convert a docx file to PDF makes the shell hang. Need help, thanks.

Version 5.3.6.1 works fine.

Steps to Reproduce:
1. wget https://3tdoc.oss-cn-beijing.aliyuncs.com/3tclass/0/0002_7908.docx
2. /opt/libreoffice6.4/program/soffice --headless --invisible --convert-to pdf /tmp/0002_7908.docx --outdir /tmp

Actual Results:
convert /tmp/0002_7908.docx -> /tmp/0002_7908.pdf using filter : writer_pdf_Export

The shell hangs.

Expected Results:
The file is converted to PDF.


Reproducible: Always


User Profile Reset: Yes


OpenGL enabled: Yes

Additional Info:
Comment 1 Julien Nabet 2020-01-31 19:32:24 UTC
Created attachment 157569
perf flamegraph

Here's a flamegraph captured on a Debian x86-64 PC with master sources updated today.
Comment 2 Julien Nabet 2020-01-31 20:47:59 UTC
Eike: it seems there's some loop there:
#0  0x00007fffe1239684 in i18npool::BreakIteratorImpl::getLocaleSpecificBreakIterator(com::sun::star::lang::Locale const&) (this=0x55555837f0f0, rLocale=...)
    at /home/julien/lo/libreoffice/i18npool/source/breakiterator/breakiteratorImpl.cxx:575
#1  0x00007fffe123761d in i18npool::BreakIteratorImpl::nextCharacters(rtl::OUString const&, int, com::sun::star::lang::Locale const&, short, int, int&)
    (this=0x55555837f0f0, Text="", nStartPos=4, rLocale=..., nCharacterIteratorMode=1, nCount=1, nDone=@0x7ffffffecdd8: 0)
    at /home/julien/lo/libreoffice/i18npool/source/breakiterator/breakiteratorImpl.cxx:58
#2  0x00007fffd9eacd30 in SwScriptInfo::CountCJKCharacters(rtl::OUString const&, o3tl::strong_int<int, Tag_TextFrameIndex>, o3tl::strong_int<int, Tag_TextFrameIndex>, o3tl::strong_int<unsigned short, LanguageTypeTag>) (rText="", nPos=..., nEnd=..., aLang=...) at /home/julien/lo/libreoffice/sw/source/core/text/porlay.cxx:2604
#3  0x00007fffd9edd1d5 in lcl_AddSpace(SwTextSizeInfo const&, rtl::OUString const*, SwLinePortion const&) (rInf=..., pStr=0x7ffffffed390, rPor=...)
    at /home/julien/lo/libreoffice/sw/source/core/text/portxt.cxx:100
#4  0x00007fffd9edf9f9 in SwTextPortion::CalcSpacing(long, SwTextSizeInfo const&) const (this=0x55555f28f5a0, nSpaceAdd=184, rInf=...) at /home/julien/lo/libreoffice/sw/source/core/text/portxt.cxx:646
#5  0x00007fffd9e62602 in SwTextPaintInfo::CalcRect(SwLinePortion const&, SwRect*, SwRect*, bool) const (this=0x7ffffffed7d0, rPor=..., pRect=0x7ffffffed0b0, pIntersect=0x0, bInsideBox=true)
    at /home/julien/lo/libreoffice/sw/source/core/text/inftxt.cxx:796

More precisely:
Thread 1 "soffice.bin" hit Breakpoint 3, SwScriptInfo::CountCJKCharacters (rText="", nPos=..., nEnd=..., aLang=...) at /home/julien/lo/libreoffice/sw/source/core/text/porlay.cxx:2604
2604	            nPos = TextFrameIndex(g_pBreakIt->GetBreakIter()->nextCharacters(

(gdb) p nCount
$5 = {m_value = -1205828024}
Comment 3 jinhongxin 2020-02-01 12:31:54 UTC
Is the root cause the font?
Comment 4 Julien Nabet 2020-02-01 13:24:33 UTC
(In reply to jinhongxin from comment #3)
> Is the root cause the font?

If it worked before, I'd suspect the ICU component rather than the font, but I'm not an expert.
Comment 5 Telesto 2020-06-07 20:05:34 UTC
Created attachment 161746
Example file
Comment 6 Telesto 2020-06-08 19:46:08 UTC
@Buovjaga
Maybe ICU too? Same as bug 126344 (speculation)
Comment 7 Buovjaga 2020-06-09 17:33:11 UTC
(In reply to Telesto from comment #6)
> @Buovjaga
> Maybe ICU too? Same as bug 126344 (speculation)

It's not the ICU update. Bibisected with win32-6.2 to https://git.libreoffice.org/core/+/fad862e290d727fc9fefe206f6e4b807482c4175%5E!/
tdf#118555 fix HFONT fallback handing / lifecycle

Adding Cc: to Jan-Marek Glogowski

I tested using GUI, not command line.

Previous commit takes 12 seconds.
Blamed commit takes 55 seconds.
Comment 8 Jan-Marek Glogowski 2020-09-14 04:53:32 UTC
(In reply to Buovjaga from comment #7)
> (In reply to Telesto from comment #6)
> > @Buovjaga
> > Maybe ICU too? Same as bug 126344 (speculation)
> 
> It's not the ICU update. Bibisected with win32-6.2 to
> https://git.libreoffice.org/core/+/fad862e290d727fc9fefe206f6e4b807482c4175%5E!/
> tdf#118555 fix HFONT fallback handing / lifecycle
> 
> Adding Cc: to Jan-Marek Glogowski
> 
> I tested using GUI, not command line.
> 
> Previous commit takes 12 seconds.
> Blamed commit takes 55 seconds.

While I know this commit added quite some overhead for Windows initially, it can't be the problematic commit, because that commit is Windows only. FWIW I don't know about the current state.

So I bibisected myself. The good time here is ~10s, and the bad time is ∞ (i.e. unbounded): the document never finishes, just as the reporter said.

This started with:

commit 9fc9510ae3f46e5c1fd65303bac9f01ddc79cb5c
    tdf#106174 writerfilter: bidi - prev adjust? prev bidi?

This patch introduces / uncovers a bug / state, which SwScriptInfo::CountCJKCharacters can't handle.

That code was added in

commit dcef76b34aa1dca8389b3c068dc3d82a11d2c382
    tdf#43740 Count CJK characters to distribute spaces.

The problem is that for the bugdoc rText.getLength() < nEnd. nPos apparently cannot advance past the end of rText, so nPos < nEnd will always be true. Maybe the loop should just end if nDone == 0? I have no idea whether the input can even be considered correct.
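
To make the suspected loop shape concrete, here is a minimal sketch of the "end the loop if nDone == 0" idea. Everything in it is an assumption for illustration: nextCharacters() is a simplified stand-in rather than the i18npool API, countCharactersGuarded() is a hypothetical name, and the real SwScriptInfo::CountCJKCharacters may differ in its details.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a break iterator's nextCharacters(): advances
// nStart by up to nCount UTF-16 code units and reports the number of steps
// actually taken in nDone. This is NOT the real i18npool implementation.
std::int32_t nextCharacters(const std::u16string& text, std::int32_t nStart,
                            std::int32_t nCount, std::int32_t& nDone)
{
    assert(nStart >= 0 && nCount >= 0);
    const std::int32_t len = static_cast<std::int32_t>(text.size());
    if (nStart >= len)
    {
        nDone = 0;      // nothing left to step over
        return nStart;  // the position does not move
    }
    nDone = std::min(nCount, len - nStart);
    return nStart + nDone;
}

// Hypothetical counting loop over [nPos, nEnd). It stops as soon as the
// iterator makes no progress, so nEnd > text.size() cannot spin forever.
std::int32_t countCharactersGuarded(const std::u16string& text,
                                    std::int32_t nPos, std::int32_t nEnd)
{
    std::int32_t nCount = 0;
    while (nPos < nEnd)
    {
        std::int32_t nDone = 0;
        nPos = nextCharacters(text, nPos, 1, nDone);
        if (nDone == 0)
            break;  // iterator is stuck: bail out instead of looping
        ++nCount;
    }
    return nCount;
}
```

With an empty text and the range [4, 10), mirroring the Text="" and nStartPos=4 values in the backtrace, the guarded loop returns immediately; without the nDone check the same call would never terminate in this model.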

I simply couldn't uncover the constraints of rText, nPos, nEnd, SwTextSizeInfo and SwLinePortion and their respective Len, Idx and Text values in more than a few hours (incl. debug, callgrind). Every time I thought I had grokked it, I found something new, like it's valid that Idx + Len > len(Text)...
Comment 9 Jan-Marek Glogowski 2020-09-14 05:07:03 UTC
FWIW: the same bug happens, if you scroll the document to page 28. And it's Writer specific, so moving the bug there.
Comment 10 Justin L 2020-09-14 10:03:21 UTC
(In reply to Jan-Marek Glogowski from comment #8)
> commit 9fc9510ae3f46e5c1fd65303bac9f01ddc79cb5c
>     tdf#106174 writerfilter: bidi - prev adjust? prev bidi?
> 
> This patch introduces / uncovers a bug / state, which
> SwScriptInfo::CountCJKCharacters can't handle.

My involvement here results in the paragraph style "Plain Text"'s "full justify" positioning taking effect, instead of having a "left justify" direct formatting plastered on each paragraph. (And that is correct BTW - Word 2016 shows most paragraphs with full justify.)

And that makes sense, since the space calculation would come into effect for justifying each line. So my commit has just exposed an existing bug.
Comment 11 Buovjaga 2020-09-15 15:09:12 UTC
Tweaking priority and severity per discussion with Jan-Marek
Comment 12 Mark Hung 2020-09-17 11:37:19 UTC
The problematic content begins on page 16. Exporting p1-p15 is fine. LibreOffice hangs even if you just scroll over p16. It's a Simplified Chinese text, so I don't think there is really any RTL content in it. I'll try to check what's wrong there.
Comment 13 Ming Hua 2020-09-17 14:56:12 UTC
(In reply to Mark Hung from comment #12)
> The problematic content begins on page 16. Exporting p1-p15 is fine.
> LibreOffice hangs even if you just scroll over p16. It's a Simplified
> Chinese text, so I don't think there is really any RTL content in it. I'll
> try to check what's wrong there.
After a quick glance (I'm not a developer, just a Simplified Chinese user), the only thing that stands out is the symbols used in the chemical reaction formulas in questions 14 and 15. They use a different font and are rendered very differently from the rest of the text in my LO 5.2.7: with a gray background, and hovering over them changes the mouse cursor as if they were clickable links.

I also confirm that there is apparently no RTL content, at least not on page 16.
Comment 14 Mark Hung 2020-09-26 05:32:43 UTC
The problematic content is inside a SwTextInputFieldPortion. If this kind of portion is ignored, the conversion works. The portion is not in a text field group, and GetExpText() doesn't return a valid value for it either.
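
As a rough illustration of the "ignore this kind of portion" experiment, the sketch below skips input-field portions when counting characters for justification spacing. The PortionKind enum, the Portion struct and countSpacingCandidates() are hypothetical names invented for this sketch, not Writer's classes, and the clamping of ranges to the text length is an extra assumption. It is not meant to reflect the committed fix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical portion kinds; Writer's real portion hierarchy is richer.
enum class PortionKind { Text, InputField };

// Hypothetical portion: a half-open range [start, start + length) in the text.
struct Portion
{
    PortionKind kind;
    std::int32_t start;
    std::int32_t length;
};

// Counts the characters that would take part in justification spacing.
// Input-field portions are skipped entirely, and every remaining range is
// clamped to the text so an over-long portion cannot index past the end.
std::int32_t countSpacingCandidates(const std::u16string& text,
                                    const std::vector<Portion>& portions)
{
    const std::int32_t textLen = static_cast<std::int32_t>(text.size());
    std::int32_t total = 0;
    for (const Portion& p : portions)
    {
        assert(p.start >= 0 && p.length >= 0);
        if (p.kind == PortionKind::InputField)
            continue;  // the experiment: ignore this kind of portion
        const std::int32_t begin = std::min(p.start, textLen);
        const std::int32_t end = std::min(p.start + p.length, textLen);
        total += end - begin;
    }
    return total;
}
```

In this model an input-field portion whose range reaches far beyond the paragraph text contributes nothing to the count, and an ordinary text portion that overruns the text is truncated rather than iterated past the end.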
Comment 15 Commit Notification 2020-10-08 12:21:31 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/ac76f9e8ad8b077623725d0f6dceb13adb37e43a

tdf#130314 space counting for input field.

It will be available in 7.1.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 16 Buovjaga 2020-10-08 14:02:49 UTC
It still takes a huge amount of time after the commit; I killed it after more than 2 minutes (exporting from the UI).

Arch Linux 64-bit
Version: 7.1.0.0.alpha0+
Build ID: e67a755e6d4e3241360c75c3362f90a3af5999ac
CPU threads: 8; OS: Linux 5.8; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 8 October 2020
Comment 17 Commit Notification 2020-10-09 12:43:48 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/54e1e5f597705a1244701c75233a2c3a68a7d844

tdf#130314 fix incorrect logic in last commit

It will be available in 7.1.0.

Comment 18 Buovjaga 2020-10-09 14:15:54 UTC
With the latest commit it only takes 6 seconds!

Arch Linux 64-bit
Version: 7.1.0.0.alpha0+
Build ID: 54e1e5f597705a1244701c75233a2c3a68a7d844
CPU threads: 8; OS: Linux 5.8; UI render: default; VCL: kf5
Locale: fi-FI (fi_FI.UTF-8); UI: en-US
Calc: threaded
Built on 9 October 2020
Comment 19 Xisco Faulí 2020-10-26 11:17:46 UTC
Verified in

Version: 7.1.0.0.alpha1+
Build ID: 0f0a5b63b19196f9463149a0a1991704c940efe2
CPU threads: 4; OS: Linux 5.7; UI render: default; VCL: gtk3
Locale: en-US (en_US.UTF-8); UI: en-US
Calc: threaded

@Mark Hung, thanks for fixing this issue!!
Comment 20 Commit Notification 2020-10-26 18:29:04 UTC
Xisco Fauli committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/8f362f1bc5ceca9bde282b5db98282b1ab132309

tdf#130314: sw_odfexport: Add unittest

It will be available in 7.1.0.

Comment 21 Commit Notification 2020-10-26 18:30:20 UTC
Mark Hung committed a patch related to this issue.
It has been pushed to "libreoffice-7-0":

https://git.libreoffice.org/core/commit/b8dd7d02a072c80961838b00c9e8cbdbddc9ce08

tdf#130314 space counting for input field.

It will be available in 7.0.4.
