Bug 155705 - A11Y crash fetching attribute run in second half of a paragraph split over two pages
Summary: A11Y crash fetching attribute run in second half of a paragraph split over tw...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer (show other bugs)
Version:
(earliest affected)
7.6.0.0 alpha1+
Hardware: All Linux (All)
: medium normal
Assignee: Michael Weghorn
URL:
Whiteboard: target:24.2.0 target:7.6.0.2
Keywords: accessibility
Depends on:
Blocks: a11y, Accessibility
  Show dependency treegraph
 
Reported: 2023-06-06 10:32 UTC by cwendling
Modified: 2023-07-14 05:39 UTC (History)
3 users (show)

See Also:
Crash report or crash signature:


Attachments
Backtrace (from a cppunit run, but that's the same) (20.73 KB, text/plain)
2023-06-06 10:33 UTC, cwendling
Details
Python script to trigger the crash (1.77 KB, text/x-python)
2023-06-06 10:34 UTC, cwendling
Details
Sample document to reproduce the issue (13.51 KB, application/vnd.oasis.opendocument.text-flat-xml)
2023-06-06 10:36 UTC, cwendling
Details
Sample document to reproduce the issue (13.25 KB, application/vnd.oasis.opendocument.text-flat-xml)
2023-06-06 13:32 UTC, cwendling
Details
Patch adding a test case for this issue (16.87 KB, patch)
2023-06-06 13:33 UTC, cwendling
Details

Note You need to log in before you can comment on or make changes to this bug.
Description cwendling 2023-06-06 10:32:24 UTC
Description:
Querying the attribute run for the second half of a paragraph that has been split in two because it didn't fit on one page crashes in SwTextMarkupHelper::getTextMarkup()

Steps to Reproduce:
1. Open a document with a paragraph that overflows past the end of a page onto the next one.
2. Query the attribute run on the a11y node representing the second half of the paragraph, e.g. on page 2

Actual Results:
Crash (see backtrace)

Expected Results:
No crash :)


Reproducible: Always


User Profile Reset: Yes

Additional Info:
You can use the attached document and script to reproduce the crash.

This happens using GTK3, but doesn't seem directly tied to this platform but to anything querying the markup of such nodes.
Comment 1 cwendling 2023-06-06 10:33:47 UTC
Created attachment 187745 [details]
Backtrace (from a cppunit run, but that's the same)
Comment 2 cwendling 2023-06-06 10:34:44 UTC
Created attachment 187746 [details]
Python script to trigger the crash
Comment 3 cwendling 2023-06-06 10:36:32 UTC
Created attachment 187747 [details]
Sample document to reproduce the issue
Comment 4 cwendling 2023-06-06 13:32:19 UTC
Created attachment 187751 [details]
Sample document to reproduce the issue

Actually as there need to be attributes, this requires not to pass spell checking.  So update the test document to use fake Latin from Lorem Ipsum never to actually pass the verification.
Comment 5 cwendling 2023-06-06 13:33:35 UTC
Created attachment 187752 [details]
Patch adding a test case for this issue

And here is a proper CppUnit test for actually reproducing this crash.
Comment 6 Stéphane Guillou (stragu) 2023-06-07 12:38:32 UTC
The script can't register soffice for me for some reason (assert app fails).
Have you found a way to crash it just using the GUI?
Comment 7 cwendling 2023-06-07 13:11:48 UTC
@stragu Unfortunately not, yet that might be possible somehow, I don't know... you could use the cppunit patch that and then `make CppunitTest_sw_a11y CPPUNIT_TEST_NAME=tdf155705::TestBody` which doesn't rely on AT-SPI seeing LO.

The other solutions I know depend on LO being visible to AT-SPI, so might not work in your case (see below).  But here are a couple:

# Get Orca to read the text attributes.

1. Launch Orca
2. Open the attached sample in LibreOffice
3. Navigate to the second part of the paragraph on page 2 (i.e. put the caret there)
4. Ask Orca to list attributes (that should be Orca+f, with Orca being either CapsLock or Insert)

# With Accerciser

Alternatively (but I doubt that would work if the script doesn't find soffice), you could try manually using Accerciser:

1. Open the attached sample in LibreOffice, making sure the part of the paragraph on page 2 is at least partly visible
2. Open Accerciser
3. Find soffice in the app tree (which might not work then...)
4. Navigate the object tree to find the paragraph on page 2.  That's gonna be at "path" 0 0 0 0 1 0 0 1, which means under soffice child 0, then child 0 of that one, etc.  If there's only one paragraph in the "document text" node and you see the right one on screen, select that one.

That should be enough to crash (Accerciser will query the attributes automatically to display them, and that'll crash).  If that wasn't for some reason, on Accerciser's IPython Console, try `acc.queryText().getAttributes(0)`.

# Debug why soffice isn't found by the escript

If soffice doesn't appear to AT-SPI, I can see 3 reasons:
1. Somehow you don't have the components on your system (odd, but possible I guess?)
2. Somehow LO doesn't expose its objects (wrong VCL (gen won't work)?  Somehow it disables itself in some situations I am not aware of?  maybe ask Michael Weghorn)
3. You have another app running that stalls the requests on the AT-SPI bus.  Unfortunately, if one app doesn't answer requests, it might just time out the whole app listing request.  You can debug that with `orca -l` or in Accerciser (seeing which app is missing from the list, or again in the IPython console something like `desktop = pyatspi.Registry.getDesktop(0); [desktop.getChildAtIndex(i).name for i in range(desktop.getChildCount())]`)

For the moment that's the only things I can think of.
Comment 8 Stéphane Guillou (stragu) 2023-06-07 15:48:39 UTC
Thanks for the detailed instructions.
I didn't manage to get the script to register soffice or a recent master build's executable.
In any case, I wasn't able to reproduce the crash by hand with Ubuntu 20.04, Orca 3.36.2 and:

Version: 7.6.0.0.alpha1+ (X86_64) / LibreOffice Community
Build ID: 52f70f04bdc586a072141e069d451a979c5f4cb7
CPU threads: 8; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded

All paragraph attributes are read out without issue.
When running from the command line, I see many times repeated:

warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:375: need min value
warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:390: minvalue not minimal
warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:390: minvalue not minimal
warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:416: not smaller or equal
warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:527: too long!

I also get the following whenever I change the caret position:

(soffice:19376): WARNING **: 17:44:40.372: Exception in get_text_at_offset()

(which doesn't happen if the paragraph doesn't span multiple pages)
Comment 9 cwendling 2023-06-07 16:02:50 UTC
(In reply to Stéphane Guillou (stragu) from comment #8)
> In any case, I wasn't able to reproduce the crash by hand with Ubuntu 20.04,
> Orca 3.36.2 and:

That's odd, I have a similar Orca, and my LO build is about a week old.  I'm on a Debian Buster-based system though, if that could make any difference…

Would it be possible that I'd have a debug build and you wouldn't?  The issue here is in strtmpl.hxx's newFromSubString(), at the `assert(false)` which is possibly eluded in production builds, so you'd only get a buggy substring (the next line makes a "!!br0ken!!" string) that might not actually be used in the end.

> When running from the command line, I see many times repeated:
> 
> warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:375: need
> min value
> warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:390:
> minvalue not minimal
> warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:390:
> minvalue not minimal
> warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:416: not
> smaller or equal
> warn:legacy.osl:19376:19376:sw/source/core/access/accportions.cxx:527: too
> long!

I see this as well, and I believe it's part of the problem :)

I also just noticed that I also get a crash whenever I use left/right arrows on that part of the text but NOT up/down.
Comment 10 Michael Weghorn 2023-07-04 13:07:08 UTC
I can reproduce the crash, both with the script and with the unit test. (Thanks for providing that, that's very helpful!)

Version: 24.2.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: 4e3c8933bfe274874cbec203cd9345559822df9d
CPU threads: 12; OS: Linux 6.1; UI render: default; VCL: gtk3
Locale: en-GB (en_GB.UTF-8); UI: en-US
Calc: threaded
Comment 11 Michael Weghorn 2023-07-04 13:09:09 UTC
(In reply to cwendling from comment #7)
> # Debug why soffice isn't found by the escript
> 
> [...]
> 2. Somehow LO doesn't expose its objects (wrong VCL (gen won't work)? 
> Somehow it disables itself in some situations I am not aware of?  maybe ask
> Michael Weghorn)

I'm not aware of anything specific. I remember sometimes not seeing soffice in accerciser, but maybe that was more for the 3rd reason you mention (another app stalling requests on the AT-SPI bus). I didn't really have a systematic approach to address this, but IIRC, it did work when closing apps and trying again (I usually started accerciser first), and at least after "View" -> "Refresh Registry", the soffice process would usually show up.
As you mention, e.g. the gen VCL plugin wouldn't work; I'd suggest using gtk3 for a11y purposes at this point.
Comment 12 Michael Weghorn 2023-07-06 11:56:15 UTC
The problem is related to the spell check, doesn't crash when disabling that by setting language for the whole document to "None".

Pending Gerrit changes (including Colomban's unit test):

https://gerrit.libreoffice.org/c/core/+/154108
https://gerrit.libreoffice.org/c/core/+/154109
https://gerrit.libreoffice.org/c/core/+/154130
Comment 13 Commit Notification 2023-07-07 22:12:41 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/472950414a0fb07d5260b7b6b2d3a5d2dc18a68d

tdf#155705 sw a11y: Unify iterator use in SwTextMarkupHelper

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 14 Commit Notification 2023-07-07 22:12:43 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/7f7ccf955fa1138b712233628de4a73b3f845c7e

tdf#155705 sw a11y: Only handle paragraph's own spell check data

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 15 Michael Weghorn 2023-07-07 22:14:29 UTC
(In reply to Michael Weghorn from comment #12)
> Pending Gerrit changes (including Colomban's unit test):
> 
> https://gerrit.libreoffice.org/c/core/+/154108
> https://gerrit.libreoffice.org/c/core/+/154109
> https://gerrit.libreoffice.org/c/core/+/154130

The first 2 (the actual fix) are merged, the unit test (third change) turns out to be more tricky, since (as far as I understand) no spellcheck takes place during the CI run due to dictionaries not being present, s. comment in the Gerrit change.
Comment 16 Commit Notification 2023-07-10 09:47:37 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/dccd32607d2d10a24e0214c01c74ed6dcf8746fb

tdf#155705 sw a11y: Unify iterator use in SwTextMarkupHelper

It will be available in 7.6.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 17 Commit Notification 2023-07-11 05:02:32 UTC
Michael Weghorn committed a patch related to this issue.
It has been pushed to "libreoffice-7-6":

https://git.libreoffice.org/core/commit/7cffdc43f100fa7fad2797c13e10d7a23fdad53b

tdf#155705 sw a11y: Only handle paragraph's own spell check data

It will be available in 7.6.0.2.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Commit Notification 2023-07-14 05:39:58 UTC
Colomban Wendling committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/d54a48c008dc9c9cf206d7c7dd37b8d2fdd4e86d

Add test case for tdf#155705

It will be available in 24.2.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.