Bug 156657 - Exposing all Calc cells in accessibility tree is incompatible with AtspiCollection
Summary: Exposing all Calc cells in accessibility tree is incompatible with AtspiColle...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
7.5.5.2 release
Hardware: All Linux (All)
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords: accessibility
Depends on:
Blocks: a11y-Linux
  Show dependency treegraph
 
Reported: 2023-08-07 17:10 UTC by Joanmarie Diggs
Modified: 2024-04-05 07:59 UTC (History)
2 users (show)

See Also:
Crash report or crash signature:


Attachments
tool to make calc unresponsive (1.60 KB, text/x-python)
2023-08-07 17:11 UTC, Joanmarie Diggs
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Joanmarie Diggs 2023-08-07 17:10:58 UTC
Description:
The AtspiCollection interface, while originally created for web content, works on all ATK implementations and is a much more performant (sometimes 10x faster) way for Orca to locate objects of interest compared to iterating recursively through the accessibility tree. As a result, I (Orca maintainer) am slowly but surely migrating over to preferring AtspiCollection.

Unfortunately, the fact that Calc exposes all cells in the accessibility tree means Orca can not apply this to at least Calc and possible not to LibreOffice at all (just to be safe). See steps and attached python tool which makes Calc completely unresponsive.

Steps to Reproduce:
1. Launch Writer
2. Launch Calc
3. Launch the to-be-attached python tool in a terminal
4. Click on the Writer window
5. Click on the Calc window

Actual Results:
Each time a window is activated, the script goes looking for a single status bar using AtspiCollection. It succeeds with Writer and errors out with Calc and exits. But even though the tool has exited, Calc is non-responsive.

Here's the output from the tool:

======================================================
Searching in Untitled 1 - LibreOffice Writer frame
Found  status bar


Searching in Untitled 2 - LibreOffice Calc frame

EXCEPTION: atspi_error: timeout from dbind (1)
Traceback (most recent call last):
  File "/home/jd/Desktop/./listener.py", line 35, in on_event
    if not objs:
           ^^^^
UnboundLocalError: cannot access local variable 'objs' where it is not associated with a value
======================================================

Expected Results:
Calc wouldn't become completely unresponsive.


Reproducible: Always


User Profile Reset: No

Additional Info:
I filed https://gitlab.gnome.org/GNOME/at-spi2-core/-/issues/138 to ask Atspi to not descend ginormous tables.

As you will see in a later comment, I also suggested that Calc should consider doing what web app authors with giant grids do, namely only include a subset of the conceptual table in the DOM and use ARIA properties to expose stuff ATs should report to end users.
Comment 1 Joanmarie Diggs 2023-08-07 17:11:40 UTC
Created attachment 188833 [details]
tool to make calc unresponsive
Comment 2 Michael Weghorn 2023-08-08 05:49:30 UTC
(In reply to Joanmarie Diggs from comment #0)
> Additional Info:
> I filed https://gitlab.gnome.org/GNOME/at-spi2-core/-/issues/138 to ask
> Atspi to not descend ginormous tables.
> 
> As you will see in a later comment, I also suggested that Calc should
> consider doing what web app authors with giant grids do, namely only include
> a subset of the conceptual table in the DOM and use ARIA properties to
> expose stuff ATs should report to end users.
I'm not experienced with what web authors do, so have a few questions:

Are there any further information/best-practices on how that should work, i.e. what you'd expect Calc to report, etc.?

I've found https://developer.mozilla.org/en-US/docs/Web/Accessibility/ARIA/Roles/table_role but that doesn't really give too many details yet.

It has this example:

> The above is part of a table. While the full table has 81 entries, as indicated
> by the aria-rowcount property, only four are currently visible. The columns are
> sortable, but not currently sorted, as indicated by the aria-sort property on
> the column headers.
Does that mean that only what's visible on screen should be exposed?
IIRC, Colomban (CC'ed) mentioned that generally, documents should not only expose what's on screen and e.g. tdf#96492 also suggests that off-screen paragraphs in Writer should be part of the a11y tree.
Would we run into a similar problem with Writer when exposing all off-screen pages/paragraphs in the a11y tree as well, e.g. if somebody opens a 1000 page document with thousands of paragraphs?

From what I have seen, the current handling of tables in LO wrt a11y is that all cells are treated as children and the corresponding child index is used in many places (for calculating row/col index,...), so changing that would probably be a larger change.

Without knowing much about how this is handled elsewhere, limiting what children to look into (iterate over?) on the client side (as the commit in https://gitlab.gnome.org/GNOME/at-spi2-core/-/issues/138 does) seems like a good option to me, at least in the short run. (In particular for the status bar example given here, descending into any children of objects with document roles would seem unnecessary.)

I'm thankful for further input on how to address this in practice.
Comment 3 Joanmarie Diggs 2023-08-08 08:51:48 UTC
I'm comment on the ARIA stuff later, but I wanted to respond to some of your comments now.

Normally all objects that might be of interest should indeed be in the accessibility tree. The fact that they are not in the case of, say, Writer, is why Orca users (and NVDA users I believe) cannot use Structural/Quick-Key navigation in documents. Please expose more -- not less -- in the accessibility tree.

I honestly don't know if thousands of paragraphs would grind the system to a halt or not. The change in Atspi sets the maximum number of children in descendable objects to 65536. I'll test later. If that makes the problem go away, my guess is thousands of paragraphs would still be ok. AtspiCollection is fast.

The Calc table has 2,147,483,647 children. That's a bit more than "thousands".

Regarding the status bar test. To be clear, that was just to force the bug to be reproducible. The changes I'm making to Orca will be to use AtspiCollection **any time the task is to find one or more descendants**. And Orca does that often enough -- including for its Flat Review feature which is notoriously non-performant. AtspiCollection will fix that. And Flat Review's purpose is to present on-screen objects, so documents are relevant.

Unfortunately, AtspiCollection does not currently have API that says "please avoid looking in documents/giant tables/whatever for stuff." All accessible objects are treated the same (modulo Mike Gorse's change last night).
Comment 4 cwendling 2023-08-08 08:53:41 UTC
That's indeed a complicated question, that has other implications as Michael says.

I'm happy to review my POV on not exposing everything in Writer if we change how ATs are supposed to use the interfaces and there are reasonable alternatives -- but I'm not even the one that initially reported the issue :)

In this specific case, as an AtspiTable has specific interface for querying children, it *might* indeed be OK to not have any children exposed there and require users to go through that interface querying number of rows and columns, and then specific ones (if one wants to iterate, one still can that way).  And I admit that there is a performance issue here with tables, even internal LO a11y tests have (and always had) a 500 children iteration limit, likely for those very tables.

However I'm with Michael here: we'd need to settle one what apps/toolkits are expected to report for what, because if we're having special cases that are more complex than implementing the APIs (like reporting children here), we need to have a spec for everyone to agree on what they should do or can expect.

It's a big well known issue currently that we lack clear documentation and guidance on how those APIs are supposed to be implemented and used, and I believe Joanie and… Matt maybe? have plans on writing some of that -- which is awesome.

But before we really know, I'm not sure what we can really do, apart from a hack as was done in that MR -- or is done in internal tests.  Or… couldn't the atk-adaptor code do something smart with tables, if there's something to be done?  It sounds like a good spot for it.  Or maybe we possibly could improve the situation implementing AtspiCollection in LO?  But ATM it's not really an option as ATK doesn't seem to have it, and the GTK3 VCL uses it.
Comment 5 Michael Weghorn 2023-08-08 11:03:46 UTC
Thanks for your replies!

(In reply to Joanmarie Diggs from comment #3)
> Normally all objects that might be of interest should indeed be in the
> accessibility tree. The fact that they are not in the case of, say, Writer,
> is why Orca users (and NVDA users I believe) cannot use Structural/Quick-Key
> navigation in documents. Please expose more -- not less -- in the
> accessibility tree.
That's what I understood as being the reasonable way to go before being uncertain when seeing this bug report, thanks for confirming.

> I honestly don't know if thousands of paragraphs would grind the system to a
> halt or not. The change in Atspi sets the maximum number of children in
> descendable objects to 65536. I'll test later. If that makes the problem go
> away, my guess is thousands of paragraphs would still be ok.
To be honest, from the LO side I'd actually be surprised if having LO process thousands of paragraphs wouldn't cause performance issues.

> AtspiCollection is fast.
I'm curious: Is that true for any application or only if apps/frameworks support AtspiCollection to actually optimize retrieving objects via that interface? Or does using AtspiCollection bring a performance benefit even otherwise, e.g. because libatspi or some other "lower layer" does something smarter other than falling back to just iterating over all children/the whole tree?

> The Calc table has 2,147,483,647 children. That's a bit more than
> "thousands".
In reality, it has even more children now with 16k column support, but that's the maximum that the 32-bit integers in the AT-SPI API are able to report, which is also why just using the child index to retrieve e.g. a cell in the last row wouldn't work and the TableCell interface is used instead (s. tdf#150683).

> Regarding the status bar test. To be clear, that was just to force the bug
> to be reproducible. The changes I'm making to Orca will be to use
> AtspiCollection **any time the task is to find one or more descendants**.
> And Orca does that often enough -- including for its Flat Review feature
> which is notoriously non-performant. AtspiCollection will fix that. And Flat
> Review's purpose is to present on-screen objects, so documents are relevant.
I see.

(In reply to cwendling from comment #4)
> However I'm with Michael here: we'd need to settle one what apps/toolkits
> are expected to report for what, because if we're having special cases that
> are more complex than implementing the APIs (like reporting children here),
> we need to have a spec for everyone to agree on what they should do or can
> expect.
> 
> It's a big well known issue currently that we lack clear documentation and
> guidance on how those APIs are supposed to be implemented and used, and I
> believe Joanie and… Matt maybe? have plans on writing some of that -- which
> is awesome.

Yes, having clear documentation on what apps/toolkits are supposed to do would really be great. It might be somewhat easy to *somehow* adapt LO to not hang on the attached script, but anything I can come up with so far would feel like a terrible hack that somewhat breaks the semantic model (and affects other interfaces like the Table interface) of how I currently understand tables to be modeled in LO...

> (...) Or maybe we possibly could
> improve the situation implementing AtspiCollection in LO?  But ATM it's not
> really an option as ATK doesn't seem to have it, and the GTK3 VCL uses it.

There's currently tdf#35654 for the lack of AtspiCollection support in Writer docs.

@Joanmarie: reading your tdf#35654 comment 10, I'm wondering what would have to be done on LO side to support that?
Comment 6 QA Administrators 2024-03-17 03:14:20 UTC Comment hidden (obsolete)
Comment 7 Joanmarie Diggs 2024-04-04 09:46:18 UTC
Michael: I'm not sure what exactly you're asking. Sorry!

There is not yet an AtkCollection. And if I had to guess, there won't be any time soon. And if there were one, the API would likely need to change. As a consumer of AtspiCollection, I find the API confusing, complicated, and still failing to do basic things I need it to do. And when I asked the Gtk4 developers to provide an implementation for AtspiCollection, they said "yuck" (and I paraphrase only slightly). ;)

So my response to:

     reading your tdf#35654 comment 10, I'm wondering what
     would have to be done on LO side to support that?

is a shrug I'm afraid.

If I'm failing to get the real question, please clarify.
Comment 8 Michael Weghorn 2024-04-05 07:59:17 UTC
(In reply to Joanmarie Diggs from comment #7)
> Michael: I'm not sure what exactly you're asking. Sorry!
> 
> There is not yet an AtkCollection. And if I had to guess, there won't be any
> time soon. And if there were one, the API would likely need to change. As a
> consumer of AtspiCollection, I find the API confusing, complicated, and
> still failing to do basic things I need it to do. And when I asked the Gtk4
> developers to provide an implementation for AtspiCollection, they said
> "yuck" (and I paraphrase only slightly). ;)
> 
> So my response to:
> 
>      reading your tdf#35654 comment 10, I'm wondering what
>      would have to be done on LO side to support that?
> 
> is a shrug I'm afraid.
> 
> If I'm failing to get the real question, please clarify.

Back then it was unclear to me whether LO was supporting AtspiCollection in the first place. Your tdf#35654 comment 19 clarified that this is the case (at least for the gtk3 VCL plugin), as that's implemented on the ATK level by now. Therefore, I've closed tdf#35654 now and mentioned the remaining issue in bug 96492 instead, in the hope to make this a bit clearer/easier to keep track.

Reading your above comment: Do you still think that AT-SPI Collection is the way forward in the longer run?

If so, the discussion in https://gitlab.gnome.org/GNOME/gtk/-/issues/6204 contains some more information/tips on what to do instead of exposing all cells in the a11y tree, but some points (like how to expose selected cells when part of them are off-screen) would IMHO still need clarification.