130768 – Paint sometimes slow when huge Pixel-Oriented graphics involved

Bug 130768 - Paint sometimes slow when huge Pixel-Oriented graphics involved

Summary: Paint sometimes slow when huge Pixel-Oriented graphics involved

Status:	ASSIGNED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	graphics stack (show other bugs)
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium enhancement
Assignee:	Not Assigned

URL:
Whiteboard:	target:7.0.0
Keywords:	perf

Depends on:
Blocks:

Reported:	2020-02-19 08:52 UTC by Armin Le Grand (allotropia)
Modified:	2022-02-21 09:29 UTC (History)
CC List:	5 users (show)

See Also:	Regressions-cairo-speedup
Crash report or crash signature:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Armin Le Grand (allotropia) 2020-02-19 08:52:18 UTC

When using huge pictures, e.g. from modern digital sources (cameras, mobiles, ...) LO can get quite slow. This differs - of course - depending on Systems running on. The aspects influencing this are memory, processor (cores/architecture), 32/64bit, target graphic system interface, OS and evtl. support of graphics hardware.

Comment 1 Armin Le Grand (allotropia) 2020-02-19 09:00:53 UTC

In this task I'll try to take some into account on weak HW where we know no special Graphics-HW is available and also CPUs are not breathtakingly powerful (good test is to build LO).
Taking care of this is one thing, but...
- Need to take care on all possible scenarios
- No use to optimize one case and kill a dozen others
- Only valuable when not influencing other system-dependent parts
- System-independent parts might not be prevented to use better stuff/paths in SW when available

There are various possibilities to try to put the lever at. It is clear that quite some security aspects (in the sense of not killing other scenarios) and thus experimenting/testing will have to be done. It is quite possible that more than one and/or combinations of activities/changes will lead to success. There are also already some, which are more or less successful in diverse scenarios. I will number the things that come to my mind and I will investigate in and try to document possible enhancements/progress here. I will also show some which seem plausible but do not work well.

Comment 2 Armin Le Grand (allotropia) 2020-02-19 09:07:42 UTC

Short description of scenarios:
(a) Add a huge bitmap (jpeg) of your choice to draw/impress/writer/calc and scroll/zoom around.
(b) Same with Writer, but type text.
This leads to slow scenarios due to the needed repaints. Interestingly there are already some scenarios where this works surprisingly well
- Dragging something (e.g. huge transparent ellipse) in such a scenario -> works mostly well depending on app due to this happening on overlay and main view is not redrawn at all.
- Typing text into a Shape -> nowadays also completely on Overlay, no problem. There was a task for this with impress and a huge BG-Bitmap that made me move the shape-text-editing to overlay completely.
These may give hints for some problematic areas we have.

Comment 3 Armin Le Grand (allotropia) 2020-02-19 09:25:44 UTC

Looking closer at the repaint shows that it comes down to OutputDevice::DrawBitmap and this leads to Bitmap::Scale. This *will* be slow on huge Bitmaps.

Idea (1): Just buffer/cache the scaled bitmaps:
This is the brute-force one. Questionable is the number of Bitmaps to buffer (at some point this will out weight the wins), the initial scale (1st repaint), mem footprint (you have to stop somewhere - so there is always a scenario constructible where this fails - and you need an 'intelligent' administration) and the reuse at all. Experience shows that scale (zoom in/out) makes buffering an exactly scaled bitmap immediately useless. But also scrolling leads to calculating back from object coordinates over world coordinates to pixel coordinates and sizes. Some of these are double now, but not all. Not only numerical problems but also the exact orientation of the logic range of the bitmap and how it covers target pixels lead to slightly different pixel sizes of +/- single pixels. There are quite some places in LO already that try to deal with that and try to detect if the buffered one can be re-used or not - that stuff is expensive and unreliable.
There is also no association of cached data to view-dependent representation which would be needed to intelligently get rid of that data when the view-dependent part goes away - in DraingLayer this is the ObjectContact/ViewObjectContact/ViewContact relationship which just for that reason would be a better place for this. Unfortunately not all apps use DrawingLayer and Primitives completely for repaint yet.
A third problem is this tries to work with exact scales which as a consequence have to be done in high quality -> makes the scale even more expensive. Compared with being able to get 'close' but not smaller in pixels to the target size by just using a 2^n scaling (mip-map-like) and leave the quality part to the graphic sub-system.

Luckily, we have a number of better possibilities in the existing graphics stack which may need to be combined but will/should lead to more scenario-independent improvements.

Comment 4 Armin Le Grand (allotropia) 2020-02-19 09:30:11 UTC

Some raw planned points I will experiment with (more info when getting into it):

(2) DrawTransformedBitmapEx has evolved and should be used more often.
(3) SysDep buffering of data in the format of the graphic sub-system.
(4) Evtl. binary scale of this (2^n, mipMap-like)
(5) Check maybe some primitive decompose are not reused currently (due to EditViews not completely using these yet - sigh)

Comment 5 Armin Le Grand (allotropia) 2020-02-19 21:20:41 UTC

Worked on (2) and achieved some progress. Advantage is that no scaling is done, but the sys-dep form created immediately (two times copy/touch/memstuff changed to one time). This seems safe - I did some tests/checks on various sytems/targetGrfSys, need more,...

Comment 6 Armin Le Grand (allotropia) 2020-02-20 09:30:20 UTC

Made an in-between step to create some timing information to be able to make more precise statements to the effects of the possible actions...

Comment 7 Armin Le Grand (allotropia) 2020-02-20 11:06:43 UTC

Checked changes for (2) and it already slightly to middle improves behaviour. It seems safe, I checked some scenarios (all apps, systems, ...). A Writer example already got quite faster, most cases were slightly faster, one case was slower. Testing with large and huge bitmaps, transparent and non-transparent. Needs more checks and the big step will come in combination with (3) I guess. Added a env var to switch that timing output helper on/off, added a static bool to allow change (2)-behaviour on the fly in the debugger for test purposes.

Comment 8 Armin Le Grand (allotropia) 2020-02-20 15:13:20 UTC

Working on (3) and making good progress. For now, apply to ::drawTransformedBitmap and do tests using the formally added timer output. Looks pretty good combined with (2) and I can now interactively check for problem zones. Doing right that and checking various scenarios...
BTW: There are more places to buffer, e.g. I added a big, 8-bit BMP and 

::drawBitmap was used. The following methods should be supported:
::drawAlphaBitmap
::drawTransformedBitmap
::drawBitmap
::drawMask

what will increase work to do

Comment 9 Armin Le Grand (allotropia) 2020-02-20 15:22:06 UTC

Writer: To profit from (2), (3), will need to adapt paintGraphicUsingPrimitivesHelper to reuse primitives AFAP - maybe use VOC-mechanism from Writer_DL - we *have* a ObjectContact there. Need to check that.

Comment 10 Michael Meeks 2020-02-20 21:48:12 UTC

Nice updated; thanks Armin =)

Comment 11 Armin Le Grand (allotropia) 2020-02-21 11:38:01 UTC

Adapting Writer's SwNoTextFrame to VOC-Mechanism was ambitioned, but is the precondition to get the involved Primitives - and their decomposition buffered. This is done in DrawingLayer automatically, so works for Draw/Impress/Calc alredy - Calc due to all Graphics being embedded in DrawingLayer. Also shows the importance to one day change Writer/Calc to full primitive drawing.
To allow system-dependent buffering of the involved bitmaps it is necessary to re-use the involved primitives and their already executed decomposition (also for performance reasons). This is usually done in DrawingLayer by using the VOC-hanism (see descriptions elsewhere).
To get that here, make the involved SwNoTextFrame (this) a ::contact::ViewContact supplier by supporing a GetViewContact() - call. For ObjectContact we can use
// the already exising ObjectContact from the involved DrawingLayer. For tis, the helper classes
     ViewObjectContactOfSwNoTextFrame
     ViewContactOfSwNoTextFrame
are created which support the VOC-mechanism in it's minimal form. This allows automatic and view-dependent (multiple edit windows, print, etc.) re-use of the created primitives. Also: Will be very useful when completely changing the Writer repaint to VOC and Primitives, too.

Comment 12 Armin Le Grand (allotropia) 2020-02-21 12:30:08 UTC

Unified usage and adapted

::drawAlphaBitmap
::drawTransformedBitmap
::drawBitmap

but not

::drawMask

due to some special stuff going on there - unpremultiply of data. I am not sure if/how this needs to be done. Probably only once, thus would need to be part of 1st init and buffer creation - but what if something different happens every time...? Too unsafe, dispense buffering for now.

Comment 13 Armin Le Grand (allotropia) 2020-02-21 14:49:04 UTC

Works well with (2), (3) for Cairo and (5) for Writer repaints. Doing some more tests and refinements...

Comment 14 Armin Le Grand (allotropia) 2020-02-21 15:45:42 UTC

Good that I tested again - there was an error/mismatch in ::drawAlphaBitmap that ruined transparent bitmap paint. Continue checks...

Comment 15 Armin Le Grand (allotropia) 2020-02-21 16:58:10 UTC

Note: Speed(up) is pretty good. Can be interactively checked using SAL_ENABLE_TIMER_BITMAPDRAW and DO_TIME_TEST. Using this it can be seen that it is *very* dependent of zoom and thus relation of bitmap pixels to visualization pixels. It is very good when zooing far in and zooming far out. It is better in-between, but there is the worst point.
This could further be improved using

(4) Evtl. binary scale of this (2^n, mipMap-like)

directly inside creating the system-dependent graphics representation and checking against this in the re-usage-accesses.

Comment 16 Armin Le Grand (allotropia) 2020-02-21 18:12:28 UTC

Gerrit linux/gcc has a crash on CppunitTest_desktop_lib, it's in SalGraphics::DrawTransformedBitmap with a this* of 0x00000000 - argh. The OutputDevice it's called from has mpGraphics -> 0x00000000. It's a SD-redraw, stack is:

libvcllo.so!SalGraphics::DrawTransformedBitmap(SalGraphics * const this, const basegfx::B2DPoint & rNull, const basegfx::B2DPoint & rX, const basegfx::B2DPoint & rY, const SalBitmap & rSourceBitmap, const SalBitmap * pAlphaBitmap, const OutputDevice * pOutDev) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/gdi/salgdilayout.cxx:926)
libvcllo.so!OutputDevice::DrawTransformBitmapExDirect(OutputDevice * const this, const basegfx::B2DHomMatrix & aFullTransform, const BitmapEx & rBitmapEx) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/outdev/bitmap.cxx:1068)
libvcllo.so!OutputDevice::DrawTransformedBitmapEx(OutputDevice * const this, const basegfx::B2DHomMatrix & rTransformation, const BitmapEx & rBitmapEx) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/outdev/bitmap.cxx:1222)
libdrawinglayerlo.so!drawinglayer::processor2d::VclProcessor2D::RenderBitmapPrimitive2D(drawinglayer::processor2d::VclProcessor2D * const this, const drawinglayer::primitive2d::BitmapPrimitive2D & rBitmapCandidate) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/drawinglayer/source/processor2d/vclprocessor2d.cxx:368)

Saw that OutputDevice::DrawTransformedBitmapEx compared to other public methods of OutputDevice has no test of form

    if ( !mpGraphics && !AcquireGraphics() )
        return;

thus - adding it

Comment 17 Armin Le Grand (allotropia) 2020-02-21 18:17:48 UTC

Fixes the crash. One goal was to use OutputDevice::DrawTransformedBitmapEx more often, so that this test was missing may have been never detected yet.

Comment 18 Commit Notification 2020-02-21 19:18:08 UTC

Armin Le Grand (Collabora) committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/828504974d70111e4a35b31d579cf42fe660a660

tdf#130768 speedup huge pixel graphics Cairo

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 19 Armin Le Grand (allotropia) 2020-02-22 15:37:45 UTC

Detected an error in the Office's preview pane after start, the pics are painted with an offset. This is due to using DrawTransformBitmapExDirect more often which uses

GetViewTransformation()

to get to pixel coordinates. This would - usually - be okay, but there are the mnOutOffX/mnOutOffY members of OutputDevice which seem to be used as an internal Windows offset mapping since 'fake' windows are used.
For that purpose there is ImplGetDeviceTransformation() which takes that into account. It is a kind'a GetViewTransformation() which corrects for these values - if set. These *seem* to be set/used not very often, but have to be taken into account - argh!

Comment 20 Commit Notification 2020-02-22 16:44:29 UTC

Armin Le Grand (Collabora) committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/f1d6788fe1767f97e3ca2c67c7415f8c18c3d618

tdf#130768 need to use mnOutOffX/mnOutOffY

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 21 Michael Meeks 2020-02-25 12:09:01 UTC

As per mail; I only skimmed the code - but it is far from clear to me how we avoid doing the re-interpolation and affine scaling of the image repeatedly inside cairo / libpixman.

My goal here was to have a cache that avoided the repeated, expensive scaling of images on slow (particularly mobile) CPUs. We used to do that threaded 4-8x wide on many Android phones with muscle, is it possible that we just switched to using pixman - assuming it is fast, though it is single a threaded CPU renderer ?

I would expect to have a scaled / interpolated version of the surface cached that is at device resolution, orientation etc. and to blit that with no transform on subsequent renders.

Comment 22 Armin Le Grand (allotropia) 2020-02-27 10:08:33 UTC

@Michael: Sorry if this does not meet your expectations. I communicated the concept early, and also the time frames involved. Steps (2), (3) and (5) are done. The reduction in data is step (4) which is in progress.

For a complete solution for HW-weak systems step (4) is needed. It can be an exact BM-Scale as you talk about, but a log(2) and Mip-Map like approach has some advantages:

- Only one pyramid of on-demand buffered reduced, easy-to-create smaller instances, preferred direct in system-dependent data structure (cairo_surface_t e.g. for Cairo)

- Easy-to-create due to no need for filtering when creating reduced instances with log(2) stepping, also easy parallelizable

- A single instance of that stack will serve all multiple views/usages (a brute-force pixel-size LRU-cache would need one exact scaled instance per paint per view - n views, n copies. Reusage would be a coincidence. other method never will need more than log(size) copies)

- More invariant for slight scaling. This includes the unavoidable one-pixel error in translating from logic to pixel coordinates. Handling this in a LRU is tricky, using it leads to ugly not-painted single lines right and bottom e.g. when bitmaps are aligned or have a frame - also refer to comment 3 for this

- Supports multiple views with different zoom settings: By using just one instance of that stack. If any view changes zoom, potentially (due to on-demand) no new buffer object needs to be created

- ScaleObject in multiple views: If size of bitmap changes, no new buffers need to be created for any view. in LRU method, *all* views would have to re-create a new, exactly-scaled version -> worst case - and have to be lucky in LRU that the no longer needed will vanish

All this for the cost of - worst case with log(2) scaled stack - double the target size minus one pixel bitmap to be directly scaled/rendered from the target system.

Comparing the advantages and disadvantages of both I clearly opt for the log(2) - Mip-Map like one. This also makes use of the already existing view-dependent buffered primitive decompositions - if working. Another plus: Due to clear view-dependencies the no longer needed buffers can be removed exactly when not needed anymore - in parallel to doing this on mem footprint and touch/usageCount.

Another point: When optimizing this for HW-weak systems we potentially blame and break-out systems which *have* HW-support, so we probably will need some runtime-detection if/when this shall be used, too.

Comment 23 Armin Le Grand (allotropia) 2020-02-27 15:30:45 UTC

Before I can continue with (4) I have to fix an error in tiled rendering. The helper class RenderContextGuard in sw unfortunately does things wrong - it's goal is to 'patch' a custom OutputDevice as render target to the Writer repaint, but the way it is none now deletes the current OC and thus kills all buffered primitive decompositions that were done/created in the last rendering. With other words - it kills buffering based on VOC and primitive usage. Interestingly, this does not happen for Draw/Impress/Calc.
This leads to quite expensive unnecessary repeats, e.g Bitmap::Adjust for in any way parametrisized Bitmaps (Alpha/red, ...), but also to re-creation of 3D scenes and re-creation of chart contents.
Fix that by using already existing stuff - use patchPaintWindow/unpatchPaintWindow instead. To make complete, adapt SdrPaintView::FindPaintWindow to find the patched SdrPaintWindow in all situations it was found before.

Comment 24 Commit Notification 2020-02-27 17:13:35 UTC

Armin Le Grand committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/424312aa99307da9f0ee60ea6e3213b2b3dc26b4

tdf#130768 Make tiled writer paint reuse decomposes

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 25 Commit Notification 2020-02-29 13:43:36 UTC

Armin Le Grand committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/7c0378c0bea935c0aac2f519c53c30b2e4d8bbf9

tdf#130768 add a pre-scale version for cairo

It will be available in 7.0.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 26 Armin Le Grand (allotropia) 2020-02-29 13:47:06 UTC

Done point (4) in the last commit, so complete for now. Things that might be added:
- checks for automatically ac/de/activating aspects of this. Using now a decent balance that should work well on all scalings
- point (4) may easily replaced by just exact scaling to needed target size - with the costs for mem-footprint and the risks of re-usage mayhem (see comment 3) - I would not recommend that...

Comment 27 Armin Le Grand (allotropia) 2020-03-10 09:10:12 UTC

Done for Cairo so far. Open and possible:
- Do for other backends
- Add a detector to measure if pre-scaled should be activated

Comment 28 Michael Meeks 2021-02-08 13:19:53 UTC

Associated performance regression in bug#138068

Comment 29 Aron Budea 2021-03-27 14:56:02 UTC

Timur, bug 137719 and bug 138068 are already associated with Regressions-cairo-speedup, no need to add them here.