When using huge pictures, e.g. from modern digital sources (cameras, mobile phones, ...), LO can get quite slow. How slow varies, of course, with the system it runs on. The influencing aspects are memory, processor (cores/architecture), 32/64-bit, the target graphics system interface, the OS, and possibly support from graphics hardware.
In this task I'll try to address some of this for weak HW, where we know no special graphics HW is available and the CPUs are not breathtakingly powerful (a good test is to build LO). Taking care of this is one thing, but:
- Need to take care of all possible scenarios
- No use optimizing one case and killing a dozen others
- Only valuable when it does not influence other system-dependent parts
- System-independent parts must not be prevented from using better paths in SW when available
There are various places to apply the lever. Clearly, quite some safety considerations (in the sense of not killing other scenarios), and thus experimenting/testing, will be needed. It is quite possible that more than one change, or a combination of changes, will lead to success. Some already exist and are more or less successful in diverse scenarios. I will number the things that come to my mind, investigate them, and try to document possible enhancements/progress here. I will also show some that seem plausible but do not work well.
Short description of scenarios:
(a) Add a huge bitmap (JPEG) of your choice to Draw/Impress/Writer/Calc and scroll/zoom around.
(b) Same with Writer, but type text. This is slow because of the needed repaints.
Interestingly, there are already some scenarios where this works surprisingly well:
- Dragging something (e.g. a huge transparent ellipse) in such a scenario works mostly well, depending on the app, because this happens on the overlay and the main view is not redrawn at all.
- Typing text into a shape is nowadays also done completely on the overlay, no problem. There was a task for this with Impress and a huge background bitmap that made me move shape text editing completely to the overlay.
These may give hints about some of our problematic areas.
Looking closer at the repaint shows that it comes down to OutputDevice::DrawBitmap, which leads to Bitmap::Scale. This *will* be slow on huge bitmaps.
Idea (1): Just buffer/cache the scaled bitmaps. This is the brute-force approach. Questionable are the number of bitmaps to buffer (at some point this will outweigh the wins), the initial scale (1st repaint), the memory footprint (you have to stop somewhere, so there is always a constructible scenario where this fails, and you need 'intelligent' administration) and whether reuse happens at all. Experience shows that scaling (zoom in/out) immediately makes a buffered, exactly scaled bitmap useless. Scrolling, too, involves calculating back from object coordinates over world coordinates to pixel coordinates and sizes. Some of these are double now, but not all. Not only numerical problems but also the exact position of the bitmap's logic range, and how it covers target pixels, lead to pixel sizes that differ by +/- single pixels. Quite some places in LO already try to deal with this and to detect whether the buffered bitmap can be reused; that code is expensive and unreliable. There is also no association of cached data with the view-dependent representation, which would be needed to intelligently get rid of that data when the view-dependent part goes away. In DrawingLayer this is the ObjectContact/ViewObjectContact/ViewContact relationship, which for exactly that reason would be a better place for this. Unfortunately, not all apps use DrawingLayer and primitives completely for repaint yet. A third problem is that this approach works with exact scales, which consequently have to be done in high quality, making the scale even more expensive. Compare that with getting 'close' to, but never smaller in pixels than, the target size by just using 2^n scaling (mip-map-like) and leaving the quality part to the graphics sub-system.
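The +/- single-pixel effect can be reproduced in a few lines. This sketch assumes both edges of the logic range are rounded to device pixels independently (a common approach, not necessarily LO's exact code); the function name pixelWidth is hypothetical. With 0.5 pixels per logic unit, the same 201-unit-wide bitmap covers 101 pixels at x = 0 but only 100 pixels at x = 1, so a cache keyed on the exact pixel size would miss after a one-unit scroll.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: map the logic range [logicX, logicX + logicWidth)
// to device pixels by rounding the left and right edges independently,
// then return the resulting width in pixels.
long pixelWidth(double logicX, double logicWidth, double pixelsPerLogicUnit)
{
    assert(pixelsPerLogicUnit > 0.0);
    const long nLeft  = std::lround(logicX * pixelsPerLogicUnit);
    const long nRight = std::lround((logicX + logicWidth) * pixelsPerLogicUnit);
    return nRight - nLeft;
}
```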
Luckily, we have a number of better possibilities in the existing graphics stack, which may need to be combined but should lead to more scenario-independent improvements.
Some raw planned points I will experiment with (more info when getting into it):
(2) DrawTransformedBitmapEx has evolved and should be used more often.
(3) System-dependent buffering of data in the format of the graphics sub-system.
(4) Possibly binary scaling of this (2^n, mip-map-like).
(5) Check whether some primitive decompositions are currently not reused (due to EditViews not completely using these yet, sigh).
Worked on (2) and achieved some progress. The advantage is that no scaling is done; instead the system-dependent form is created immediately (two copy/touch/memory passes reduced to one). This seems safe; I did some tests/checks on various systems/target graphics systems, need more, ...
Made an intermediate step to create some timing information, to be able to make more precise statements about the effects of the possible actions...
Checked the changes for (2); they already improve behaviour slightly to moderately. It seems safe; I checked some scenarios (all apps, systems, ...). A Writer example already got considerably faster, most cases were slightly faster, one case was slower. Tested with large and huge bitmaps, transparent and non-transparent. Needs more checks, and I guess the big step will come in combination with (3). Added an env var to switch the timing output helper on/off, and a static bool to allow switching the (2) behaviour on the fly in the debugger for test purposes.
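An env-var switch for timing output plus a debugger toggle could look roughly like the sketch below. Assumptions: ScopedDrawTimer and useTransformedBitmapPath() are hypothetical names; only the variable name SAL_ENABLE_TIMER_BITMAPDRAW appears in this report, and the real helper may be structured differently.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <utility>

// Read the switch once; std::getenv returns nullptr when the variable is unset.
inline bool isBitmapDrawTimingEnabled()
{
    static const bool bEnabled = std::getenv("SAL_ENABLE_TIMER_BITMAPDRAW") != nullptr;
    return bEnabled;
}

// Hypothetical debugger toggle: flip the local static in a debugger session
// to compare the old and new drawing paths without recompiling.
inline bool useTransformedBitmapPath()
{
    static bool bUseTransformedPath = true;
    return bUseTransformedPath;
}

// RAII timer: measures from construction to destruction and prints the
// elapsed time to stderr only when enabled.
class ScopedDrawTimer
{
public:
    explicit ScopedDrawTimer(std::string aName, bool bEnabled = isBitmapDrawTimingEnabled())
        : maName(std::move(aName))
        , mbEnabled(bEnabled)
        , maStart(std::chrono::steady_clock::now())
    {
        assert(!maName.empty()); // the name identifies the measured call in the output
    }

    ~ScopedDrawTimer()
    {
        if (mbEnabled)
            std::fprintf(stderr, "%s: %.3f ms\n", maName.c_str(), elapsedMs());
    }

    ScopedDrawTimer(const ScopedDrawTimer&) = delete;
    ScopedDrawTimer& operator=(const ScopedDrawTimer&) = delete;

    double elapsedMs() const
    {
        return std::chrono::duration<double, std::milli>(
                   std::chrono::steady_clock::now() - maStart).count();
    }

    bool isEnabled() const { return mbEnabled; }

private:
    std::string maName;
    bool mbEnabled;
    std::chrono::steady_clock::time_point maStart;
};
```

A drawing call would then be wrapped as `{ ScopedDrawTimer aTimer("drawBitmap"); /* draw */ }`, so measurement costs nothing but a clock read when the variable is unset.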
Working on (3) and making good progress. For now, applying it to ::drawTransformedBitmap and testing with the previously added timer output. Looks pretty good combined with (2), and I can now interactively check for problem zones. Doing exactly that and checking various scenarios... BTW: There are more places to buffer; e.g. I added a big 8-bit BMP and ::drawBitmap was used. The following methods should be supported, which will increase the work to do:
- ::drawAlphaBitmap
- ::drawTransformedBitmap
- ::drawBitmap
- ::drawMask
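The core of (3) is to convert a bitmap once into the backend's own format and keep that result for later paints. A minimal sketch, assuming each source bitmap has a stable id; SystemSurface, SourceBitmap and SystemDependentCache are hypothetical stand-ins (for Cairo the buffered object would be something like a cairo_surface_t), and convertToSystemFormat() is a placeholder.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical backend-specific representation; creating it is the
// expensive step that should happen only once per source bitmap.
struct SystemSurface
{
    std::vector<uint32_t> pixels; // already in the backend's pixel format
};

struct SourceBitmap
{
    uint64_t id;                  // stable identity of the bitmap data
    std::vector<uint32_t> pixels;
};

class SystemDependentCache
{
public:
    // Returns the buffered surface, converting only on first use.
    std::shared_ptr<const SystemSurface> get(const SourceBitmap& rBitmap)
    {
        assert(!rBitmap.pixels.empty());
        auto it = maCache.find(rBitmap.id);
        if (it != maCache.end())
            return it->second;

        ++mnConversions;
        auto pSurface = std::make_shared<SystemSurface>();
        pSurface->pixels = convertToSystemFormat(rBitmap.pixels);
        maCache.emplace(rBitmap.id, pSurface);
        return pSurface;
    }

    void release(uint64_t id) { maCache.erase(id); }
    int conversions() const { return mnConversions; }

private:
    static std::vector<uint32_t> convertToSystemFormat(const std::vector<uint32_t>& rIn)
    {
        // Placeholder: a real backend might premultiply alpha or reorder channels.
        return rIn;
    }

    std::unordered_map<uint64_t, std::shared_ptr<const SystemSurface>> maCache;
    int mnConversions = 0;
};
```

A real cache also needs an eviction policy; later comments in this report mention removal tied to view lifetime, memory footprint and usage count, which this sketch leaves out.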
Writer: To profit from (2) and (3), paintGraphicUsingPrimitivesHelper will need to be adapted to reuse primitives as far as possible, maybe using the VOC mechanism from Writer's DrawingLayer (we *have* an ObjectContact there). Need to check that.
Nice update; thanks Armin =)
Adapting Writer's SwNoTextFrame to the VOC mechanism was ambitious, but it is the precondition for getting the involved primitives, and their decompositions, buffered. This is done automatically in DrawingLayer, so it already works for Draw/Impress/Calc (Calc because all graphics are embedded in DrawingLayer). This also shows how important it is to one day change Writer/Calc to full primitive drawing.
To allow system-dependent buffering of the involved bitmaps, it is necessary to reuse the involved primitives and their already executed decompositions (also for performance reasons). In DrawingLayer this is usually done using the VOC mechanism (see descriptions elsewhere). To get that here, the involved SwNoTextFrame (this) is made a ::contact::ViewContact supplier by supporting a GetViewContact() call. As ObjectContact we can use the already existing ObjectContact from the involved DrawingLayer. For this, the helper classes ViewObjectContactOfSwNoTextFrame and ViewContactOfSwNoTextFrame are created, which support the VOC mechanism in its minimal form. This allows automatic and view-dependent (multiple edit windows, print, etc.) reuse of the created primitives. It will also be very useful when completely changing the Writer repaint to VOC and primitives.
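The VOC relationship described above can be modelled in a few classes. This is a heavily simplified, hypothetical sketch, not DrawingLayer's real API: the class names only echo ViewContact, ViewObjectContact and ObjectContact, primitives are plain strings, and NoTextFrameSketch merely stands in for SwNoTextFrame. It shows two properties that matter here: a decomposition is computed once per view and then reused, and dropping a view drops exactly its buffered data.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Primitive = std::string;               // stand-in for a primitive
using PrimitiveSequence = std::vector<Primitive>;

class ObjectContact {};                      // one per view: edit window, print, ...

class ViewContact;

// Per view and object: buffers the decomposition for exactly one view.
class ViewObjectContact
{
public:
    explicit ViewObjectContact(ViewContact& rVC) : mrViewContact(rVC) {}
    const PrimitiveSequence& getPrimitives();

private:
    ViewContact& mrViewContact;
    PrimitiveSequence maBuffered;
    bool mbValid = false;
};

// Per object: knows how to create primitives and owns one VOC per view.
class ViewContact
{
public:
    ViewObjectContact& getViewObjectContact(ObjectContact& rOC)
    {
        std::unique_ptr<ViewObjectContact>& rpVOC = maVOCs[&rOC];
        if (!rpVOC)
            rpVOC = std::make_unique<ViewObjectContact>(*this);
        return *rpVOC;
    }

    // When a view goes away, exactly its buffered data goes with it.
    void viewRemoved(ObjectContact& rOC) { maVOCs.erase(&rOC); }

    PrimitiveSequence createPrimitives()
    {
        ++mnDecompositions;                  // the expensive step in real code
        return { "BitmapPrimitive2D" };
    }

    int decompositions() const { return mnDecompositions; }

private:
    std::map<ObjectContact*, std::unique_ptr<ViewObjectContact>> maVOCs;
    int mnDecompositions = 0;
};

const PrimitiveSequence& ViewObjectContact::getPrimitives()
{
    if (!mbValid)                            // decompose once, then reuse
    {
        maBuffered = mrViewContact.createPrimitives();
        mbValid = true;
    }
    assert(!maBuffered.empty());
    return maBuffered;
}

// Stand-in for SwNoTextFrame acting as ViewContact supplier (created lazily).
class NoTextFrameSketch
{
public:
    ViewContact& GetViewContact()
    {
        if (!mpViewContact)
            mpViewContact = std::make_unique<ViewContact>();
        return *mpViewContact;
    }

private:
    std::unique_ptr<ViewContact> mpViewContact;
};
```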
Unified the usage and adapted ::drawAlphaBitmap, ::drawTransformedBitmap and ::drawBitmap, but not ::drawMask, because of special processing there (unpremultiplying the data). I am not sure if/how this needs to be done. Probably only once, so it would need to be part of the first init and buffer creation, but what if something different happens every time...? Too unsafe; no buffering for now.
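For reference, unpremultiplying means dividing each colour channel by alpha again. The generic sketch below (not LibreOffice's ::drawMask code; the function name is hypothetical) shows that the result depends on the alpha data present at the time of the call, which is why buffering it is only safe if that data is known not to change between paints.

```cpp
#include <cassert>
#include <cstdint>

// Convert one 8-bit colour channel from premultiplied alpha back to
// straight alpha, rounding to nearest. Hypothetical helper for illustration.
inline uint8_t unpremultiply(uint8_t c, uint8_t a)
{
    if (a == 0)
        return 0;   // fully transparent: colour is undefined, use 0
    assert(c <= a); // a valid premultiplied channel never exceeds alpha
    return static_cast<uint8_t>((c * 255 + a / 2) / a);
}
```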
Works well with (2), (3) for Cairo and (5) for Writer repaints. Doing some more tests and refinements...
Good that I tested again: there was an error/mismatch in ::drawAlphaBitmap that broke painting of transparent bitmaps. Continuing checks...
Note: The speedup is pretty good. It can be checked interactively using SAL_ENABLE_TIMER_BITMAPDRAW and DO_TIME_TEST. Doing so shows that it is *very* dependent on zoom, and thus on the ratio of bitmap pixels to visualization pixels. It is very good when zooming far in and far out. In between it is also better, but that is where the worst case lies. This could be improved further using (4), a possible binary scale (2^n, mip-map-like), applied directly when creating the system-dependent graphics representation and checked against in the reuse accesses.
Gerrit linux/gcc has a crash in CppunitTest_desktop_lib. It is in SalGraphics::DrawTransformedBitmap with a this* of 0x00000000, argh. The OutputDevice it is called from has mpGraphics == 0x00000000. It is an SD redraw; the stack is:
libvcllo.so!SalGraphics::DrawTransformedBitmap(SalGraphics * const this, const basegfx::B2DPoint & rNull, const basegfx::B2DPoint & rX, const basegfx::B2DPoint & rY, const SalBitmap & rSourceBitmap, const SalBitmap * pAlphaBitmap, const OutputDevice * pOutDev) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/gdi/salgdilayout.cxx:926)
libvcllo.so!OutputDevice::DrawTransformBitmapExDirect(OutputDevice * const this, const basegfx::B2DHomMatrix & aFullTransform, const BitmapEx & rBitmapEx) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/outdev/bitmap.cxx:1068)
libvcllo.so!OutputDevice::DrawTransformedBitmapEx(OutputDevice * const this, const basegfx::B2DHomMatrix & rTransformation, const BitmapEx & rBitmapEx) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/vcl/source/outdev/bitmap.cxx:1222)
libdrawinglayerlo.so!drawinglayer::processor2d::VclProcessor2D::RenderBitmapPrimitive2D(drawinglayer::processor2d::VclProcessor2D * const this, const drawinglayer::primitive2d::BitmapPrimitive2D & rBitmapCandidate) (/mnt/aa6fce82-4224-4a6a-9754-cf36b5fee424/lo/work01/drawinglayer/source/processor2d/vclprocessor2d.cxx:368)
I saw that, unlike the other public methods of OutputDevice, OutputDevice::DrawTransformedBitmapEx has no check of the form
if ( !mpGraphics && !AcquireGraphics() ) return;
so I am adding it.
This fixes the crash. One goal was to use OutputDevice::DrawTransformedBitmapEx more often, which is probably why the missing check was never detected before.
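The guard added above can be illustrated with a minimal mock. SalGraphicsMock and OutputDeviceMock are hypothetical stand-ins, not the vcl classes; only the member names mpGraphics and AcquireGraphics() and the shape of the check follow the report.

```cpp
#include <cassert>

// Mock of the backend graphics object; counts draw calls.
struct SalGraphicsMock
{
    int drawCalls = 0;
    void DrawTransformedBitmap() { ++drawCalls; }
};

class OutputDeviceMock
{
public:
    explicit OutputDeviceMock(bool bCanAcquire) : mbCanAcquire(bCanAcquire) {}

    void DrawTransformedBitmapEx()
    {
        // Without this guard mpGraphics may still be null here, and the call
        // below would run with a null SalGraphics* (this == 0x0).
        if (!mpGraphics && !AcquireGraphics())
            return;
        assert(mpGraphics);
        mpGraphics->DrawTransformedBitmap();
    }

    SalGraphicsMock* graphics() const { return mpGraphics; }

private:
    // Lazily attaches the graphics object; fails if none is available.
    bool AcquireGraphics()
    {
        if (mbCanAcquire)
            mpGraphics = &maGraphics;
        return mpGraphics != nullptr;
    }

    bool mbCanAcquire;
    SalGraphicsMock maGraphics;
    SalGraphicsMock* mpGraphics = nullptr;
};
```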
Armin Le Grand (Collabora) committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/828504974d70111e4a35b31d579cf42fe660a660 tdf#130768 speedup huge pixel graphics Cairo It will be available in 7.0.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Detected an error in the Office's preview pane after startup: the pictures are painted with an offset. This is due to using DrawTransformBitmapExDirect more often, which uses GetViewTransformation() to get to pixel coordinates. This would usually be okay, but there are the mnOutOffX/mnOutOffY members of OutputDevice, which seem to be used as an internal window offset mapping since 'fake' windows are used. For that purpose there is ImplGetDeviceTransformation(), which takes them into account. It is a kind of GetViewTransformation() that corrects for these values if set. These *seem* to be set/used rarely, but have to be taken into account, argh!
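The correction amounts to composing the view transformation with the output offset. A sketch under assumptions: Affine is a hypothetical minimal matrix type (basegfx::B2DHomMatrix fills this role in LibreOffice), and deviceTransformation() illustrates the idea rather than reproducing ImplGetDeviceTransformation().

```cpp
#include <cassert>

// Hypothetical 2D affine matrix [a c e; b d f; 0 0 1].
struct Affine
{
    double a = 1, b = 0, c = 0, d = 1, e = 0, f = 0;

    static Affine translate(double tx, double ty) { Affine m; m.e = tx; m.f = ty; return m; }
    static Affine scale(double sx, double sy) { Affine m; m.a = sx; m.d = sy; return m; }

    // Returns (*this) * o, i.e. o is applied first, then *this.
    Affine operator*(const Affine& o) const
    {
        Affine r;
        r.a = a * o.a + c * o.b;
        r.b = b * o.a + d * o.b;
        r.c = a * o.c + c * o.d;
        r.d = b * o.c + d * o.d;
        r.e = a * o.e + c * o.f + e;
        r.f = b * o.e + d * o.f + f;
        return r;
    }

    void apply(double x, double y, double& rX, double& rY) const
    {
        rX = a * x + c * y + e;
        rY = b * x + d * y + f;
    }
};

// View (logic -> pixel) transformation followed by the output offset of a
// 'fake' child window; parameter names mirror mnOutOffX/mnOutOffY.
Affine deviceTransformation(const Affine& rView, long nOutOffX, long nOutOffY)
{
    if (nOutOffX == 0 && nOutOffY == 0)
        return rView; // common case: nothing to correct
    return Affine::translate(nOutOffX, nOutOffY) * rView;
}
```

Forgetting the offset term is exactly what paints the picture shifted by (mnOutOffX, mnOutOffY) in such windows.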
Armin Le Grand (Collabora) committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/f1d6788fe1767f97e3ca2c67c7415f8c18c3d618 tdf#130768 need to use mnOutOffX/mnOutOffY It will be available in 7.0.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
As per mail: I only skimmed the code, but it is far from clear to me how we avoid doing the re-interpolation and affine scaling of the image repeatedly inside cairo / libpixman. My goal here was to have a cache that avoids the repeated, expensive scaling of images on slow (particularly mobile) CPUs. We used to do that threaded, 4-8x wide, on many Android phones with muscle. Is it possible that we just switched to using pixman, assuming it is fast, though it is a single-threaded CPU renderer? I would expect a scaled/interpolated version of the surface to be cached at device resolution, orientation etc., and to be blitted with no transform on subsequent renders.
@Michael: Sorry if this does not meet your expectations. I communicated the concept early, and also the time frames involved. Steps (2), (3) and (5) are done. The reduction in data is step (4), which is in progress. For a complete solution for HW-weak systems, step (4) is needed. It could be an exact bitmap scale as you describe, but a log(2), mip-map-like approach has some advantages:
- Only one pyramid of on-demand buffered, reduced, easy-to-create smaller instances, preferably directly in the system-dependent data structure (e.g. cairo_surface_t for Cairo)
- Easy to create, because no filtering is needed when creating reduced instances with log(2) stepping; also easily parallelizable
- A single instance of that stack serves all views/usages. A brute-force pixel-size LRU cache would need one exactly scaled instance per paint per view (n views, n copies), and reuse would be a coincidence. The other method will never need more than log(size) copies
- More invariant to slight scaling. This includes the unavoidable one-pixel error when translating from logic to pixel coordinates. Handling this in an LRU is tricky; using it leads to ugly unpainted single lines at the right and bottom, e.g. when bitmaps are aligned or have a frame (also refer to comment 3 for this)
- Supports multiple views with different zoom settings by using just one instance of that stack. If any view changes zoom, potentially (due to on-demand creation) no new buffer object needs to be created
- Scaling an object shown in multiple views: if the size of the bitmap changes, no new buffers need to be created for any view. With the LRU method, *all* views would have to re-create a new, exactly scaled version (the worst case), and one would have to be lucky that the LRU drops the no longer needed ones
All this at the cost of, in the worst case with the log(2)-scaled stack, a bitmap of double the target size minus one pixel that the target system has to scale/render directly. Comparing the advantages and disadvantages of both, I clearly opt for the log(2), mip-map-like one.
This also makes use of the already existing view-dependent buffered primitive decompositions, if working. Another plus: due to clear view dependencies, buffers that are no longer needed can be removed exactly when they become unneeded, in addition to removal based on memory footprint and touch/usage count. Another point: when optimizing this for HW-weak systems, we potentially penalize systems that *do* have HW support, so we will probably also need some runtime detection of if/when this shall be used.
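The log(2) pyramid argued for above can be sketched on a grey-scale image. Assumptions: Image, halve() and MipPyramid are hypothetical; each reduction averages 2x2 blocks (a plain box filter); and levelFor() returns the smallest level that is still at least the target size, so in the limiting dimension the backend scales from less than double the target size. Levels are created only when first requested, and one pyramid can serve all views.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct Image
{
    std::size_t w = 0, h = 0;
    std::vector<uint8_t> px; // row-major, w * h grey values
    uint8_t at(std::size_t x, std::size_t y) const { return px[y * w + x]; }
};

// One 2^n step: each output pixel is the rounded mean of a 2x2 block.
// An odd trailing row/column is ignored for brevity. Rows are independent,
// so this parallelises trivially.
Image halve(const Image& rSrc)
{
    assert(rSrc.w >= 2 && rSrc.h >= 2);
    Image aDst;
    aDst.w = rSrc.w / 2;
    aDst.h = rSrc.h / 2;
    aDst.px.resize(aDst.w * aDst.h);
    for (std::size_t y = 0; y < aDst.h; ++y)
        for (std::size_t x = 0; x < aDst.w; ++x)
        {
            const unsigned nSum = rSrc.at(2 * x, 2 * y) + rSrc.at(2 * x + 1, 2 * y)
                                + rSrc.at(2 * x, 2 * y + 1) + rSrc.at(2 * x + 1, 2 * y + 1);
            aDst.px[y * aDst.w + x] = static_cast<uint8_t>((nSum + 2) / 4);
        }
    return aDst;
}

class MipPyramid
{
public:
    explicit MipPyramid(Image aOriginal) { maLevels.push_back(std::move(aOriginal)); }

    // Smallest level still >= target in both dimensions; missing levels are
    // created on demand. The reference stays valid until a later call adds a level.
    const Image& levelFor(std::size_t nTargetW, std::size_t nTargetH)
    {
        std::size_t n = 0;
        for (;;)
        {
            const Image& rCur = maLevels[n];
            if (rCur.w < 2 || rCur.h < 2 || rCur.w / 2 < nTargetW || rCur.h / 2 < nTargetH)
                return rCur;
            if (n + 1 == maLevels.size())
                maLevels.push_back(halve(rCur));
            ++n;
        }
    }

    std::size_t createdLevels() const { return maLevels.size(); }

private:
    std::vector<Image> maLevels; // level 0 is the original
};
```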
Before I can continue with (4), I have to fix an error in tiled rendering. The helper class RenderContextGuard in sw unfortunately does things wrong: its goal is to 'patch' a custom OutputDevice as render target into the Writer repaint, but the way it is done now deletes the current OC and thus kills all buffered primitive decompositions that were created in the last rendering. In other words, it kills buffering based on VOC and primitive usage. Interestingly, this does not happen for Draw/Impress/Calc. This leads to quite expensive, unnecessary repeats, e.g. Bitmap::Adjust for any parameterized bitmaps (alpha/red, ...), but also to re-creation of 3D scenes and chart contents. Fix that by using already existing functionality: use patchPaintWindow/unpatchPaintWindow instead. To make it complete, adapt SdrPaintView::FindPaintWindow to find the patched SdrPaintWindow in all situations where it was found before.
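The difference between the old and the fixed approach is what gets replaced: only the render target, or the whole ObjectContact that owns the buffered decompositions. A hypothetical RAII sketch of the patch/unpatch idea (PaintWindowSketch, PatchedTargetGuard and the other types are illustrative, not sw/svx classes): swap just the target and restore it on scope exit, leaving the ObjectContact untouched.

```cpp
#include <cassert>
#include <memory>

struct ObjectContactSketch { int nBufferedDecompositions = 0; };
struct RenderTarget { const char* pName; };

struct PaintWindowSketch
{
    RenderTarget* pTarget = nullptr;
    // Owns the buffered decompositions; must survive a temporary re-targeting.
    std::shared_ptr<ObjectContactSketch> pObjectContact = std::make_shared<ObjectContactSketch>();
};

class PatchedTargetGuard
{
public:
    PatchedTargetGuard(PaintWindowSketch& rWindow, RenderTarget& rTemporary)
        : mrWindow(rWindow), mpOriginal(rWindow.pTarget)
    {
        assert(mrWindow.pObjectContact); // reused as-is, never recreated here
        mrWindow.pTarget = &rTemporary;  // "patch": redirect painting only
    }
    ~PatchedTargetGuard() { mrWindow.pTarget = mpOriginal; } // "unpatch"

    PatchedTargetGuard(const PatchedTargetGuard&) = delete;
    PatchedTargetGuard& operator=(const PatchedTargetGuard&) = delete;

private:
    PaintWindowSketch& mrWindow;
    RenderTarget* mpOriginal;
};
```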
Armin Le Grand committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/424312aa99307da9f0ee60ea6e3213b2b3dc26b4 tdf#130768 Make tiled writer paint reuse decomposes It will be available in 7.0.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Armin Le Grand committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/core/commit/7c0378c0bea935c0aac2f519c53c30b2e4d8bbf9 tdf#130768 add a pre-scale version for cairo It will be available in 7.0.0. The patch should be included in the daily builds available at https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: https://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
Point (4) is done in the last commit, so this is complete for now. Things that might be added:
- Checks for automatically activating/deactivating aspects of this. A decent balance that should work well at all scalings is used now.
- Point (4) could easily be replaced by exact scaling to the needed target size, at the cost of memory footprint and the risk of reuse mayhem (see comment 3). I would not recommend that...
Done for Cairo so far. Open and possible:
- Do this for other backends
- Add a detector to measure whether pre-scaling should be activated
Associated performance regression in bug#138068
Timur, bug 137719 and bug 138068 are already associated with Regressions-cairo-speedup, no need to add them here.