| Summary: | MAILMERGE: Use IDocumentMarkAccess::UNO_BOOKMARK to mark end of one mail merge part | ||
|---|---|---|---|
| Product: | LibreOffice | Reporter: | Björn Michaelsen <bjoern.michaelsen> |
| Component: | Writer | Assignee: | Not Assigned <libreoffice-bugs> |
| Status: | RESOLVED FIXED | ||
| Severity: | enhancement | CC: | cno, glogow, l.lunak, thumperward |
| Priority: | low | ||
| Version: | Inherited From OOo | ||
| Hardware: | Other | ||
| OS: | All | ||
| See Also: | https://issues.apache.org/ooo/show_bug.cgi?id=72820 | ||
| Whiteboard: | lhm-limux target:4.4.0 | ||
| Crash report or crash signature: | Regression By: | ||
| Bug Depends on: | |||
| Bug Blocks: | 56355, 79067 | ||
| Attachments: |
hot spot function tree of a complex, 3000 documents mail merge via UNO.
3000 documents MM callgrind, part 1 3000 documents MM callgrind, part 2 |
||
|
Description
Björn Michaelsen
2014-07-02 17:19:01 UTC
Created attachment 102189 [details]
hot spot function tree of a complex, 3000 documents mail merge via UNO.
This is from my initial callgrind of a mail merge of 3000 documents with our real, primary document template, where the examples documents from #79067 are based on.
The callgrind has 4.5 MB and I don't have a place to store it. Probably someone can offer storage ;-) So the callgrind would "just" be a reference, I can't supply the documents, but it shows the hot spots. The callgrind ran almost a whole weekend.
FYI: the document has 19 "special frames", that's why mail merge will stop after 3450 documents (16bit index used for the vector). Me / a collegue will work on a patch for that, but it's currently not our primary concern, as MM produces broken results for various examples documents. Fixes for most are in private/jmux/mailmerge-fixes.
There are various patches related to MM perforamnce in private/jmux/sorted-pagedesc+spzfrmfmts
= The hot spots =
1. AppendAllObjs
This algorithm was probably written for a linked list, where removing the first element is cheap. For vectors it's ridiculously expensive, especially for vectors with 65000 objects...
Working fix in private/jmux/sorted-pagedesc+spzfrmfmts
2. lcl_GetUniqueFlyName
These code has two problems:
2.1 Creates a lot of temporary OUStrings (billions!) when doing atoi.
- Actually most time is spend allocation and freeing the substrings!
- Fix 1: directly try to convert the tail.
2.2 Has to check all entries in SpzFrmFmts. Most are invalid.
- Fix 2: Sorting this array will help it a lot.
- Fix 3: Storing the first hole when deleteing would even speed it up.
Broken fix 1 + 2 are in private/jmux/sorted-pagedesc+spzfrmfmts
With thsese patches, both hot spots are gone.
= TODO =
Undo for (vector) lists is implemented by storing the current position. For sorted lists undo has to store the index of the new object. Not really hard to fix, but I need more knowledge of the undo code to finally fix it.
Even for my 3000 pages doc, CalcLayout is far from any hot spot (MergeNew = 55%, CalcLayout 2.5%). I didn't yet look at the CalcLayout, but I hope it's possible to change the code to add an SwNode index, to tell it to just "render" the appended SwNodes (nodes after the index). But yeah - actually the number of pages could just be taken from the current working copy, but I guess this would confuse the copy of page anchored frames. I had to add an additional CalcLayout to the new SwDoc::Append code as a workaround for my "delete first node" problem, as I was visually loosing a page anchored node in the displayed document, while the saved document was correct. This is just a workaround in my branch, until I find the real problem in the drawing layer. Created attachment 102475 [details]
3000 documents MM callgrind, part 1
Created attachment 102476 [details]
3000 documents MM callgrind, part 2
(In reply to comment #2) > Even for my 3000 pages doc, CalcLayout is far from any hot spot (MergeNew = > 55%, CalcLayout 2.5%). Well, I callgrinded too, but with an artificial doc (just some plain one with a few enumerations and numberings[1]), and saw CalcLayout on top there. Ultimately though, doing _any_ layout work with each insertion is broken by design and has to be very slow for large merges. So killing dependencies on that is needed anyway, for to have a mail merge at some point that can do the layout _once_ at the end of the merge. As such, not using hardcoded page numbers is a good starting point IMHO. I think your callgrind is very likely for simple documents without headers, footers and tables. I really would like to prevent layouting the whole document, when we just append some pages. We need the page numbers for page-anchored flys. The internal structure uses the absolute page number as the reference. So if you have a MM document with 3 pages and a page anhored Fly on the second page, the anchor is bound to page 6 (3 pages + double side hidden empty page + 2). And to know, if the layout has still 3 pages after replacing the page numbers, you have to run the layouting. Luboš Luňák committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=a305a2c91420652db450b7f8edd140e1d69f42cf use bookmarks to mark mailmerge parts in a mailmerge document (fdo#80823) It will be available in 4.4.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback. I've implemented using UNO bookmarks instead of page numbers in master, so let's close this. |