Description:
Issues often arise with specific documents, especially performance issues. It would help LibreOffice QA and developers triaging such issues to have an overview of what might be special about a particular document.

This EasyHack is to create a script or LibreOffice extension that provides statistics about a document. For a text document, these could include, for example:
- number of paragraphs
- number of pages
- number of images/embedded media
- number of tracked changes (redlines)
- number of styles, bookmarks, tables, indexes, text frames, OLE objects, sections, hyperlinks, references, comments ...

The extension should produce its output as simple text, so that it can easily be copied and pasted into a bug report.

For other document types, other information might be relevant. To keep the scope simple, it is fine to start with basic numbers for text documents.
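As a rough illustration of the kind of output described above, here is a minimal sketch that counts a few element types in the content.xml of an .odt package and prints them as plain "Label: n" lines. It uses only the Python standard library. The function names (`count_elements`, `format_report`, `analyse`) and the selection of elements are assumptions made for this example, not part of any existing tool. Note that it counts every matching element, so paragraphs inside tables, frames, and comments are included.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

# ODF namespace URIs as defined by the OpenDocument specification.
NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "table": "urn:oasis:names:tc:opendocument:xmlns:table:1.0",
    "draw": "urn:oasis:names:tc:opendocument:xmlns:drawing:1.0",
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
}

# Report label -> (namespace prefix, local element name).
# This selection is illustrative, not exhaustive.
ELEMENTS = {
    "Paragraphs": ("text", "p"),
    "Headings": ("text", "h"),
    "Tables": ("table", "table"),
    "Images": ("draw", "image"),
    "Text boxes": ("draw", "text-box"),
    "Sections": ("text", "section"),
    "Hyperlinks": ("text", "a"),
    "Comments": ("office", "annotation"),
    "Tracked changes": ("text", "changed-region"),
}


def count_elements(content_xml):
    """Count the selected elements in the bytes of an ODF content.xml."""
    tags = Counter(el.tag for el in ET.fromstring(content_xml).iter())
    return {
        label: tags["{%s}%s" % (NS[prefix], name)]
        for label, (prefix, name) in ELEMENTS.items()
    }


def format_report(counts):
    """Render counts as plain 'Label: n' lines for pasting into a bug report."""
    return "\n".join("%s: %d" % (label, n) for label, n in counts.items())


def analyse(path):
    """Open an .odt package (a zip file) and report on its content.xml."""
    with zipfile.ZipFile(path) as package:
        return format_report(count_elements(package.read("content.xml")))
```

Calling `print(analyse("some-document.odt"))` (hypothetical file name) would print one line per category. Page counts cannot be derived this way, because pagination only exists after layout.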
Hi, I'm Anuj Agrawal. I'd like to work on this issue. Can you please elaborate on the format of the text output you wish the Script to generate?
Created attachment 150623 [details] The attachment contains my easy hack for the document analyser for a LibreOffice document.
Hi Anuj Agrawal, is there any update on this bug?
I took a look at ipshii1609@gmail.com's solution and I think that it's both incomplete and written in Python. I would like to work on it. Any objections?
Can somebody please provide an update on this bug? One user did provide a script written in Python. Is this bug still open? If so, please comment in the context of that script.
Hi, I would like to start working on this bug. Wish me luck! Piya
It seems Piya abandoned this, so unassigning.
Created attachment 159166 [details] Script for counting elements in *.odt documents

Hello everyone! I fixed some errors in the script from ipshii1609@gmail.com:
- No counting of tables
- No counting of images

I also rewrote it to make future additions possible.

To be done:
- Adding more categories
- Fixing page counting when there are manual page breaks, or finding a proper way to count pages.

Will try to add more stuff in the near future. Greetings
Created attachment 168089 [details] Document analyser

Here is my attempt: I modified Sebastian's function and extended the script to include the remaining document statistics. In total, the script outputs: bookmark count, cell count, changetracking count, character count, comment count, draw count, frame count, hyperlink count, image count, non-whitespace character count, object count, OLE object count, page count, paragraph count, row count, sentence count, syllable count, table count, textbox count, word count, and paragraph styles. Additionally, the script can be run on files other than *.odt. Please let me know what I can do to extend this.
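For readers who want to see where counts such as page count and word count can come from, here is a minimal sketch that reads the `meta:document-statistic` element from the package's meta.xml using only the Python standard library. The function names are assumptions for this example. The values are whatever the application stored in the file metadata, so they are presumably only as current as the last save.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace URI of the ODF "meta" vocabulary.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"


def read_document_statistics(meta_xml):
    """Return the attributes of meta:document-statistic as {name: int}.

    Returns an empty dict if the element is absent.
    """
    root = ET.fromstring(meta_xml)
    stat = root.find(".//{%s}document-statistic" % META_NS)
    if stat is None:
        return {}
    prefix = "{%s}" % META_NS
    # Strip the namespace so keys read like 'page-count', 'word-count', ...
    return {
        key[len(prefix):]: int(value)
        for key, value in stat.attrib.items()
        if key.startswith(prefix)
    }


def statistics_from_file(path):
    """Read the statistics from the meta.xml inside an ODF package."""
    with zipfile.ZipFile(path) as package:
        return read_document_statistics(package.read("meta.xml"))
```

Because the element simply carries whatever attributes the application wrote, the sketch returns all of them instead of hard-coding a list.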
Comment on attachment 168089 [details]
Document analyser

>"""
>Document analyser uses the odfpy module: https://pypi.org/project/odfpy/
>
>This script prints:
>bookmark count, cell count, changetracking count, character count,
>comment count, draw count, frame count, hyperlink count,
>image count, non-whitespace character count, object count, OLE object count,
>page count, paragraph count, row count, sentence count,
>syllable count, table count, textbox count, word count, and paragraph styles.
>
>"""
>
>import odf
>from odf.namespaces import TEXTNS
>from odf.element import Element
>from odf.opendocument import load
>from odf import text,meta,office,draw
>
>print("Enter filename: ")
>filename=input()
>
>doc=load(filename)
>
>print("\nDOCUMENT STATISTICS\n")
>for stat in doc.getElementsByType(meta.DocumentStatistic):
>    print("Cell count",stat.getAttribute('cellcount'))
>    print("Character count:",stat.getAttribute('charactercount'))
>    print("Draw count:",stat.getAttribute('drawcount'))
>    print("Frame count:",stat.getAttribute('framecount'))
>    print("Image count:",stat.getAttribute('imagecount'))
>    print("Non-whitespace character count:",stat.getAttribute('nonwhitespacecharactercount'))
>    print("Object count:",stat.getAttribute('objectcount'))
>    print("Object linking and embedding (OLE) object count:",stat.getAttribute('oleobjectcount'))
>    print("Page count:",stat.getAttribute('pagecount'))
>    print("Paragraph count:",stat.getAttribute('paragraphcount'))
>    print("Row count:",stat.getAttribute('rowcount'))
>    print("Sentence count:",stat.getAttribute('sentencecount'))
>    print("Syllable count:",stat.getAttribute('syllablecount'))
>    print("Table count:",stat.getAttribute('tablecount'))
>    print("Word count:",stat.getAttribute('wordcount'))
>
>#type counter for attributes not covered by odf.meta.DocumentStatistic
>def type_counter(doc,type):
>    count=0
>    for element in doc.getElementsByType(type):
>        count+=1
>    return count
>
>types={
>    'Bookmark':text.Bookmark,
>    'Changetracking':text.FormatChange,
>    'Comment':office.Annotation,
>    'Hyperlink':text.A,
>    'Textbox':draw.TextBox
>}
>
>for key,value in types.items():
>    print(key,'count:',type_counter(doc,value))
>
>def paragraph_style(doc):
>    i = 1
>    for paragraph in doc.getElementsByType(text.P):
>        print('Paragraph',i,'style:',paragraph.getAttribute('stylename'))
>        i+=1
>
>paragraph_style(doc)
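One detail in the quoted script may be worth a second look: 'Changetracking' is mapped to `text.FormatChange`, which as far as I understand corresponds to the `text:format-change` element only. In ODF, each tracked change is a `text:changed-region` that contains an insertion, a deletion, or a format change, so insertions and deletions would not be counted that way. The sketch below counts all three kinds using only the standard library. The function name and the returned dict layout are assumptions for this example, not part of the attached script.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Child elements of text:changed-region that identify the kind of change.
CHANGE_KINDS = ("insertion", "deletion", "format-change")


def count_tracked_changes(content_xml):
    """Count tracked changes in content.xml bytes, grouped by kind."""
    root = ET.fromstring(content_xml)
    kinds = Counter()
    for region in root.iter("{%s}changed-region" % TEXT_NS):
        for child in region:
            # Tags look like '{namespace}local-name'; keep the local name.
            local = child.tag.rsplit("}", 1)[-1]
            if child.tag.startswith("{%s}" % TEXT_NS) and local in CHANGE_KINDS:
                kinds[local] += 1
    return {kind: kinds[kind] for kind in CHANGE_KINDS}
```

The sum of the three values gives a single "tracked changes" figure if only one number is wanted in the report.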
Created attachment 168100 [details] document analyser (modified)
Created attachment 168410 [details] document analyser (modified) Cleaned up indentation a bit.
(In reply to wingednova from comment #12)
> Created attachment 168410 [details]
> document analyser (modified)
>
> Cleaned up indentation a bit.

Hello wingednova,
thanks for working on this.
I think the script should be in the dev-tools repository < https://gerrit.libreoffice.org/admin/repos/dev-tools >; there is a QA folder in there. I can submit the script to the repository on your behalf if you don't want to do it yourself, but first we need the licence statement < https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch#Add_yourself_to_the_contributor_list >.
Could you please send it to the dev mailing list as described in the previous link?
(In reply to Xisco Faulí from comment #13)
> (In reply to wingednova from comment #12)
> > Created attachment 168410 [details]
> > document analyser (modified)
> >
> > Cleaned up indentation a bit.
>
> Hello wingednova,
> thanks for working on this.
> i think the script should be in the dev-tools repository <
> https://gerrit.libreoffice.org/admin/repos/dev-tools >, there is a QA folder
> in there. I can submit the script to the repository on your behalf if you
> don't want to do it yourself, by first we need the licence statement <
> https://wiki.documentfoundation.org/Development/gerrit/
> SubmitPatch#Add_yourself_to_the_contributor_list >.
> Could you please send it to the dev mailing list as described in the
> previous link ?

Hello, I'm still working my way around gerrit so would be super grateful if you could submit this for me. I have sent my license statement to the mailing list. Thank you for your help!
I submitted the patch to Gerrit with Ahlaam as the author and Sebastian as co-author: https://gerrit.libreoffice.org/c/dev-tools/+/113567 Sebastian: could you send a license statement to the dev list: https://wiki.documentfoundation.org/Development/GetInvolved#License_statement
Ahlaam Rafiq committed a patch related to this issue. It has been pushed to "master": https://git.libreoffice.org/dev-tools/commit/71ffc7eba9137e94a96b72fed762cc1c9a82baeb tdf#124141 add document analyser
Now that the UNO object inspector is included from LibreOffice 7.2 onwards, I'm wondering whether the script is still useful?
(In reply to Xisco Faulí from comment #17)
> Now that the the UNO object inspector is included from LibreOffice 7.2 on,
> I'm wondering if the script is useful anymore ?

I don't see the inspector providing such statistics nor having an ability to print a report. Or do you think these features should be added to it?
(In reply to Buovjaga from comment #18)
> (In reply to Xisco Faulí from comment #17)
> > Now that the the UNO object inspector is included from LibreOffice 7.2 on,
> > I'm wondering if the script is useful anymore ?
>
> I don't see the inspector providing such statistics nor having an ability to
> print a report. Or do you think these features should be added to it?

On the left, in the Object box, you can see all the elements of the document and then explore them. It also works with any kind of document, whereas the document_analyser only works with ODF text documents.
(In reply to Xisco Faulí from comment #19)
> (In reply to Buovjaga from comment #18)
> > (In reply to Xisco Faulí from comment #17)
> > > Now that the the UNO object inspector is included from LibreOffice 7.2 on,
> > > I'm wondering if the script is useful anymore ?
> >
> > I don't see the inspector providing such statistics nor having an ability to
> > print a report. Or do you think these features should be added to it?
>
> On the left, in the Object box, you can see all the elements of the document
> and then explore them. It also works with any kind of document, the
> document_analyser only works with ODF text documents

Yes, but it has no statistics to copy & paste into bug reports as was Björn's idea. I see you created bug 142373 for exporting info, but it would still need a statistics feature.
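On the point about document types: an ODF package declares its type in a `mimetype` entry inside the zip, so a script could at least detect which kind of document it was given before deciding which statistics apply. A minimal sketch, assuming a well-formed package; the function names are made up for this example.

```python
import zipfile

# Media type of an ODF text document (.odt).
ODF_TEXT = "application/vnd.oasis.opendocument.text"


def odf_mimetype(source):
    """Return the media type declared in an ODF package.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return package.read("mimetype").decode("ascii").strip()


def is_text_document(source):
    """True if the package declares itself as an ODF text document."""
    return odf_mimetype(source) == ODF_TEXT
```

A spreadsheet or presentation would report a different media type, which the script could use to choose a different set of counts or to exit with a clear message.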