Created attachment 115149
Windows Explorer screenshot of metadata 2007 and LO DOCX file

DOCX files saved using LO are missing standard metadata (Pages, Word / Char / Line / Para counts).

The attached screenshot shows Windows Explorer comparing files created in Word 2007 and LibreOffice 4.4.2.2. I am also seeing the same results when using Apache Tika 1.8 to pull the metadata.
I can confirm this with LO 4.4.2 on Windows 7.
it should be quite easy to add this, marking as easy-hack.

the document properties are exported to OOXML here:

oox/source/core/xmlfilterbase.cxx:
XmlFilterBase& XmlFilterBase::exportDocumentProperties( Reference< XDocumentProperties > xProperties )

all that is missing is getting the XDocumentProperties::getDocumentStatistics() and converting that to XML elements or attributes.

in ECMA-376 3rd edition the definition of the "Extended File Properties" elements starts on page 4254, "22.2.2.1 Application" up to "22.2.2.28 Words (Word Count)".

http://www.ecma-international.org/publications/standards/Ecma-376.htm
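A rough sketch of the conversion described above, written in Python for brevity rather than in the C++ of xmlfilterbase.cxx. The statistic names are the ones the XDocumentProperties documentation lists for the document statistics. Which statistic feeds "Characters" versus "CharactersWithSpaces" is an assumption here, and build_app_xml is a hypothetical helper name, not something in the LibreOffice codebase.

```python
import xml.etree.ElementTree as ET

# Namespace of the OOXML "Extended File Properties" part.
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"

# Assumed mapping from document statistic names to Extended Properties
# element names. The Characters / CharactersWithSpaces pairing is a guess.
STAT_TO_ELEMENT = {
    "PageCount": "Pages",
    "WordCount": "Words",
    "NonWhitespaceCharacterCount": "Characters",
    "CharacterCount": "CharactersWithSpaces",
    "ParagraphCount": "Paragraphs",
}


def build_app_xml(statistics):
    """Serialize the known statistics as children of a <Properties> element.

    `statistics` is a plain dict of name -> integer; names without a
    mapping (for example "TableCount") are skipped.
    """
    ET.register_namespace("", EP_NS)
    root = ET.Element(f"{{{EP_NS}}}Properties")
    for stat_name, element_name in STAT_TO_ELEMENT.items():
        if stat_name in statistics:
            child = ET.SubElement(root, f"{{{EP_NS}}}{element_name}")
            child.text = str(statistics[stat_name])
    return ET.tostring(root, encoding="unicode")
```

Statistics that are absent simply produce no element, so a document type without a given counter still yields well-formed output.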
Thank you for your efforts on this, guys :) The fast response is a pleasant surprise. If I can be of any further help, please don't hesitate to let me know.
I'd assume this is a duplicate of bug 89775.
Hi, guys.

I see no stat counter for the lines of text, neither here:
sw/source/filter/xml/xmlmeta.cxx: statistic s_stats []
nor here:
sw/inc/docstat.hxx: SW_DLLPUBLIC SwDocStat

A quick and easy patch for Pages / Word count / Character count could be submitted quickly (by me).
(Paragraph count is already exposed in MS Properties Explorer.)
alexey.chemichev committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=8beea0f6b43b9fe893418687a75d28a6d624ede7 tdf#90904 DOCX export metadata for "Pages", "Word count", "Character count" It will be available in 5.1.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
alexey.chemichev committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=24346dc6630471da65a2c19d767cb9deed73405a tdf#90904 Sorry, mixed Characters and CharactersWithSpaces at a first time It will be available in 5.1.0.
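For anyone wanting to check a DOCX saved by a build containing these commits without Windows Explorer or Tika, here is a minimal Python sketch. It assumes the extended properties live at the conventional docProps/app.xml path inside the package (strictly, the location is given by the package relationships). read_extended_properties and missing_counts are hypothetical helper names.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OOXML "Extended File Properties" part.
EP_NS = "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"

# Conventional location of the part; an assumption, see the note above.
APP_XML_PATH = "docProps/app.xml"


def read_extended_properties(docx):
    """Return {element name: text} for the children of <Properties>.

    `docx` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as package:
        with package.open(APP_XML_PATH) as part:
            root = ET.parse(part).getroot()
    prefix = f"{{{EP_NS}}}"
    return {
        child.tag[len(prefix):]: child.text
        for child in root
        if child.tag.startswith(prefix)
    }


def missing_counts(docx, wanted=("Pages", "Words", "Characters", "CharactersWithSpaces")):
    """List the wanted count elements that the file does not contain."""
    properties = read_extended_properties(docx)
    return [name for name in wanted if name not in properties]
```

Calling missing_counts("saved-by-lo.docx") on a file written by a fixed build should return an empty list if all four counts were exported. The file name is only a placeholder.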
Trying to define the scope (see also tdf#89775)...

ECMA describes 28 Extended Properties:

+ 01. Application (Application Name)
  02. AppVersion (Application Version)
+ 03. Characters (Total Number of Characters)
+ 04. CharactersWithSpaces (Number of Characters (With Spaces))
  05. Company (Name of Company)
  06. DigSig (Digital Signature)
  07. DocSecurity (Document Security)
  08. HeadingPairs (Heading Pairs)
  09. HiddenSlides (Number of Hidden Slides)
  10. HLinks (Hyperlink List)
  11. HyperlinkBase (Relative Hyperlink Base)
  12. HyperlinksChanged (Hyperlinks Changed)
  13. Lines (Number of Lines)
  14. LinksUpToDate (Links Up-to-Date)
  15. Manager (Name of Manager)
  16. MMClips (Total Number of Multimedia Clips)
  17. Notes (Number of Slides Containing Notes)
+ 18. Pages (Total Number of Pages)
+ 19. Paragraphs (Total Number of Paragraphs)
  20. PresentationFormat (Intended Format of Presentation)
  21. Properties (Application Specific File Properties)
  22. ScaleCrop (Thumbnail Display Mode)
  23. SharedDoc (Shared Document)
  24. Slides (Slides Metadata Element)
+ 25. Template (Name of Document Template)
  26. TitlesOfParts (Part Titles)
+ 27. TotalTime (Total Edit Time Metadata Element)
+ 28. Words (Word Count)

Someone please help to mark the props that are really valid for LO and are present (or can be calculated) in the codebase.
Migrating Whiteboard tags to Keywords: (easyHack difficultyBeginner skillCPP filter:ooxml) [NinjaEdit]
JanI is default CC for Easy Hacks (Add Jan; remove LibreOffice Dev List from CC) [NinjaEdit]
oops, missed that bugzilla mail... "AppVersion" would sound obvious but iirc i tried to add that once and found that it really is "Microsoft Office version" - if the version number isn't formatted exactly like MSO version numbers are then MSO will complain that the document is invalid. (also i'm surprised that the "HLinks" anachronism still exists) one would think that Impress would have a SlideCount statistic but apparently it doesn't. so i think we're done here for now, nothing easily implemented left, thanks Alexey.
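To illustrate the AppVersion formatting concern, a small Python sketch of a shape check. The two-digits, dot, four-digits pattern is an assumption based on values such as 12.0000 that Word 2007 writes; it is not taken from the LibreOffice code, and looks_like_mso_version is a hypothetical name.

```python
import re

# Assumed shape of an MSO-style version string, e.g. "12.0000".
MSO_VERSION_PATTERN = re.compile(r"\d{2}\.\d{4}")


def looks_like_mso_version(value):
    """Return True if the whole string matches the assumed MSO shape."""
    return MSO_VERSION_PATTERN.fullmatch(value) is not None
```

Under this assumption a LibreOffice version string such as "4.4.2.2" would not pass, which is consistent with the complaint from MSO described above.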