Created attachment 68361: the example file from the debian bug

Reported in http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=690066:

--- snip ---
Package: libreoffice-writer
Version: 1:3.5.4+dfsg-2
Severity: grave
Justification: causes non-serious data loss

The data loss scenario is as follows:
1) Open the attached docx file,
2) Edit it, save as docx.
The file is now un-openable by MS Office, and only the first 3 pages are visible in libreoffice.

The corruption actually does not rely on editing the file; this can be confirmed using "save-as" to a second docx file; the same apparent truncation happens.

Unzipping the truncated file, it looks like the user data (i.e. text of paragraphs) is actually still there, but according to xmllint word/document.xml does not parse.

word/document.xml:2: parser error : Opening and ending tag mismatch: hyperlink line 2 and p
="18"/><w:szCs w:val="20"/></w:rPr><w:t xml:space="preserve"> </w:t></w:r></w:p>
^
word/document.xml:2: parser error : Opening and ending tag mismatch: p line 2 and body
docGrid w:charSpace="0" w:linePitch="360" w:type="default"/></w:sectPr></w:body>
^
word/document.xml:2: parser error : Opening and ending tag mismatch: body line 2 and document
rSpace="0" w:linePitch="360" w:type="default"/></w:sectPr></w:body></w:document>
^
word/document.xml:2: parser error : Premature end of data in tag document line 2
rSpace="0" w:linePitch="360" w:type="default"/></w:sectPr></w:body></w:document>

I suppose it might in principle be possible to recover the data from the corrupted XML file. That seems daunting enough that it still seems to be an RC bug to me.

FWIW, I get this message in the terminal where I started lowriter
/tmp/buildd/libreoffice-3.5.4+dfsg/writerfilter/source/dmapper/GraphicImport.cxx:1486 failed. Message :GraphicCrop

-- System Information:
Debian Release: wheezy/sid
APT prefers testing
APT policy: (900, 'testing')
Architecture: amd64 (x86_64)
Kernel: Linux 3.2.0-3-amd64 (SMP w/8 CPU cores)
Locale: LANG=en_CA.UTF-8, LC_CTYPE=en_CA.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages libreoffice-writer depends on:
ii  libc6  2.13-35
ii  libgcc1  1:4.7.1-7
ii  libicu48  4.8.1.1-9
ii  libreoffice-base-core  1:3.5.4+dfsg-2
ii  libreoffice-core  1:3.5.4+dfsg-2
ii  libstdc++6  4.7.1-7
ii  libwpd-0.9-9  0.9.4-3
ii  libwpg-0.2-2  0.2.1-1
ii  libwps-0.2-2  0.2.7-1
ii  libxml2  2.8.0+dfsg1-5
ii  uno-libs3  3.5.4+dfsg-2
ii  ure  3.5.4+dfsg-2
ii  zlib1g  1:1.2.7.dfsg-13

Versions of packages libreoffice-writer recommends:
ii  default-jre [java5-runtime]  1:1.6-47
ii  libreoffice-emailmerge  1:3.5.4+dfsg-2
ii  libreoffice-filter-binfilter  1:3.5.4+dfsg-2
ii  libreoffice-java-common  1:3.5.4+dfsg-2
ii  libreoffice-math  1:3.5.4+dfsg-2
ii  openjdk-6-jre [java5-runtime]  6b24-1.11.4-3

Versions of packages libreoffice-writer suggests:
pn  libreoffice-base  <none>
pn  libreoffice-gcj  <none>

Versions of packages libreoffice-core depends on:
ii  fontconfig  2.9.0-7
ii  fonts-opensymbol  2:102.2+LibO3.5.4+dfsg-2
ii  libc6  2.13-35
ii  libcairo2  1.12.2-2
ii  libcmis-0.2-0  0.1.0-1+b1
ii  libcurl3-gnutls  7.26.0-1
ii  libdb5.1  5.1.29-5
ii  libexpat1  2.1.0-1
ii  libexttextcat0  3.2.0-2
ii  libfontconfig1  2.9.0-7
ii  libfreetype6  2.4.9-1
ii  libgcc1  1:4.7.1-7
ii  libglib2.0-0  2.32.3-1
ii  libgraphite2-2.0.0  1.1.3-1
ii  libgstreamer-plugins-base0.10-0  0.10.36-1
ii  libgstreamer0.10-0  0.10.36-1
ii  libhunspell-1.3-0  1.3.2-4
ii  libhyphen0  2.8.3-2
ii  libice6  2:1.0.8-2
ii  libicu48  4.8.1.1-9
ii  libjpeg8  8d-1
ii  libmythes-1.2-0  2:1.2.2-1
ii  libneon27-gnutls  0.29.6-3
ii  libnspr4  2:4.9.2-1
ii  libnspr4-0d  2:4.9.2-1
ii  libnss3  2:3.13.6-1
ii  libnss3-1d  2:3.13.6-1
ii  libpng12-0  1.2.49-1
ii  librdf0  1.0.15-1+b1
ii  libreoffice-common  1:3.5.4+dfsg-2
ii  librsvg2-2  2.36.1-1
ii  libsm6  2:1.2.1-2
ii  libssl1.0.0  1.0.1c-4
ii  libstdc++6  4.7.1-7
ii  libx11-6  2:1.5.0-1
ii  libxext6  2:1.3.1-2
ii  libxinerama1  2:1.1.2-1
ii  libxml2  2.8.0+dfsg1-5
ii  libxrandr2  2:1.3.2-2
ii  libxrender1  1:0.9.7-1
ii  libxslt1.1  1.1.26-14
ii  uno-libs3  3.5.4+dfsg-2
ii  ure  3.5.4+dfsg-2
ii  zlib1g  1:1.2.7.dfsg-13
--- snip ---

I can reproduce this with master as of 20120927.
Created attachment 68362: file created with save-as (from bar.docx)
Confirming: I can reproduce this myself.
Comment on attachment 68361, the example file from the debian bug: Fixed MIME type.
Comment on attachment 68362, file created with save-as (from bar.docx): Fixed MIME type.
The bug disappears if the <w:hyperlink r:id="rId15" w:history="1"/> markup is removed from word/document.xml.
Created attachment 70843: Minimal word/ subdirectory triggering the bug.
Please find attached a minimal test case with only a few suspect lines. Everything outside the word/ subdirectory is identical to what lowriter produces for a freshly created empty document. You can reproduce quickly with:
    unoconv --format=docx --output=converted.docx bar.docx
    unzip -p converted.docx word/document.xml | xmllint --noout -
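For anyone without xmllint at hand, the same well-formedness check can be sketched in Python with only the standard library. The function name `document_xml_error` is made up for this illustration; like `xmllint --noout`, it only tests whether word/document.xml parses, not whether it is valid OOXML.

```python
import io
import xml.etree.ElementTree as ET
import zipfile


def document_xml_error(docx_bytes, member="word/document.xml"):
    """Return None if `member` inside the .docx parses as XML,
    otherwise the parser's error message as a string."""
    # A .docx file is a zip archive; read the member without unpacking to disk.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        data = archive.read(member)
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

Reading converted.docx in binary mode and passing its bytes to this function should return an error string for the broken export and None for a well-formed one.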
Converting to odt instead of docx does not trigger the bug.
In converted.docx/word/document.xml, the problem is caused by
    </w:hyperlink><w:hyperlink r:id="rId2">
The docx output filter writes these items in the wrong order.
sw/source/filter/ww8/docxattributeoutput.hxx declares two private booleans:
    // close of hyperlink needed
    bool m_closeHyperlinkInThisRun;
    bool m_closeHyperlinkInPreviousRun;
The implementation uses them to keep state across DOM callbacks. Below, each state is written as the pair (m_closeHyperlinkInThisRun, m_closeHyperlinkInPreviousRun), with T for true and F for false:
    Initialization sets them to FF.
    EndURL() sets the former to T.
    TF -> EndRun() -> FF (serializes the end element late)
    TF -> RunText() -> FT
    FT -> EndRun() -> FF (serializes the end element early)
I do not understand the details yet, but I strongly suspect that:
- since the hyperlink contains no text, RunText() is never called;
- m_closeHyperlinkInPreviousRun never replaces m_closeHyperlinkInThisRun;
- in EndRun(), the end element is serialized too early.
Good night.
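To make the transitions listed above easier to follow, here is a toy Python model of the two flags. It is only a sketch of what this comment describes, not the real DocxAttributeOutput code: the class, method, and event names are invented, and the behaviour in states the comment does not mention (for example both flags set at once) is my own assumption.

```python
from dataclasses import dataclass, field


@dataclass
class HyperlinkCloseModel:
    """Toy model of the two booleans; names are hypothetical."""
    close_in_this_run: bool = False      # stands in for m_closeHyperlinkInThisRun
    close_in_previous_run: bool = False  # stands in for m_closeHyperlinkInPreviousRun
    events: list = field(default_factory=list)

    def end_url(self):
        # EndURL() sets the first flag.
        self.close_in_this_run = True

    def run_text(self):
        # RunText() moves a pending close from "this run" to "previous run".
        if self.close_in_this_run:
            self.close_in_this_run = False
            self.close_in_previous_run = True

    def end_run(self):
        # EndRun() records an early or late close, then clears both flags.
        if self.close_in_previous_run:
            self.events.append("close-early")
        if self.close_in_this_run:
            self.events.append("close-late")
        self.close_in_this_run = False
        self.close_in_previous_run = False


def trace(calls):
    """Replay a list of method names on a fresh model and return its events."""
    model = HyperlinkCloseModel()
    for name in calls:
        getattr(model, name)()
    return model.events
```

Under this model, `trace(["end_url", "end_run"])` (an empty hyperlink, no RunText) takes the late branch, while `trace(["end_url", "run_text", "end_run"])` takes the early one. Whether the real code behaves this way for an empty hyperlink is exactly what remains unclear.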
The last post was based on the 3.5.4 sources. The bug is reproducible with 3.6.4. At least two similar bugs (Bug 52610 and Bug 53175) have been patched in the meantime.
The attached patch slightly improves readability. This is useful for code that has already caused at least 3 bugs.
If I understand correctly, for each element:
- EndURL() is called at most once,
- then RunText() is called an arbitrary number of times,
- then EndRun() is called exactly once.
If so, the variable m_startedHyperlink may be read before it is set for the current element. One solution would be to make it local to EndRun().
It is possible that the bug is caused by this sequence:
- EndURL(), then RunText(), then EndRun() are called for Element 1; assuming m_pHyperlinkAttrList is set, m_startedHyperlink is set when we exit;
- EndURL(), then RunText(), then EndRun() are called for Element 2; the closing of Element 2 happens too early.
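The per-element call order guessed at in this comment could be written down as a small checker, for instance to validate a log of callbacks. This is only a sketch of my hypothesis (EndURL at most once, then any number of RunText, then exactly one EndRun); the function and constant names are invented, and the real export code may not follow this order at all.

```python
import re

# One letter per callback so the assumed order can be expressed as a regex.
CALL_CODES = {"EndURL": "U", "RunText": "T", "EndRun": "R"}

# Hypothesis: EndURL at most once, RunText any number of times, EndRun exactly once.
ASSUMED_ORDER = re.compile(r"U?T*R")


def follows_assumed_order(calls):
    """Return True if the callback names for one element match the assumed order."""
    try:
        encoded = "".join(CALL_CODES[name] for name in calls)
    except KeyError:
        # Unknown callback name: cannot match the assumed order.
        return False
    return ASSUMED_ORDER.fullmatch(encoded) is not None
```

Note that the empty-hyperlink case, `["EndURL", "EndRun"]`, is accepted by this pattern, so the assumed order alone does not rule it out.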
Created attachment 72963: clarification of hyperlink closing steps
Bug 47669 is also related.
There's a patch here; can somebody who knows the WW8 export review this? I can reproduce the bug on current master. Nicolas, could you please send a mail with a text like http://permalink.gmane.org/gmane.comp.documentfoundation.libreoffice.devel/38402 to the mailing list libreoffice@lists.freedesktop.org? Also, in the future it's best to send a patch either to the mailing list or to gerrit (http://wiki.documentfoundation.org/Development/gerrit), because developers look there far more often than at bugzilla for patches.
Be warned that the diff simplifies the faulty code but does not solve the problem. I only posted each step of progress to help the next bug squasher. A true patch for these bugs would need a clear specification of the order in which the callbacks are called, and I have not found time for that yet.
I can also confirm with the current release: Version 4.0.3.3
I'm able to reproduce this by creating a new document, typing "http://example.com blah", pressing Enter, typing "blah", saving as .docx, closing LibreOffice and then opening that file in LibreOffice again: the second "blah" has disappeared. "Version: 4.1.4.2", "Build ID: Gentoo official package"
I tried Benjamin's steps but could not reproduce with 4.2.0.1 on Ubuntu 13.10: the document displays fine. Since I'm not sure whether that means the bug is fixed in 4.2.0.1, I'm asking for more tests with 4.2.0.1. Anybody?
Dear Bug Submitter, This bug has been in NEEDINFO status with no change for at least 6 months. Please provide the requested information as soon as possible and mark the bug as UNCONFIRMED. Due to regular bug tracker maintenance, if the bug is still in NEEDINFO status with no change in 30 days the QA team will close the bug as INVALID due to lack of needed information. For more information about our NEEDINFO policy please read the wiki located here: https://wiki.documentfoundation.org/QA/FDO/NEEDINFO If you have already provided the requested information, please mark the bug as UNCONFIRMED so that the QA team knows that the bug is ready to be confirmed. Thank you for helping us make LibreOffice even better for everyone! Warm Regards, QA Team
I checked every test described in this bug log with 4.2.5.2 on Debian, and all was OK. It seems that this bug is fixed. Congratulations.
Please note that the "version" field should be the *earliest* one that *has* the bug :) Resolving as WORKSFORME per comment #18.