When I export an ODT from 4.3.6.1 to MS Word 2003 .doc format, all bibliography entries are omitted, whether they are referenced by numbers or by short names (textkey == name of entry). When I do the same from 4.0.2.1, bibliography entries are exported as plain text (so plain [1] or [Name], not special fields). This isn't ideal (export to "real" Word bibliography entries would be better), but it is still far better than the current behaviour, which seems to be a regression. Importing either .doc works fine in both 4.0.2.1 and 4.3.6.1, so the problem is export-related only. Attached are two initial ODTs, with bibliography entries displayed by 1) numkeys and 2) textkeys (prepared in 4.3.6.1), and the respective export results (with informative filenames).
Created attachment 112649: initial ODT, biblio numkeys
Created attachment 112650: initial ODT, biblio textkeys
Created attachment 112651: resulting DOC, numkeys, exported from 4.0.2.1
Created attachment 112652: resulting DOC, numkeys, exported from 4.3.6.1
Created attachment 112653: resulting DOC, textkeys, exported from 4.0.2.1
Created attachment 112654: resulting DOC, textkeys, exported from 4.3.6.1
I confirm the loss of bibliography entries in all versions from 4.3.0 to the recent 4.5.0 alpha under Win8.1 x64. In 4.2.7 the entries are exported as plain text, so basically it's a 4.3.x regression, although the 4.2.x handling was not perfect either. Status NEW. I'm adding the regression keyword, bibisectRequest to the whiteboard, and a Writer expert to the CC list. Maybe this is somehow related to Bug 58300 - FILEOPEN: lost bibliography entries/empty bibliography index when saving as .doc or .docx
The same kind of problem as bug 58300, but not precisely the same one. As my demos show, this worked correctly at least as of 4.0.2.1.
According to my tests, the regression happened between 4.2.7 and 4.3.0.
bibisect result:
git bisect log
# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
# good: [752769ad0d2179e17ea0a08cc9004df7b890305b] source-hash-60c64b437c6678dd1d3fa3a6fc2b7da0480890d4
git bisect start 'last43onmaster' 'last42onmaster'
# good: [4fcd68ce4979f85fda4568f4b419a4b41d07345f] source-hash-2c4621c87ed3a7b19de195c21494c9a381e72b2e
git bisect good 4fcd68ce4979f85fda4568f4b419a4b41d07345f
# skip: [422186458e0b4db00c7e26b54d5b631f83bcad2a] source-hash-6948bf58ce181b17f60ef81f10205ef4dac50cc6
git bisect skip 422186458e0b4db00c7e26b54d5b631f83bcad2a
# bad: [a0b33bffff9c787dce71a13b344f06ae1453026b] source-hash-02e0be069e57e724c51f23e2e31b77657a6a1d3d
git bisect bad a0b33bffff9c787dce71a13b344f06ae1453026b
# bad: [db29eee512d03b1dc0139b3752bbe7931b165377] source-hash-77b6c1602aaa0bd059077765e7fabb53d9e6ddeb
git bisect bad db29eee512d03b1dc0139b3752bbe7931b165377
# good: [0b79394752f7ecbab6ab4ecedbfab8551c6e9fbd] source-hash-381613916d42a1e18e2824b5d41028dcfe19659a
git bisect good 0b79394752f7ecbab6ab4ecedbfab8551c6e9fbd
# good: [4f705a8cfb1998b09f2062510b207d35a33647d8] source-hash-1eeb20f3958666ec6ba6e0fcf52e92e5eb447a14
git bisect good 4f705a8cfb1998b09f2062510b207d35a33647d8
# bad: [bbc3e332548c8e2aa5648ca68a69e713cbf21580] source-hash-fa40f7df971b1aaabccc11668a987336f50e3b0d
git bisect bad bbc3e332548c8e2aa5648ca68a69e713cbf21580
# good: [4db78da3b1ecb37ce787197389fe8e061c831ad0] source-hash-077a74cfc6dbea5ee275fd11b65b523cc525e2e4
git bisect good 4db78da3b1ecb37ce787197389fe8e061c831ad0
# bad: [4e0843c411a14e3065f96f196eeb4d603664f97f] source-hash-51605bf98220d7e54dee20af17c33cebe23a0813
git bisect bad 4e0843c411a14e3065f96f196eeb4d603664f97f
# first bad commit: [4e0843c411a14e3065f96f196eeb4d603664f97f] source-hash-51605bf98220d7e54dee20af17c33cebe23a0813
The respective commit is probably the following:

commit 06f7d1a96eef5aa69d4872ff6d96eb5085296d09
Author: Rohit Deshmukh <rohit.deshmukh@synerzip.com>
Date: Wed Mar 12 15:07:38 2014 +0530

    fdo#74775: Preseved Citation after round trip.

    Conflicts:
        sw/qa/extras/ooxmlexport/ooxmlexport.cxx

    Reviewed on: https://gerrit.libreoffice.org/8473
    Change-Id: Ie1b0ac3cb4d4b9bf305323599d5e4b63f913fb1b

@Rohit: Is there any chance that you could have a look at this?
Added dataLoss to the whiteboard, changed the importance, and CCing Miklos, as he reviewed Rohit's code in that commit.
So I did some extensive testing, and this is not a FILESAVE problem for DOC files, it's a FILEOPEN issue. Opening the exported DOC files in Word 2010 shows the entries correctly, but LibreOffice doesn't show them. DOC files exported by 4.2.6 open fine in master, but DOC files exported by master won't open in 4.2.6 or 3.3.0. Testing exports to DOCX: numkeys reopens correctly in LibreOffice, but textkeys doesn't. Opening the DOCX exports in Word 2010 shows blank entries for both.
Version: 5.0.0.0.alpha1+
Build ID: badec7478035008f514e0976a94438fe2e32dc40
TinderBox: Linux-rpm_deb-x86@45-TDF, Branch:master, Time: 2015-04-22_00:50:58
Perhaps my report was ambiguous, so I'll reiterate. I do not see any bibliography entries, numbered or shortnamed, when opening the doc in Word 2003. The doc is made by export from LO. And it isn't so that nothing is actually exported -- it's just the 'citation' format as exported seems to be changed -- per look into the binary of the resulting .DOC. I'm not talking about resulting .DOC backtrip to LO or word 2010 processing.
(In reply to Yury from comment #14)
> I do not see any bibliography entries, numbered or shortnamed, when opening
> the doc in Word 2003. The doc is made by export from LO.

Thanks for the clarity.

> And it isn't so that nothing is actually exported -- it's just the
> 'citation' format as exported seems to be changed -- per look into the
> binary of the resulting .DOC.

Yes, the format must have been changed.

> I'm not talking about resulting .DOC backtrip to LO or word 2010 processing.

Well, whatever is being exported is a valid .doc file according to MS Word 2007 and above, so it's possible that new additions to the .doc format aren't understood by Word 2003, or that Word 2007+ is more forgiving of .doc errors than 2003.
(In reply to Jay Philips from comment #15)
> (In reply to Yury from comment #14)
...
> > And it isn't so that nothing is actually exported -- it's just the
> > 'citation' format as exported seems to be changed -- per look into the
> > binary of the resulting .DOC.
>
> Yes, the format must have been changed.
>
> > I'm not talking about resulting .DOC backtrip to LO or word 2010 processing.
>
> Well, whatever is being exported is a valid .doc file according to MS Word
> 2007 and above, so it's possible that new additions to the .doc format
> aren't understood by Word 2003, or that Word 2007+ is more forgiving of .doc
> errors than 2003.

It does open in Word 2003 too, but with no bibliography shown. Since journal editors want papers submitted as 'Word 2003' .DOC, this is indeed a major regression, regardless of how forgiving Word 2007 or anything else is. Collaboration with Windows users on reports and the like also suffers.
Currently, a bibliography entry is exported to Word 2003 .DOC as a type 97 field (cf. version 4.3.7.1, sw/source/filter/ww8/: fields.cxx, fields.hxx, ww8atr.cxx). This seems to be the numeric code for the so-called "table of authority", which (1) does not seem to be a direct equivalent of a bibliography entry and (2) does not seem to be part of the standard Word 2003 distribution. I infer that much from Internet Q&A resources, where it is claimed that Word 2003 proper has no built-in capability for citations/bibliography.

There are the following significant open-access references on the DOC binary format:
[1] https://msdn.microsoft.com/en-us/library/office/cc313105%28v=office.12%29.aspx
    https://msdn.microsoft.com/en-us/library/office/cc313153%28v=office.14%29.aspx
[2] http://www.digitalpreservation.gov/formats/digformatspecs/Word97-2007BinaryFileFormat(doc)Specification.pdf
[3] http://www.ecma-international.org/publications/standards/Ecma-376.htm

Per [2, p.138], which covers the Word 97 to Word 2007 formats, the Word 97 (sic!) format has no field with code 97 "CITATION", which is used in . [2] has no matches for the search strings "citat"/"biblio", and SEEMS to have nothing appropriate in the "new in Word 2003" section. Per [1] (both links eventually lead to the ~19M PDF "[MS-DOC] — v20141018 // Word (.doc) Binary File Format // Release: October 30, 2014", which in turn references ECMA-376 [3]), there is no type 97 field either [1, p.357].

In light of this, I consider the whole exercise of exporting bibliography entries to Word 2003 .DOC as type 97 fields dubious, possibly even erroneous. I think that, for export to Word 2003 .DOC, bibliography entries should be switched back to the bare text format.
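Reverting to bare text for field types that binary .DOC does not define amounts to a range check with a plain-text fallback. The C++ sketch below illustrates that idea under stated assumptions: `ExportFieldToDoc`, `DocFieldOutput`, and `kLastWw8FieldType` are hypothetical names, not the actual code in sw/source/filter/ww8, and the cut-off of 95 is taken from this discussion rather than verified against [MS-DOC].

```cpp
#include <cassert>
#include <string>

// Assumption taken from this discussion: binary .DOC (WW8) defines built-in
// field type codes only up to 95; higher values (such as 97) have no WW8 meaning.
constexpr int kLastWw8FieldType = 95;

// Hypothetical result type: either a real field or plain text.
struct DocFieldOutput {
    bool asField;       // true: write a field; false: write plain text
    int fieldType;      // field type code, meaningful only when asField is true
    std::string text;   // field instruction, or visible text for plain output
};

// Hypothetical helper, not LibreOffice's API. `instruction` is the field
// instruction; `visibleText` is what the reader sees, e.g. "[1]" or "[Hill 'NT]".
DocFieldOutput ExportFieldToDoc(int fieldType, const std::string& instruction,
                                const std::string& visibleText) {
    assert(fieldType >= 0 && "field type codes are non-negative");
    if (fieldType > kLastWw8FieldType) {
        // No WW8 code exists: degrade to plain text, matching the output
        // that 4.0.2.1 and 4.2.x produced.
        return {false, 0, visibleText};
    }
    return {true, fieldType, instruction};
}
```

The trade-off is the one described above: the reader of the .DOC sees "[1]" or "[Hill 'NT]", but the entry is no longer a live field.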
I'm correcting the summary, as this is clearly not even a Word 2003 FILEOPEN problem but an LO FILESAVE problem, writing the field in a manner that is too specific (Word add-on related?).
Created attachment 115386: patch for 4.3.7.1 enabling the bibliography entries export
This patch is against the 4.3.7.1 source, directory sw/source/filter/ww8. It enables the Word 2003 .DOC export of bibliography entries as the 'Author' field, with the OOo bibliography 'short names' as the parameters. The exported bibliography itself comes out empty (heading only). The round-trip tests in the build will fail, of course; ignore them (make -k). This ALMOST takes care of the issue in my case. Hopefully somebody can advise where the code for exporting the bibliography itself is.
Created attachment 115444: patch for 4.3.7.1 enabling the bibliography entries AND index export
This patch is against the 4.3.7.1 source and enables export to Word 2003 (!) .DOC of both the bibliography entries AND the resulting bibliography list, implemented with the (more correct) QUOTE field type. The round-trip build test fails (predictably) and can be ignored; the resulting binary works fine. This works adequately for me: both real Word 2003 and SoftMaker FreeOffice open my exported bibliographies okay. Are proof documents needed?
It does indeed appear that enum values > 95 do not actually exist in the binary DOC format.

I suspect that your patch will fix the WW8 export but break the DOCX export. There are multiple implementations of OutputField, for RTF / WW8 / DOCX; perhaps it's best to "map" the non-existent enum values in the WW8 OutputField implementation, so that other formats are unaffected.

One thing that is perhaps worth trying is to export the fields to DOCX, import that in Word, export it to DOC, and see what kind of field that writes (there are some binary format tools in a separate git repo, https://gerrit.libreoffice.org/#/admin/projects/mso-dumper, but I'm not at all familiar with these).
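Mapping the made-up values only in the WW8 writer, as suggested above, could look roughly like the following C++ sketch. The enum, `MapForFormat`, and the numeric values 35 (QUOTE) and 95 (SHAPE) are assumptions for illustration, not LibreOffice's actual field enum in fields.hxx; the CITATION to QUOTE substitution mirrors the QUOTE-based 4.3.7.1 patch attached earlier.

```cpp
#include <cassert>

// Hypothetical field-type enum showing only the values relevant here.
// 35 (QUOTE) and 95 (SHAPE) are assumed from the [MS-DOC] field list and
// should be verified; 96 and 97 are the made-up values discussed above.
enum class FieldType : int {
    Quote = 35,
    Shape = 95,         // assumed to be the last built-in WW8 field type
    Bibliography = 96,  // no WW8 equivalent
    Citation = 97       // no WW8 equivalent
};

enum class OutputFormat { Ww8, Rtf, Docx };

// Hypothetical mapping applied only on the WW8 (binary .DOC) path. RTF and
// DOCX write only the instruction text, not the number, so they keep the
// original type unchanged.
FieldType MapForFormat(FieldType type, OutputFormat format) {
    if (format != OutputFormat::Ww8)
        return type;
    if (type == FieldType::Citation)
        return FieldType::Quote;  // same substitution as the QUOTE-based patch
    // Any other value above SHAPE still lacks a WW8 mapping in this sketch.
    assert(static_cast<int>(type) <= static_cast<int>(FieldType::Shape) &&
           "no WW8 mapping defined for this field type");
    return type;
}
```

Keeping the mapping inside the WW8 path is what leaves the DOCX and RTF output untouched.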
(In reply to Michael Stahl from comment #21)
> It does indeed appear that enum values > 95 do not actually exist in the
> binary DOC format.
>
> I suspect that your patch will fix the WW8 export but break the DOCX export.

Rather, it fixes the .DOC export if the receiver is real Word 2003, but breaks it if the receiver is a newer Word or something like that. The number 97 had to come from somewhere, right? Anyway, there is no field 97 in the format specification, so the source definitely has to be changed in that respect. Possibly LO could distinguish between export to pure Word 2003 .DOC and to an extended format (like "Word 2003++")?

> One thing that is perhaps worth trying is to export the fields to DOCX,
> import that in Word, export it to DOC, and see what kind of field that writes

Would it even work with real Word 2003?
Michael Stahl committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=484759aadd964b011e8e649ba021d09f40a79440 sw: add a comment about WW8 non-existent fields, related: tdf#88697 It will be available in 5.0.0. The patch should be included in the daily builds available at http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More information about daily builds can be found at: http://wiki.documentfoundation.org/Testing_Daily_Builds Affected users are encouraged to test the fix and report feedback.
(In reply to Yury from comment #22)
> (In reply to Michael Stahl from comment #21)
> > It does indeed appear that enum values > 95 do not actually exist in the
> > binary DOC format.
> >
> > I suspect that your patch will fix the WW8 export but break the DOCX export.
>
> Rather, it fixes the .DOC export if the receiver is real Word 2003, but
> breaks it if the receiver is a newer Word or something like that. The number
> 97 had to come from somewhere, right?

I believe the numbers 96 and 97 were made up by the author of the patch in comment #11, without any actual knowledge of the WW8 format, purely for use in DOCX, where the actual *numbers* are not written into the file.

> > One thing that is perhaps worth trying is to export the fields to DOCX,
> > import that in Word, export it to DOC, and see what kind of field that writes
>
> Would it even work with real Word 2003?

Probably not; Word 2003 can't import DOCX, so a later version is needed.
* there was a related bug 76279 that was fixed in LO 4.4 but sadly not backported to the 4.3 branch before EOL
* Word 2010 can read the fields after storing the attached files with LO 4.4.4
* LO cannot read the fields written by LO 4.4.4
* with the patch below, LO can read the fields
* I don't have Word 2003, so I cannot tell if the patch helps in that case; this needs testing

Please try out this patch to see if it helps with Word 2003 interop (it should apply to the libreoffice-4-4 or a later branch by copy/pasting the "cherry-pick" command from the gerrit page):
https://gerrit.libreoffice.org/16285
Created attachment 116545: the 2nd attachment, exported with patch from previous comment
Thank you very much indeed, Michael! Personally, having my local build with the said hack, I'm not in a special hurry to have this in a release, but anyway, it's nice to see someone cares. :)
I have some remarks/questions on the output of the patch:
1) In Word 2003 you can display either the fields or the field codes. My patch produces a DOC in which the field codes (as shown by Word 2003) start with `{QUOTE "HIll 'NT"`. With your patch it's just `{Hill 'NT}` in the field. Why so, and is it correct?
2) Could you try your patch with a file I'm attaching in a minute? I'd like to know how the bibliography comes through.
Created attachment 116552: ODT with both biblio entry and bibliography index, text keys ('short names')
Created attachment 116653: previous attachment exported with last patch
Created attachment 116654: previous attachment exported with final patch
In WW8 format there is both an integer field type and a textual field type in the field instruction; my first patch changes only the integer field type ... and Word 2010 does not recognize the type of the field, if the unselected list box entry in the "Edit Field" dialog is any indication.

In RTF/DOCX there is only the field instruction, which means that the missing QUOTE is probably a bug for those formats:
<w:instrText>Hill 'NT</w:instrText>

So it's probably best to keep the integer field type and the textual field type consistent, and to add the textual field type for all formats.

Should be FIXED on master.
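The consistency rule above can be sketched as deriving both the integer field type and the instruction keyword from a single value, so the two cannot disagree. This is a hypothetical C++ illustration: `FieldType`, `Keyword`, `BuildField`, and the numeric codes are assumptions, and the sketch does not claim to reproduce the committed fix.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical field types; numeric codes are illustrative assumptions
// (verify against [MS-DOC] before relying on them).
enum class FieldType : int { Quote = 35, Citation = 97 };

// Textual field type written at the start of the field instruction.
std::string Keyword(FieldType type) {
    switch (type) {
        case FieldType::Quote:    return "QUOTE";
        case FieldType::Citation: return "CITATION";
    }
    return "";  // not reached for the enumerators above
}

// Returns {integer field type, field instruction}. Both parts come from the
// same `type`, and the keyword is prepended for every output format, rather
// than writing only the argument (as in <w:instrText>Hill 'NT</w:instrText>).
std::pair<int, std::string> BuildField(FieldType type, const std::string& args) {
    assert(!args.empty() && "a citation needs a short name or other argument");
    return {static_cast<int>(type), Keyword(type) + " " + args};
}
```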
Michael Stahl committed a patch related to this issue. It has been pushed to "master": http://cgit.freedesktop.org/libreoffice/core/commit/?id=f56289ac6d7f3da7fd45dd431ce4c540aadcad56 tdf#88697: sw: make WW8 export of CITATION fields compatible with It will be available in 5.1.0.
Michael Stahl committed a patch related to this issue. It has been pushed to "libreoffice-5-0": http://cgit.freedesktop.org/libreoffice/core/commit/?id=1bcd54e93245dfaea0a072493d8bab9e569bae93&h=libreoffice-5-0 tdf#88697: sw: make WW8 export of CITATION fields compatible with It will be available in 5.0.0.1.
backport for 4.4 in gerrit: https://gerrit.libreoffice.org/16410
Thanks a lot, Michael, for your involvement! Now we are only correct preview handling away from the "ideal" Word exporter for scientists. :))
Michael Stahl committed a patch related to this issue. It has been pushed to "libreoffice-4-4": http://cgit.freedesktop.org/libreoffice/core/commit/?id=a99a8aa1f6ec95a578a34f92aeab3523f090176b&h=libreoffice-4-4 tdf#88697: sw: make WW8 export of CITATION fields compatible with It will be available in 4.4.5.
Migrating Whiteboard tags to Keywords: (bibisected, dataLoss) [NinjaEdit]