Bug 88697 - FILESAVE: export to word 2003 .doc format formats bibliography entries not appropriately
Summary: FILESAVE: export to word 2003 .doc format formats bibliography entries not ap...
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version: 4.3.0.4 release (earliest affected)
Hardware: All All
Importance: high major
Assignee: Michael Stahl (allotropia)
URL:
Whiteboard: target:5.1.0 target:5.0.0.1 target:4.4.5
Keywords: bibisected, bisected, dataLoss, regression
Depends on:
Blocks:
 
Reported: 2015-01-22 09:21 UTC by Yury
Modified: 2016-10-25 19:20 UTC
CC List: 6 users

See Also:
Crash report or crash signature:


Attachments
initial ODT, biblio numkeys (11.92 KB, application/vnd.oasis.opendocument.text)
2015-01-22 09:22 UTC, Yury
initial ODT, biblio textkeys (11.41 KB, application/vnd.oasis.opendocument.text)
2015-01-22 09:23 UTC, Yury
resulting DOC, numkeys, exported from 4.0.2.1 (11.00 KB, application/msword)
2015-01-22 09:24 UTC, Yury
resulting DOC, numkeys, exported from 4.3.6.1 (11.00 KB, application/msword)
2015-01-22 09:24 UTC, Yury
resulting DOC, textkeys, exported from 4.0.2.1 (11.00 KB, application/msword)
2015-01-22 09:25 UTC, Yury
resulting DOC, textkeys, exported from 4.3.6.1 (11.00 KB, application/msword)
2015-01-22 09:25 UTC, Yury
patch for 4.3.7.1 enabling the bibliography entries export (680 bytes, text/plain)
2015-05-06 15:13 UTC, Yury
patch for 4.3.7.1 enabling the bibliography entries AND index export (969 bytes, text/plain)
2015-05-08 07:05 UTC, Yury
the 2nd attachment, exported with patch from previous comment (11.00 KB, application/msword)
2015-06-15 08:45 UTC, Michael Stahl (allotropia)
ODT with both biblio entry and bibliography index, text keys ('short names') (12.84 KB, application/vnd.oasis.opendocument.text)
2015-06-15 10:18 UTC, Yury
previous attachment exported with last patch (11.00 KB, application/msword)
2015-06-19 14:22 UTC, Michael Stahl (allotropia)
previous attachment exported with final patch (11.00 KB, application/msword)
2015-06-19 14:23 UTC, Michael Stahl (allotropia)

Description Yury 2015-01-22 09:21:55 UTC
When I export an ODT from 4.3.6.1 to MS Word 2003 .doc format, all bibliography entries are omitted, whether they are referenced by numbers or by short names (textkey == name of entry).

When I do the same from 4.0.2.1, bibliography entries are exported as plain text (plain [1] or [Name], not special fields). This isn't ideal (exporting to "real" Word bibliography entries would be better), but it is still far better than the current behaviour, which seems to be a regression.

Both .doc files import fine into either 4.0.2.1 or 4.3.6.1, so the problem is export-related only.

Attached are two initial ODTs (prepared in 4.3.6.1), with bibliography entries displayed by 1) numkeys and 2) textkeys, and the respective export results (the filenames describe each file).
Comment 1 Yury 2015-01-22 09:22:41 UTC
Created attachment 112649 [details]
initial ODT, biblio numkeys
Comment 2 Yury 2015-01-22 09:23:13 UTC
Created attachment 112650 [details]
initial ODT, biblio textkeys
Comment 3 Yury 2015-01-22 09:24:10 UTC
Created attachment 112651 [details]
resulting DOC, numkeys, exported from 4.0.2.1
Comment 4 Yury 2015-01-22 09:24:39 UTC
Created attachment 112652 [details]
resulting DOC, numkeys, exported from 4.3.6.1
Comment 5 Yury 2015-01-22 09:25:14 UTC
Created attachment 112653 [details]
resulting DOC, textkeys, exported from 4.0.2.1
Comment 6 Yury 2015-01-22 09:25:52 UTC
Created attachment 112654 [details]
resulting DOC, textkeys, exported from 4.3.6.1
Comment 7 tommy27 2015-01-23 04:40:30 UTC
I confirm the bug (loss of bibliography entries) in all versions from 4.3.0 to the recent 4.5.0 alpha under Win8.1 x64.

In 4.2.7 the entries are exported as plain text.

So it's basically a 4.3.x regression, though the 4.2.x handling wasn't perfect either.

Status NEW.
I'm adding the regression keyword, bibisectRequest to the whiteboard, and a Writer expert to the CC list.

Maybe this is somehow related to Bug 58300 - FILEOPEN: lost bibliography entries/empty bibliography index when saving as .doc or .docx
Comment 8 Yury 2015-01-23 05:38:11 UTC
Same kind of problem as bug 58300, but not precisely the same one. As my demos show, it worked correctly at least as of 4.0.2.1.
Comment 9 tommy27 2015-01-23 06:42:15 UTC
According to my tests, the regression happened between 4.2.7 and 4.3.0.
Comment 10 Michael Weghorn 2015-01-30 23:50:10 UTC
bibisect result:

git bisect log
# bad: [423a84c4f7068853974887d98442bc2a2d0cc91b] source-hash-c15927f20d4727c3b8de68497b6949e72f9e6e9e
# good: [752769ad0d2179e17ea0a08cc9004df7b890305b] source-hash-60c64b437c6678dd1d3fa3a6fc2b7da0480890d4
git bisect start 'last43onmaster' 'last42onmaster'
# good: [4fcd68ce4979f85fda4568f4b419a4b41d07345f] source-hash-2c4621c87ed3a7b19de195c21494c9a381e72b2e
git bisect good 4fcd68ce4979f85fda4568f4b419a4b41d07345f
# skip: [422186458e0b4db00c7e26b54d5b631f83bcad2a] source-hash-6948bf58ce181b17f60ef81f10205ef4dac50cc6
git bisect skip 422186458e0b4db00c7e26b54d5b631f83bcad2a
# bad: [a0b33bffff9c787dce71a13b344f06ae1453026b] source-hash-02e0be069e57e724c51f23e2e31b77657a6a1d3d
git bisect bad a0b33bffff9c787dce71a13b344f06ae1453026b
# bad: [db29eee512d03b1dc0139b3752bbe7931b165377] source-hash-77b6c1602aaa0bd059077765e7fabb53d9e6ddeb
git bisect bad db29eee512d03b1dc0139b3752bbe7931b165377
# good: [0b79394752f7ecbab6ab4ecedbfab8551c6e9fbd] source-hash-381613916d42a1e18e2824b5d41028dcfe19659a
git bisect good 0b79394752f7ecbab6ab4ecedbfab8551c6e9fbd
# good: [4f705a8cfb1998b09f2062510b207d35a33647d8] source-hash-1eeb20f3958666ec6ba6e0fcf52e92e5eb447a14
git bisect good 4f705a8cfb1998b09f2062510b207d35a33647d8
# bad: [bbc3e332548c8e2aa5648ca68a69e713cbf21580] source-hash-fa40f7df971b1aaabccc11668a987336f50e3b0d
git bisect bad bbc3e332548c8e2aa5648ca68a69e713cbf21580
# good: [4db78da3b1ecb37ce787197389fe8e061c831ad0] source-hash-077a74cfc6dbea5ee275fd11b65b523cc525e2e4
git bisect good 4db78da3b1ecb37ce787197389fe8e061c831ad0
# bad: [4e0843c411a14e3065f96f196eeb4d603664f97f] source-hash-51605bf98220d7e54dee20af17c33cebe23a0813
git bisect bad 4e0843c411a14e3065f96f196eeb4d603664f97f
# first bad commit: [4e0843c411a14e3065f96f196eeb4d603664f97f] source-hash-51605bf98220d7e54dee20af17c33cebe23a0813
Comment 11 Michael Weghorn 2015-01-30 23:57:35 UTC
The respective commit is probably the following:

commit 06f7d1a96eef5aa69d4872ff6d96eb5085296d09
Author: Rohit Deshmukh <rohit.deshmukh@synerzip.com>
Date:   Wed Mar 12 15:07:38 2014 +0530

    fdo#74775: Preseved Citation after round trip.
    
    Conflicts:
        sw/qa/extras/ooxmlexport/ooxmlexport.cxx
    Reviewed on:
        https://gerrit.libreoffice.org/8473
    
    Change-Id: Ie1b0ac3cb4d4b9bf305323599d5e4b63f913fb1b


@Rohit: Is there any chance that you could have a look at this?
Comment 12 Yousuf Philips (jay) (retired) 2015-04-26 22:44:42 UTC
Added dataLoss to the whiteboard, changed the importance, and CCed Miklos, since he reviewed Rohit's code in the commit.
Comment 13 Yousuf Philips (jay) (retired) 2015-04-26 23:35:37 UTC
So I did some extensive testing, and this is not a file-save problem for DOC files; it's a file-open issue. Word 2010 shows the entries in the exported DOC files correctly, but LibreOffice doesn't.

DOC files exported by 4.2.6 open fine in master, but DOC files exported by master won't open in 4.2.6 or 3.3.0.

Testing exports to DOCX: numkeys reopens correctly in LibreOffice but textkeys doesn't. In Word 2010, both exports open with blank entries.

Version: 5.0.0.0.alpha1+
Build ID: badec7478035008f514e0976a94438fe2e32dc40
TinderBox: Linux-rpm_deb-x86@45-TDF, Branch:master, Time: 2015-04-22_00:50:58
Comment 14 Yury 2015-04-27 05:49:44 UTC
Perhaps my report was ambiguous, so I'll reiterate.

I do not see any bibliography entries, numbered or short-named, when opening the .doc in Word 2003. The .doc is produced by export from LO.

And it isn't the case that nothing is exported: judging by a look into the binary of the resulting .DOC, the exported 'citation' format seems to have changed.

I'm not talking about round-tripping the resulting .DOC back into LO, or about how Word 2010 handles it.
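To repeat the kind of look into the binary described above: in the Word binary format, a field appears in the document text as a field-begin character (0x13), the instruction text, a separator (0x14), the displayed result, and a field-end character (0x15). The following is a minimal C++ sketch that lists the instruction text of each top-level field. It assumes the document text has already been extracted from the .doc container into a `std::u16string` (that step is not shown), and the function name is hypothetical.

```cpp
#include <string>
#include <vector>

// Field delimiter characters in the binary .doc document text.
constexpr char16_t kFieldBegin = 0x13;
constexpr char16_t kFieldSeparator = 0x14;
constexpr char16_t kFieldEnd = 0x15;

// Hypothetical helper: returns the instruction text of every top-level field.
// Nested fields inside an instruction are skipped, not reported separately.
std::vector<std::u16string> ExtractFieldInstructions(const std::u16string& text)
{
    std::vector<std::u16string> instructions;
    std::u16string current;
    int depth = 0;              // current field nesting level
    bool inInstruction = false; // between begin and separator of outermost field
    for (char16_t ch : text)
    {
        if (ch == kFieldBegin)
        {
            if (++depth == 1)
            {
                inInstruction = true;
                current.clear();
            }
        }
        else if (ch == kFieldSeparator)
        {
            if (depth == 1 && inInstruction)
            {
                instructions.push_back(current);
                inInstruction = false;
            }
        }
        else if (ch == kFieldEnd)
        {
            if (depth == 1 && inInstruction) // field without a result part
            {
                instructions.push_back(current);
                inInstruction = false;
            }
            if (depth > 0)
                --depth;
        }
        else if (depth == 1 && inInstruction)
        {
            current.push_back(ch); // ordinary instruction character
        }
    }
    return instructions;
}
```

Characters after the separator (the displayed result) and outside any field are ignored, so only the field codes remain.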
Comment 15 Yousuf Philips (jay) (retired) 2015-04-27 07:55:09 UTC
(In reply to Yury from comment #14)
> I do not see any bibliography entries, numbered or shortnamed, when opening
> the doc in Word 2003. The doc is made by export from LO.

Thanks for the clarity.

> And it isn't so that nothing is actually exported -- it's just the
> 'citation' format as exported seems to be changed -- per look into the
> binary of the resulting .DOC.

Yes the format must have been changed.

> I'm not talking about resulting .DOC backtrip to LO or word 2010 processing.

Well, whatever is being exported is a valid .doc file according to MS Word 2007 and above, so it's possible that new additions to the .doc format aren't understood by Word 2003, or that Word 2007+ is more forgiving of .doc errors than 2003.
Comment 16 Yury 2015-04-27 08:19:43 UTC
(In reply to Jay Philips from comment #15)
> (In reply to Yury from comment #14)
...
> > And it isn't so that nothing is actually exported -- it's just the
> > 'citation' format as exported seems to be changed -- per look into the
> > binary of the resulting .DOC.
> 
> Yes the format must have been changed.
> 
> > I'm not talking about resulting .DOC backtrip to LO or word 2010 processing.
> 
> Well whatever is being exported is a valid .doc file according to ms word
> 2007 and above and so its possible that new additions to the .doc format
> arent understood by word 2003, or word 2007+ is more forgiving to .doc
> errors than 2003.

It opens in Word 2003 too, but with no bibliography shown.

So, since journal editors want papers submitted as 'Word 2003' .DOC, this is indeed a major regression, regardless of how forgiving Word 2007 or anything else is.

Collaboration with Windows users on reports and the like also suffers.
Comment 17 Yury 2015-04-27 16:20:17 UTC
Currently, a bibliography entry is exported to Word 2003 .DOC as a type 97 field (cf. sw/source/filter/ww8/fields.cxx, fields.hxx and ww8atr.cxx in version 4.3.7.1). This number seems to correspond to the so-called "table of authority", which (1) does not seem to be the direct equivalent of the bibliography entry and (2) does not seem to be part of the standard Word 2003 distribution.

I infer as much from internet question/answer resources, which claim that Word 2003 proper has no built-in capability for citations/bibliography.

Now, there are two significant open-access references on the DOC binary format:

[1] https://msdn.microsoft.com/en-us/library/office/cc313105%28v=office.12%29.aspx 
https://msdn.microsoft.com/en-us/library/office/cc313153%28v=office.14%29.aspx
[2] http://www.digitalpreservation.gov/formats/digformatspecs/Word97-2007BinaryFileFormat(doc)Specification.pdf
[3] http://www.ecma-international.org/publications/standards/Ecma-376.htm

Per [2, p.138], which covers the Word 97 to Word 2007 formats, the Word 97 (sic!) format has no field with code 97 "CITATION", the code the export uses. [2] contains no matches for the search strings "citat"/"biblio", and its "new in Word 2003" section SEEMS to contain nothing appropriate either.

Per [1] (both links lead eventually to the ~19M PDF "[MS-DOC] — v20141018 // Word (.doc) Binary File Format // Release: October 30, 2014", which, in turn, references ECMA-376 [3]), there is no type 97 field, too [1, p.357]. 

In light of this, I consider the whole exercise of exporting bibliography entries to Word 2003 .DOC as type 97 fields dubious, possibly even erroneous.

I think that, for export to Word 2003 .DOC, bibliography entries should be switched back to bare text.
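The fallback proposed here (plain text when the target format has no matching field) could look roughly like the following C++ sketch. The names are hypothetical; 97 is the citation field number discussed in this comment, and the upper bound of 95 is an assumption based on the observation elsewhere in this thread that values above 95 do not exist in the binary DOC format.

```cpp
#include <string>

// Assumption: highest field type number defined for the binary .doc format.
constexpr int kMaxWW8FieldType = 95;
// Citation field number used by the 4.3 export, as described in this comment.
constexpr int kCitationFieldType = 97;

// Hypothetical result of exporting one bibliography entry.
struct ExportedCitation
{
    bool asField;     // true: write a field; false: write plain text only
    int fieldType;    // meaningful only when asField is true
    std::string text; // displayed text, e.g. "[1]" or "[Hill 'NT]"
};

// Writes a field only if the .doc format defines its type; otherwise falls
// back to exporting the entry as plain text, as 4.0.2.1 did.
ExportedCitation ExportCitationForWW8(int fieldType, const std::string& displayedText)
{
    if (fieldType > kMaxWW8FieldType)
        return {false, 0, displayedText};
    return {true, fieldType, displayedText};
}
```

The plain-text branch loses the link to the bibliography database, but the reader of the .doc at least sees the entry.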
Comment 18 Yury 2015-04-28 06:09:13 UTC
I'm correcting the summary: this is clearly not even a Word 2003 file-open problem, but LO saving the file in an overly specific (Word add-on-related?) manner.
Comment 19 Yury 2015-05-06 15:13:05 UTC
Created attachment 115386 [details]
patch for 4.3.7.1 enabling the bibliography entries export

This patch is against the 4.3.7.1 source, directory sw/source/filter/ww8
It enables export of bibliography entries to Word 2003 .DOC as 'Author' fields, with the OOo bibliography 'short names' as the parameters. The exported bibliography itself is empty (heading only).

The build's round-trip tests will fail, of course; ignore them (make -k).

This ALMOST takes care of the issue in my case.

Hopefully somebody can point out where the code for exporting the bibliography itself is.
Comment 20 Yury 2015-05-08 07:05:07 UTC
Created attachment 115444 [details]
patch for 4.3.7.1 enabling the bibliography entries AND index export

This patch is against the 4.3.7.1 source and enables export to Word 2003 (!) .DOC of both the bibliography entries AND the resulting bibliography list. It is implemented with the (more correct) QUOTE field type. The round-trip build test fails (predictably) and can be ignored; the resulting binary works fine.

This works adequately for me: both real Word 2003 and SoftMaker FreeOffice open my exported bibliographies correctly. Are proof documents needed?
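For illustration, a QUOTE field like the one this patch writes might be laid out as follows in the binary format's text stream: the integer field type is kept with the field-begin character, while the keyword QUOTE starts the instruction text (Word 2003 shows the field code as something like `{QUOTE "Hill 'NT"}`). This is a hedged C++ sketch, not Yury's patch: the struct and function names are hypothetical, and the value 35 for QUOTE is an assumption to be checked against the [MS-DOC] field type list.

```cpp
#include <string>

// Field delimiter characters in the binary .doc document text.
constexpr char16_t kFieldBegin = 0x13;
constexpr char16_t kFieldSeparator = 0x14;
constexpr char16_t kFieldEnd = 0x15;

// Assumption: field type number of QUOTE in the binary .doc format.
constexpr int kQuoteFieldType = 35;

// Hypothetical representation of one exported field.
struct WW8Field
{
    int fieldType;        // integer field type stored with the field-begin char
    std::u16string chars; // characters written into the document text
};

// Builds a QUOTE field whose instruction quotes the entry's short name and
// whose result is the text shown when field codes are hidden.
WW8Field MakeQuoteField(const std::u16string& shortName, const std::u16string& shownText)
{
    std::u16string chars;
    chars += kFieldBegin;
    chars += u" QUOTE \"" + shortName + u"\" "; // textual field type + argument
    chars += kFieldSeparator;
    chars += shownText;                          // cached result
    chars += kFieldEnd;
    return {kQuoteFieldType, chars};
}
```

Here the integer type and the instruction keyword both say QUOTE, which is the property discussed later in this thread.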
Comment 21 Michael Stahl (allotropia) 2015-05-20 16:04:59 UTC
it does indeed appear that enum values > 95 do not actually exist in binary DOC format.

suspect that your patch will fix the WW8 export but break the DOCX export.

there are multiple implementations of OutputField, for RTF / WW8 / DOCX;
perhaps best to "map" the non-existent enum values in the WW8
OutputField impl, then other formats are unaffected.

one thing that is perhaps worth trying is, export the fields to DOCX,
import that in Word, export it to DOC and see what kind of field that writes
(there are some binary format tools in a separate git repo
https://gerrit.libreoffice.org/#/admin/projects/mso-dumper
but i'm not at all familiar with these).
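The suggestion to map the non-existent values only in the WW8 implementation of OutputField, leaving RTF and DOCX untouched, might be sketched in C++ like this. The class names, the string return values and the choice of QUOTE (assumed to be 35) as the substitute type are illustrative assumptions, not LibreOffice's actual classes or signatures.

```cpp
#include <string>

// Illustrative field numbers: 96 and 97 are the internal values discussed in
// this thread, which do not exist in the binary .doc format.
enum FieldType : int { kQuote = 35, kBibliography = 96, kCitation = 97 };
constexpr int kMaxWW8FieldType = 95; // assumption, per this comment

// Hypothetical base class: one field writer per output format.
class FieldExporter
{
public:
    virtual ~FieldExporter() = default;
    // Returns a description of what would be written, for illustration only.
    virtual std::string OutputField(int type, const std::string& instruction) = 0;
};

// DOCX writes only the instruction text, so no mapping is needed here.
class DocxFieldExporter : public FieldExporter
{
public:
    std::string OutputField(int /*type*/, const std::string& instruction) override
    {
        return "instrText:" + instruction;
    }
};

// The binary .doc writer also stores the integer type, so only this override
// maps non-existent values onto a type that exists in the format.
class WW8FieldExporter : public FieldExporter
{
public:
    std::string OutputField(int type, const std::string& instruction) override
    {
        if (type > kMaxWW8FieldType)
            type = kQuote; // assumed substitute, as in Yury's QUOTE patch
        return "flt:" + std::to_string(type) + " instr:" + instruction;
    }
};
```

Because the change lives in the WW8 override, the DOCX and RTF output stay as they are.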
Comment 22 Yury 2015-05-20 17:37:20 UTC
(In reply to Michael Stahl from comment #21)
> it does indeed appear that enum values > 95 do not actually exist in binary
> DOC format.
> 
> suspect that your patch will fix the WW8 export but break the DOCX export.

More like: it fixes the .DOC export if the receiver is real Word 2003, but breaks it if the receiver is a newer Word or something similar. The number 97 had to come from somewhere, right? Either way, there is no field 97 in the format specification, so the source definitely has to be changed in that respect.

Possibly LO could distinguish between export to pure Word 2003 .doc and to an extended format (say, "Word 2003++")?

> one thing that is perhaps worth trying is, export the fields to DOCX,
> import that in Word, export it to DOC and see what kind of field that writes

Would that even work with real Word 2003?
Comment 23 Commit Notification 2015-05-20 19:24:29 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=484759aadd964b011e8e649ba021d09f40a79440

sw: add a comment about WW8 non-existent fields, related: tdf#88697

It will be available in 5.0.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 24 Michael Stahl (allotropia) 2015-05-20 20:14:27 UTC
(In reply to Yury from comment #22)
> (In reply to Michael Stahl from comment #21)
> > it does indeed appear that enum values > 95 do not actually exist in binary
> > DOC format.
> > 
> > suspect that your patch will fix the WW8 export but break the DOCX export.
> 
> More like, it fixes the .DOC export if the receiver is real word2003, but
> breaks it, if the receiver is newer word or something like that. The number
> 97 had to come from somewhere, right?

i believe the numbers 96 and 97 were made up by the author of the patch
in comment #11 without any actual knowledge of WW8 format, purely for
use in DOCX where the actual *numbers* are not written into the file.

> > one thing that is perhaps worth trying is, export the fields to DOCX,
> > import that in Word, export it to DOC and see what kind of field that writes
> 
> Would it even work with real word2003?

probably not; Word 2003 can't import DOCX, that needs a later version.
Comment 25 Michael Stahl (allotropia) 2015-06-15 08:43:02 UTC
* there was a related bug 76279 that was fixed in LO 4.4 but sadly not
  backported to 4.3 branch before EOL

* Word 2010 can read the fields after storing the attached files with LO 4.4.4

* LO cannot read the fields written by LO 4.4.4

* with the patch below LO can read the fields

* don't have Word 2003 so cannot tell if the patch helps in that case;
  this needs testing
  
please try out this patch to see whether it helps with Word 2003 interop (it should apply to the libreoffice-4-4 branch or later by copy/pasting the "cherry-pick" command from the gerrit page)

https://gerrit.libreoffice.org/16285
Comment 26 Michael Stahl (allotropia) 2015-06-15 08:45:19 UTC
Created attachment 116545 [details]
the 2nd attachment, exported with patch from previous comment
Comment 27 Yury 2015-06-15 10:16:52 UTC
Thank you very much, indeed, Michael!

Personally, since I have my local build with said hack, I'm not in a special hurry to have this in a release, but anyway, it's nice to see someone cares. :)

I have some remarks/questions on the output of the patch:

1) In Word 2003 you can display either the fields or the field codes. My patch produces a DOC in which the field codes (as shown by Word 2003) start with `{QUOTE "Hill 'NT"`. With your patch, the field contains just `{Hill 'NT}`. Why is that, and is it correct?

2) Could you try your patch with a file I'm attaching in a minute? I'd like to know how the bibliography comes through.
Comment 28 Yury 2015-06-15 10:18:14 UTC
Created attachment 116552 [details]
ODT with both biblio entry and bibliography index, text keys ('short names')
Comment 29 Michael Stahl (allotropia) 2015-06-19 14:22:53 UTC
Created attachment 116653 [details]
previous attachment exported with last patch
Comment 30 Michael Stahl (allotropia) 2015-06-19 14:23:53 UTC
Created attachment 116654 [details]
previous attachment exported with final patch
Comment 31 Michael Stahl (allotropia) 2015-06-19 14:30:28 UTC
in WW8 format there is both an integer field type and a textual field type in the field instruction; my first patch changes only the integer field type.

... and Word 2010 does not recognize the type of the field, if the unselected listbox entry in the "Edit Field" dialog is any indication.

in RTF/DOCX there is only the field instruction, which means that the missing QUOTE is probably a bug for those formats.

       <w:instrText>Hill 'NT</w:instrText>

so it's probably best to keep the integer field type and textual field type consistent, and add the textual field type for all formats.

should be FIXED on master
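Keeping the integer field type and the textual field type consistent, as proposed in this comment, could be enforced by deriving both from one table. A minimal C++ sketch under stated assumptions: the table, the function names and the numbers (17 for AUTHOR, 35 for QUOTE, taken to follow the [MS-DOC] field type list) are illustrative and not the committed fix.

```cpp
#include <array>
#include <sstream>
#include <string>

// Hypothetical table pairing integer .doc field types with the keyword that
// begins the field instruction. Numbers are assumptions; check [MS-DOC].
struct FieldKind
{
    int ww8Type;
    const char* keyword;
};

constexpr std::array<FieldKind, 2> kFieldKinds{{
    {17, "AUTHOR"},
    {35, "QUOTE"},
}};

// Builds the instruction from the same entry that supplies the integer type,
// so both always name the same kind of field, whatever the output format.
std::string BuildInstruction(const FieldKind& kind, const std::string& argument)
{
    return std::string(" ") + kind.keyword + " \"" + argument + "\" ";
}

// Checks that the first word of an instruction matches the integer type.
bool IsConsistent(int ww8Type, const std::string& instruction)
{
    std::istringstream in(instruction);
    std::string firstWord;
    in >> firstWord; // skips leading whitespace
    for (const FieldKind& kind : kFieldKinds)
        if (kind.ww8Type == ww8Type)
            return firstWord == kind.keyword;
    return false; // integer type not in the table
}
```

`IsConsistent(35, "Hill 'NT")` returns false, matching the mismatch described above where only the integer type had been changed and the instruction lacked the keyword.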
Comment 32 Commit Notification 2015-06-19 14:31:27 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=f56289ac6d7f3da7fd45dd431ce4c540aadcad56

tdf#88697: sw: make WW8 export of CITATION fields compatible with

It will be available in 5.1.0.

Comment 33 Commit Notification 2015-06-19 16:27:15 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-5-0":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1bcd54e93245dfaea0a072493d8bab9e569bae93&h=libreoffice-5-0

tdf#88697: sw: make WW8 export of CITATION fields compatible with

It will be available in 5.0.0.1.

Comment 34 Michael Stahl (allotropia) 2015-06-22 11:38:52 UTC
backport for 4.4 in gerrit:
https://gerrit.libreoffice.org/16410
Comment 35 Yury 2015-06-22 12:05:50 UTC
Thanks a lot, Michael, for your involvement!
Now we are only a correct preview handling away from the "ideal" word exporter for scientists. :))
Comment 36 Commit Notification 2015-06-22 12:15:28 UTC
Michael Stahl committed a patch related to this issue.
It has been pushed to "libreoffice-4-4":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=a99a8aa1f6ec95a578a34f92aeab3523f090176b&h=libreoffice-4-4

tdf#88697: sw: make WW8 export of CITATION fields compatible with

It will be available in 4.4.5.

Comment 37 Robinson Tryon (qubit) 2015-12-17 08:45:26 UTC Comment hidden (obsolete)