Bug 82173 - FILEOPEN: DOCX - "Footnote References" character styles incorrectly imported
Summary: FILEOPEN: DOCX - "Footnote References" character styles incorrectly imported
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): Inherited From OOo
Hardware: Other All
Importance: medium normal
Assignee: Justin L
URL:
Whiteboard: interoperability target:6.0.0
Keywords: filter:docx, needsDevEval
Duplicates: 82077 107177
Depends on:
Blocks: Footnote-Endnote DOCX-Styles
Reported: 2014-08-05 05:00 UTC by Yousuf Philips (jay) (retired)
Modified: 2017-07-25 22:38 UTC
CC List: 7 users

See Also:
Crash report or crash signature:


Attachments
sample file (152.32 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2014-08-05 05:00 UTC, Yousuf Philips (jay) (retired)
Word 2013 VS LibO 4.3.1 (109.36 KB, image/png)
2014-08-05 05:01 UTC, Yousuf Philips (jay) (retired)
footnoteTest.docx: simple test with green, italic, superscript footnote character (13.68 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2017-05-20 16:45 UTC, Justin L
part3 of the patch set - doesn't round-trip and breaks bug 82071 (2.04 KB, patch)
2017-06-03 13:15 UTC, Justin L
docx with 'Heading 1' and 'heading 1' styles (13.14 KB, application/wps-office.docx)
2017-06-03 14:11 UTC, Yousuf Philips (jay) (retired)

Description Yousuf Philips (jay) (retired) 2014-08-05 05:00:07 UTC
Created attachment 104046
sample file

Steps:
1) open attached file
2) the footnote numbers at the bottom of the page are as large as the text next to them

Tested on 4.3.1 on Linux and master on Windows.
Comment 1 Yousuf Philips (jay) (retired) 2014-08-05 05:01:44 UTC
Created attachment 104047
Word 2013 VS LibO 4.3.1
Comment 2 retired 2014-08-05 09:33:36 UTC
I guess you are saying they should be smaller, or raised (superscript)?

Could you describe in more detail what you want to be fixed here?

Otherwise, I can confirm your finding: Word displays a small number, while LO displays the number at the same size as the footnote text itself.

Thus NEW
Comment 3 Yousuf Philips (jay) (retired) 2014-08-05 13:35:25 UTC
Yes, in Word the footnote numbers look like superscript, while in LibO each one looks like a regular-sized number.
Comment 4 Yousuf Philips (jay) (retired) 2015-02-23 16:47:12 UTC
Adding bug 82077 as that one is similar but with RTF.
Comment 5 Joel Madero 2015-02-23 17:34:43 UTC
@VMiklos - same issue as the rtf one just for docx. Code pointers welcome if you think that this can be an easy hack. Thanks in advance.
Comment 6 Ákos 2015-08-24 13:24:34 UTC
The bug still exists in LibreOffice 5.0.1.2 (Windows 32-bit).

The footnote structure in Microsoft Word 2013:
- The footnote number: a first selectable character in the line, displayed as superscript
- One space character
- the footnote text
The whole footnote is a single line of text and contains no styles.

The footnote structure in LO after import a docx document:
- the footnote number: a first character, imported as "Footnote Characters" character style, with normal character position
- one selectable but undisplayed character (code: 0x02 STX), imported as "footnote reference" character style
- one space character
- the footnote text, imported as "footnote text" paragraph style

The footnote structure in LO ODT file, created in LO 5.0.1.2
- the footnote number: a first character, with "Footnote Characters" character style, with normal character position
- the footnote text, with "Footnote" paragraph style (the space between footnote number and footnote text is defined in "footnote" paragraph style)

How the import could be improved:
1. Don't convert this XML tag from the source (i.e. remove the extra space): <w:r><w:t xml:space="preserve"> </w:t></w:r>
2. Remove the undisplayed character (it is probably imported in place of Word's index character)
3. In the "Footnote Characters" character style select Superscript as character position
4. Import the footnote text as "footnote" paragraph style
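A minimal sketch of step 1, working directly on footnotes.xml with Python's standard xml.etree.ElementTree: it drops a whitespace-only run that immediately follows the run holding <w:footnoteRef/>. This is only an illustration under stated assumptions (the separator is always the very next run, and removing it is desirable); it is not how LibreOffice's writerfilter import works, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, i.e. the "w:" prefix in footnotes.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
RUN = f"{{{W}}}r"


def is_blank_run(run):
    """True if a <w:r> holds only an optional <w:rPr> plus whitespace-only <w:t> text."""
    if run.tag != RUN:
        return False
    content = [c for c in run if c.tag != f"{{{W}}}rPr"]
    return bool(content) and all(
        c.tag == f"{{{W}}}t" and (c.text or "").strip() == "" for c in content
    )


def strip_separator_space(footnotes_xml):
    """Hypothetical helper: remove a blank run directly after the <w:footnoteRef/> run.

    Returns (serialized XML, number of runs removed).
    """
    ET.register_namespace("w", W)  # keep the "w:" prefix when serializing
    root = ET.fromstring(footnotes_xml)
    removed = 0
    for para in root.iter(f"{{{W}}}p"):
        children = list(para)  # snapshot, so removing below is safe
        for run, following in zip(children, children[1:]):
            if run.tag == RUN and run.find("w:footnoteRef", NS) is not None and is_blank_run(following):
                para.remove(following)
                removed += 1
                break  # assume at most one separator per paragraph
    return ET.tostring(root, encoding="unicode"), removed


# Trimmed sample built from the XML quoted in step 1
SAMPLE_FOOTNOTES = (
    f'<w:footnotes xmlns:w="{W}"><w:footnote w:id="1"><w:p>'
    '<w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr><w:footnoteRef/></w:r>'
    '<w:r><w:t xml:space="preserve"> </w:t></w:r>'
    '<w:r><w:t>Footnote text.</w:t></w:r>'
    '</w:p></w:footnote></w:footnotes>'
)

if __name__ == "__main__":
    cleaned, count = strip_separator_space(SAMPLE_FOOTNOTES)
    print(count)  # 1
    print(cleaned)
```

Matching on the run right after <w:footnoteRef/> keeps ordinary spaces elsewhere in the footnote untouched; a real importer would more likely handle this while building the Writer footnote than by rewriting the XML.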

In this way you convert a Word-style footnote into an LO-style footnote.
Comment 7 Robinson Tryon (qubit) 2015-12-14 06:04:23 UTC Comment hidden (obsolete)
Comment 8 Xisco Faulí 2017-04-18 09:00:37 UTC
*** Bug 107177 has been marked as a duplicate of this bug. ***
Comment 9 Xisco Faulí 2017-04-18 09:03:09 UTC
Still reproducible in

Version: 5.4.0.0.alpha0+
Build ID: 7635e0c1c7f821a1081f8e3868f641ae74a172d6
CPU threads: 4; OS: Linux 4.8; UI render: default; VCL: gtk2; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group
Comment 10 Justin L 2017-05-20 16:45:00 UTC
Created attachment 133420
footnoteTest.docx: simple test with green, italic, superscript footnote character

MSO's "footnote reference" character style doesn't get imported into the twin "Footnote Characters / Footnote Anchor" styles in Writer for DOCX.  (It works fine for DOC.)
Comment 11 Yousuf Philips (jay) (retired) 2017-05-21 00:40:27 UTC
(In reply to Justin L from comment #10)
> MSO's "footnote reference" character style doesn't get imported into the
> twin "Footnote Characters / Footnote Anchor" styles in Writer for DOCX.  (It
> works fine for DOC.)

It seems LO is importing the style correctly but not mapping it to the correct style names: LO imports MS's "Footnote Reference" as "Default Paragraph Font > footnote reference" rather than into "Footnote Anchor" and "Footnote Characters". Below is the relevant XML code in attachment 133420.

== /word/styles.xml ==

<w:style w:type="character" w:default="1" w:styleId="DefaultParagraphFont">
 <w:name w:val="Default Paragraph Font" />
 <w:uiPriority w:val="99" />
 <w:semiHidden />
</w:style>
<...>
<w:style w:type="character" w:styleId="FootnoteReference">
 <w:name w:val="footnote reference" />
 <w:basedOn w:val="DefaultParagraphFont" />
 <w:uiPriority w:val="99" />
 <w:semiHidden />
 <w:rsid w:val="00132573" />
 <w:rPr>
   <w:rFonts w:cs="Times New Roman" />
   <w:i />
   <w:color w:val="00FF00" />
   <w:kern w:val="2" />
   <w:vertAlign w:val="superscript" />
 </w:rPr>
</w:style>

== /word/footnotes.xml ==

<...>
<w:r>
  <w:rPr>
    <w:rStyle w:val="FootnoteReference" />
  </w:rPr>
  <w:footnoteRef />
</w:r>
<...>
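To check which character style a DOCX actually applies to the footnote number, the two parts above can be inspected with Python's standard xml.etree.ElementTree. The sketch below is a hypothetical diagnostic helper, not LibreOffice code: it collects the w:rStyle ids on runs containing <w:footnoteRef/> and then reads that style's w:name, w:basedOn and w:rPr from styles.xml. The sample strings are trimmed versions of the snippets above with the namespace declaration added, since the excerpts omit it.

```python
import xml.etree.ElementTree as ET

# WordprocessingML main namespace (the "w:" prefix in the snippets above)
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def val(el, attr="val"):
    """Read a w:-qualified attribute such as w:val."""
    return el.get(f"{{{W}}}{attr}")


def footnote_ref_style_ids(footnotes_xml):
    """rStyle ids applied to runs that contain <w:footnoteRef/>."""
    root = ET.fromstring(footnotes_xml)
    ids = []
    for run in root.iter(f"{{{W}}}r"):
        if run.find("w:footnoteRef", NS) is None:
            continue
        rstyle = run.find("w:rPr/w:rStyle", NS)
        if rstyle is not None:
            ids.append(val(rstyle))
    return ids


def character_style(styles_xml, style_id):
    """(name, basedOn, {rPr tag: w:val or None}) for a character style, or None."""
    root = ET.fromstring(styles_xml)
    for style in root.findall("w:style", NS):
        if val(style, "type") != "character" or val(style, "styleId") != style_id:
            continue
        name_el = style.find("w:name", NS)
        based_el = style.find("w:basedOn", NS)
        rpr = style.find("w:rPr", NS)
        props = {}
        if rpr is not None:
            for child in rpr:
                # strip the "{namespace}" part of the tag, e.g. "{...}vertAlign" -> "vertAlign"
                props[child.tag.split("}", 1)[1]] = val(child)
        return (
            val(name_el) if name_el is not None else None,
            val(based_el) if based_el is not None else None,
            props,
        )
    return None


STYLES_XML = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="character" w:styleId="FootnoteReference">'
    '<w:name w:val="footnote reference"/><w:basedOn w:val="DefaultParagraphFont"/>'
    '<w:rPr><w:i/><w:color w:val="00FF00"/><w:vertAlign w:val="superscript"/></w:rPr>'
    '</w:style></w:styles>'
)
FOOTNOTES_XML = (
    f'<w:footnotes xmlns:w="{W}"><w:footnote w:id="1"><w:p>'
    '<w:r><w:rPr><w:rStyle w:val="FootnoteReference"/></w:rPr><w:footnoteRef/></w:r>'
    '</w:p></w:footnote></w:footnotes>'
)

if __name__ == "__main__":
    for sid in footnote_ref_style_ids(FOOTNOTES_XML):
        print(sid, character_style(STYLES_XML, sid))
```

On these samples it reports the lowercase name "footnote reference" together with the italic, green and superscript properties, i.e. the formatting this bug expects to end up in Writer's "Footnote Characters" / "Footnote Anchor" styles.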
Comment 12 Commit Notification 2017-05-29 07:40:06 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=e8714e3451282218e34d2ded472c9a5a44bd0bd2

related tdf#82173 writerfilter: ignore case when mapping style name

It will be available in 5.5.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 13 Justin L 2017-05-29 08:11:56 UTC
This bug is not (and isn't supposed to be) fixed yet. So far, the only change is that the in-text anchor now uses the "footnote reference" style defined in MS Word. I haven't figured out how to duplicate/link the "Footnote Characters" style to the "Footnote Anchor" style.

The fix in comment 12 only enabled the existing mapping between styles.
Comment 14 Justin L 2017-06-03 13:15:49 UTC
Created attachment 133823
part3 of the patch set - doesn't round-trip and breaks bug 82071

Proposed patches to help, but not completely solve, this bug:
- https://gerrit.libreoffice.org/38306 charStyle XnoteReference->Xnote Characters
- https://gerrit.libreoffice.org/38372 apply char properties to footnote

The third patch seems like an obvious correction, except that it causes problems with the unit test for bug 82071. One way to avoid the problem is to avoid setting the stylename if !IsOpenField() - but that is a hack.

Another problem is that this solution is not round-trippable.
Comment 15 Yousuf Philips (jay) (retired) 2017-06-03 14:11:41 UTC
Created attachment 133825
docx with 'Heading 1' and 'heading 1' styles

(In reply to Commit Notification from comment #12)
> http://cgit.freedesktop.org/libreoffice/core/commit/
> ?id=e8714e3451282218e34d2ded472c9a5a44bd0bd2
> 
> related tdf#82173 writerfilter: ignore case when mapping style name

I think that ignoring case when mapping style names is a problem because, for example, it's possible to have both 'Heading 1' and 'heading 1' in the same Word document, and LO will trip over itself.
Comment 16 Justin L 2017-06-03 15:34:44 UTC
(In reply to Yousuf Philips (jay) from comment #15)
> I think that ignoring case when mapping style names is a problem, as for
> example, its possible to have 'Heading 1' and 'heading 1' in the same word
> document and LO will trip over itself.
Both of those were *already* manually mapped to Heading 1.  (You can look in my commit and see the duplicates that were already mapped to a single style.)

But certainly, if we find an example document where those are a problem, we can revert. The main reason I did a "generic" fix was because I saw so many existing examples, and it didn't seem reasonable to keep adding them one by one as more were discovered.
Comment 17 Yousuf Philips (jay) (retired) 2017-06-04 15:09:09 UTC
(In reply to Justin L from comment #16)
> Both of those were *already* manually mapped to Heading 1.  (You can look in
> my commit and see the duplicates that were already mapped to a single style.)

Yes that is true, but Word outputs 'Heading 1' as 'heading 1' in the xml, so any idea why we were also mapping 'Heading 1' on import? In your investigations, have you found documents which had 'Heading 1' instead of 'heading 1'?

> But certainly, if we find an example document where those are a problem, we
> can revert.

I was able to create a document like this in Word 2010 and 2013 (attachment 133825), but when I reopen it, Word automatically renames the user-created 'heading 1' to 'Heading 11'.

<w:style w:type="paragraph" w:styleId="Heading1">
  <w:name w:val="heading 1" />
  <w:basedOn w:val="Normal" />
  <...>
</w:style>
<w:style w:type="paragraph" w:customStyle="1" w:styleId="heading10">
  <w:name w:val="heading 1" />
  <w:basedOn w:val="Heading1" />
  <...>
</w:style>

There would need to be some error checking to see if a style has already been created and another style is trying to overwrite it on import. The w:customStyle attribute would also be helpful in finding out whether a style is a built-in style or user-defined style.
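One way to express that error checking is sketched below in Python. Everything here is hypothetical: BUILTIN_MAP is a tiny stand-in for writerfilter's real (C++) name table, and the clash policy of appending the styleId is an assumption, not what Word or LibreOffice does. Built-in names are matched case-insensitively, styles marked w:customStyle are never matched against the table, and a name that is already taken is not reused, so the user-defined 'heading 1' (styleId heading10) from the XML above cannot overwrite the built-in Heading 1.

```python
from dataclasses import dataclass

# Hypothetical, heavily trimmed Word -> Writer name table for illustration only;
# the real mapping lives in LibreOffice's C++ writerfilter code.
BUILTIN_MAP = {
    "heading 1": "Heading 1",
    "footnote reference": "Footnote Characters",
    "footnote text": "Footnote",
}


@dataclass
class DocxStyle:
    style_id: str  # w:styleId, unique within styles.xml
    name: str      # w:name/@w:val exactly as written in the XML
    custom: bool   # True when w:customStyle="1" is set


def map_style_names(styles):
    """Return {styleId: Writer style name} without letting two styles share a name.

    Built-in styles are looked up case-insensitively; custom styles keep their own name.
    Names are compared case-insensitively when checking for clashes (an assumption).
    """
    result = {}
    taken = set()
    for style in styles:
        target = None if style.custom else BUILTIN_MAP.get(style.name.lower())
        if target is None:
            target = style.name
        candidate = target
        if candidate.lower() in taken:
            # hypothetical policy: disambiguate with the unique styleId
            candidate = f"{target} [{style.style_id}]"
        base, counter = candidate, 2
        while candidate.lower() in taken:  # last-resort numeric suffix
            candidate = f"{base} {counter}"
            counter += 1
        taken.add(candidate.lower())
        result[style.style_id] = candidate
    return result


if __name__ == "__main__":
    styles = [
        DocxStyle("Heading1", "heading 1", custom=False),
        DocxStyle("heading10", "heading 1", custom=True),
    ]
    print(map_style_names(styles))
```

Because the first style to claim a name wins, the result depends on the order of the w:style elements in styles.xml; a real importer might prefer to give built-in styles priority regardless of order.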

> The main reason I did a "generic" fix was because I saw so many
> existing examples, and it didn't seem reasonable to keep adding them one by
> one as more were discovered.

Where can I find these examples?
Comment 18 Justin L 2017-06-05 06:43:37 UTC
(In reply to Yousuf Philips (jay) from comment #17)
> Yes that is true, but Word outputs 'Heading 1' as 'heading 1' in the xml, so
> any idea why we were also mapping 'Heading 1' on import?
I only looked at the git history and saw that these were already present in 2008. There haven't been any recent changes to the list.
> Where can i find these examples?
I only meant many examples in the code - not in .docx files.
Comment 19 Commit Notification 2017-06-06 12:12:41 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=707eb4db1918658e0c2c2c2033c6a69f80c4eafd

tdf#82173 writerfilter: charStyle XnoteReference->Xnote Characters

It will be available in 5.5.0.

Comment 20 Commit Notification 2017-06-09 05:35:41 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fdfdea4d5af51a68f2d497cc5c3359d74c385fd5

tdf#82173 writerfilter: apply char properties to footnote

It will be available in 5.5.0.

Comment 21 Justin L 2017-06-09 12:33:25 UTC
(In reply to Yousuf Philips (jay) from comment #15)
> I think that ignoring case when mapping style names is a problem.
I agree. It appears that many of these mappings had been ineffective since, as you noted, Microsoft tends to export the lowercase form of the style name, while most of the mappings are in title case.

What I plan to do is leave the commit in place for a couple of months to see if QA notices that it fixes anything (as it should). After that I will revert it.

One identified problem is endnotes: gerrit.libreoffice.org/38605 "writerfilter: map endnote text to Endnote, not Endnote Symbol".
Comment 22 Yousuf Philips (jay) (retired) 2017-06-10 14:59:33 UTC
(In reply to Justin L from comment #21)
> (In reply to Yousuf Philips (jay) from comment #15)
> > I think that ignoring case when mapping style names is a problem.
> I agree. It appears that many of these mappings had been ineffective since,
> as you noted, Microsoft tends to export the lowercase form of the stylename,
> but most of the mapping are in Titlecase.

Actually, MS title-cases most of its style names in the XML and lowercases mainly the headings and table-of-contents entries, even though the UI shows the headings in title case and "TOC" in capitals. Here are the lowercase entries from a Word 2010 file.

<w:lsdException w:name="heading 1">    UI: Heading 1
<w:lsdException w:name="heading 2">    UI: Heading 2
<w:lsdException w:name="heading 3">    UI: Heading 3
<w:lsdException w:name="heading 4">    UI: Heading 4
<w:lsdException w:name="heading 5">    UI: Heading 5
<w:lsdException w:name="heading 6">    UI: Heading 6
<w:lsdException w:name="heading 7">    UI: Heading 7
<w:lsdException w:name="heading 8">    UI: Heading 8
<w:lsdException w:name="heading 9">    UI: Heading 9
<w:lsdException w:name="toc 1"/>       UI: TOC 1
<w:lsdException w:name="toc 2"/>       UI: TOC 2
<w:lsdException w:name="toc 3"/>       UI: TOC 3
<w:lsdException w:name="toc 4"/>       UI: TOC 4
<w:lsdException w:name="toc 5"/>       UI: TOC 5
<w:lsdException w:name="toc 6"/>       UI: TOC 6
<w:lsdException w:name="toc 7"/>       UI: TOC 7
<w:lsdException w:name="toc 8"/>       UI: TOC 8
<w:lsdException w:name="toc 9"/>       UI: TOC 9
<w:lsdException w:name="caption"/>     UI: Caption
Comment 23 Justin L 2017-06-10 16:09:31 UTC
(In reply to Yousuf Philips (jay) from comment #22)
> Actually MS titlecases most of their styles in XML
footnote reference and endnote reference are also lowercase.
Comment 24 Commit Notification 2017-06-21 07:55:44 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0f4038abcd3d5f93847f7f27ffbb990f6a19c4ba

tdf#82173 writerfilter: copy Xnote Characters -> Xnote anchor

It will be available in 6.0.0.

Comment 25 Yousuf Philips (jay) (retired) 2017-06-23 09:29:50 UTC
Your latest patch got me thinking about how well we handle exporting of footnotes, especially if LO's two footnote styles are different.
Comment 26 Yousuf Philips (jay) (retired) 2017-06-25 12:33:13 UTC
(In reply to Justin L from comment #23)
> footnote reference and endnote reference are also lowercase.

True. Also footnote text, endnote text and a lot more, according to attachment 131189.
Comment 27 Commit Notification 2017-06-29 02:38:49 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=803a17533f25d9174c6a19aa913a6713980c193d

revert related tdf#82173 writerfilter: ignore case when mapping styles

It will be available in 6.0.0.

Comment 28 Commit Notification 2017-06-29 07:59:27 UTC
Stephan Bergmann committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1091744caf4f5509a67b5e5fc8ba2251ef5a6a18

Revert "revert related tdf#82173 writerfilter: ignore case when mapping styles"

It will be available in 6.0.0.

Comment 29 Commit Notification 2017-06-29 19:09:44 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=79225b5a70595740af500485253a4b2084e940f8

related tdf#82173 writerfilter: ensure Xnote para-style round-tripping

It will be available in 6.0.0.

Comment 30 Commit Notification 2017-06-30 19:37:32 UTC
Justin Luth committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=fb39062ed958c2a5df90c0aff7d873746122067c

revert related tdf#82173 writerfilter: ignore case when mapping styles

It will be available in 6.0.0.

Comment 31 Justin L 2017-07-01 00:08:37 UTC
*** Bug 82077 has been marked as a duplicate of this bug. ***
Comment 32 Justin L 2017-07-01 00:36:07 UTC
Marking this bug as fixed since the description problem is resolved. Other matters brought up as discussion items should become new bugs since this one has gotten pretty cluttered.

(In reply to Ákos from comment #6)
> How can you improve the import:
> 1. Can't convert the XML tag from source (remove the extra space): <w:r><w:t
> xml:space="preserve"> </w:t></w:r>
> 2. Remove the undisplayed character (this is imported probably instead of
> index character from word)
somewhat related to bug 71984 and bug 105095
> 3. In the "Footnote Characters" character style select Superscript as
> character position
Done
> 4. Import the footnote text as "footnote" paragraph style
Done
Comment 33 Justin L 2017-07-25 22:38:19 UTC
Some changes are needed in the RTF import to make this work there as well. See https://gerrit.libreoffice.org/40430 or bug 108949.