Bug 64232 - FILEOPEN: [FORMATTING] [DATALOSS] Importing DOCX with default font 'Times New Roman' - shown as different font in Writer
Status: VERIFIED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 4.0.2.2 release
Hardware: All All
Importance: high major
Assignee: Jacobo Aragunde Pérez
URL:
Whiteboard: BSA target:4.3.0
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-05 08:49 UTC by Adam CloudOn
Modified: 2020-11-05 10:27 UTC
CC List: 9 users

See Also:
Crash report or crash signature:


Attachments
DOCX with single text with default 'Times New Roman' font (11.30 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-05-05 08:49 UTC, Adam CloudOn
LibreOffice Font Bug - Screenshot (27.29 KB, image/png)
2013-05-05 08:51 UTC, Adam CloudOn
File with Hebrew text example of the problem. Only one word uses a non theme font that works (16.70 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-07-18 14:17 UTC, Shai Petel
A fresh new docx file created in MS Word 2010 (12.65 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2013-10-17 07:44 UTC, Kevin Suo

Description Adam CloudOn 2013-05-05 08:49:07 UTC
Created attachment 78873
DOCX with single text with default 'Times New Roman' font

Problem description: 
A DOCX file was created in Word 2013 with a single sentence (no font was selected, so Word's default font, 'Times New Roman', was used).
When the DOCX is opened in LibreOffice, the text appears in a different font (definitely not 'Times New Roman').
In addition, the font combo-box appears empty.

In addition, when the file is saved back to a new DOCX, Word cannot open it (it reports the file as corrupt). When Word recovers the file, the text opens with the font 'Calibri' selected.

Steps to reproduce:
1. Create a DOCX in Word without changing font (or download the attached document)
2. Open the DOCX in LibreOffice 

Current behavior:
- No font is selected in 'font combo-box'
- Font rendered is not 'Times New Roman'
- When saving to DOCX - file is corrupt

Expected behavior:
- Selected font in 'font combo-box' should be 'Times New Roman'
- Font rendered should be 'Times New Roman'
- When saving to DOCX - should not corrupt the file

Operating System: Windows 8
Version: 4.0.2.2 release
Comment 1 Adam CloudOn 2013-05-05 08:51:10 UTC
Created attachment 78874
LibreOffice Font Bug - Screenshot
Comment 2 Jorendc 2013-05-05 09:08:37 UTC
@Joel: Sorry to ping you during your vacation. Would you mind testing with Office 2013?

Thanks!
Joren
Comment 3 Joel Madero 2013-05-06 04:36:24 UTC
The file is not corrupt. Microsoft Office only supports an older version of the open standard (ODF), so when you save with the newer version, which is LibreOffice's default, Office doesn't know what to do and reports an error. That is quite tricky on Microsoft's part, to say the least, especially if MSO 2013 still does this. This was well known with 2007 and 2010; it's unfortunate that they still blame their lack of support on the file itself (calling it corrupt instead of saying "sorry, we don't support this").

@Joren: sure, I'll test. I still haven't actually installed MSO 2013; it's just sitting there collecting dust like 2010 is ;) But I will try to install it or test on a system that already has it. Give me a day or two :)
Comment 4 Adam CloudOn 2013-05-06 07:28:31 UTC
Maybe I was misunderstood, or maybe I didn't understand @Joel's comment...

I did not mean saving DOCX as ODF and trying to open it in MS Word.
I meant working only with DOCX files.

Open DOCX in LO - save it as NEW.DOCX - open NEW.DOCX in Word => Boom, corrupt.
Why would Word say the DOCX is corrupted unless LO somehow had a bug exporting?
Comment 5 Joel Madero 2013-05-06 13:33:49 UTC
Indeed, this would be a much more serious problem. I misunderstood; I will attempt to test today.
Comment 6 Joel Madero 2013-05-08 16:13:49 UTC
Windows 7
LibreOffice Version 4.0.2.2 release

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Confirmed

New (confirmed)
Major - not good; it seems MSO has changed something about its DOCX format... annoying from our end, but we still shouldn't be making MSO think that we broke their files
High


In the future please try to keep bug reports separate :) This is in fact 2 related bugs (and possibly the same root cause but I'm not sure). 


For now will keep as one and cc writer expert.


Thanks for reporting!
Comment 7 Gerry 2013-05-08 20:21:22 UTC
@Adam: Is it possible that your original file has a formatting error? When I open your test DOCX file in Word 2010, I see that the font name is 

"Times New Roman (Headings CS)"

I don't know exactly how it got into the Word 2013 file, but I think LibreOffice replaces this unknown font with something else. Of course, it should replace it with "Times New Roman" rather than a completely different font.

On the other hand, could you please investigate with your Word 2013 installation how this "...(Headings CS)" got into the font name?
Comment 8 Joel Madero 2013-05-08 21:06:59 UTC
Gerry - great catch

Marking as NEEDINFO.

I tried reproducing with a blank document and could not. My default font is Calibri (Body). When I save the document and open it with LibreOffice, it opens with the font set to Calibri.

I then saved a copy of the document from LibreOffice to docx format (2007/2010) -- probably should add 2013 to this -- and opened the document successfully with Word with no issues.


I think this is a problem with that font choice, and it's likely NOTOURBUG. Please confirm this and then we can mark the bug as INVALID.


Thanks!
Comment 9 Adam CloudOn 2013-05-09 06:43:12 UTC
I have investigated what happens here, and this is what I *think* happens:

The DOCX file I attached contains this tag:

<w:rFonts w:asciiTheme="majorBidi" w:hAnsiTheme="majorBidi" w:cstheme="majorBidi"/>

This tag means 'run fonts'; from my investigation, it determines the default font used for the run when no specific font is set.

There are 2 ways of doing so:
setting the 'ascii', 'hAnsi', 'cs' and 'eastAsia' attributes
OR
setting the 'asciiTheme', 'hAnsiTheme', 'cstheme' and 'eastAsiaTheme' attributes.

For example:
- if you set 'w:ascii="Tahoma"', then any ASCII text in the run that has no font set will be rendered in Tahoma.
- if you set 'w:cs="Tahoma"', then any complex-script text (e.g. Hebrew or Arabic) in the run that has no font set will be rendered in Tahoma.

On the other hand, you have 'asciiTheme' and 'cstheme' and so on, and this is where it gets tricky.
'w:cstheme="majorBidi"' means that Word goes to the 'settings.xml' file, looks for the 'w:themeFontLang' tag and reads its 'bidi' attribute.
In the attached file this is the tag:
     <w:themeFontLang w:val="en-US" w:eastAsia="zh-CN" w:bidi="he-IL"/>

Word then goes to the 'theme1.xml' file and looks for the 'majorFont' tag.
Inside it, Word looks for the font entry with script 'Hebr' (because bidi was 'he-IL', which is Hebrew). There it sees that the 'majorFont' typeface for Hebrew is "Times New Roman", so all complex-script text in the run will be rendered in Times New Roman.

The same goes for 'asciiTheme', with one difference: all the plain ASCII text in the run will be rendered in 'Times New Roman'.
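The lookup chain described above can be sketched with Python's standard `xml.etree.ElementTree`. This is a minimal illustration under stated assumptions, not Word's or LibreOffice's actual code: `resolve_theme_font`, `run_fonts`, the trimmed sample XML and the partial `LANG_TO_SCRIPT` table are all hypothetical. It assumes that a `*Theme` attribute takes precedence over the matching explicit font name, and that an empty `a:latin`/`a:ea`/`a:cs` typeface falls back to the per-script `a:font` entry selected through `w:themeFontLang`.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
A = "http://schemas.openxmlformats.org/drawingml/2006/main"

# Hypothetical, partial map from themeFontLang language tags to theme script codes.
LANG_TO_SCRIPT = {"he-IL": "Hebr", "ar-SA": "Arab", "zh-CN": "Hans", "ja-JP": "Jpan"}

# Trimmed theme1.xml / settings.xml excerpts modelled on the values quoted in this report.
THEME_XML = f"""<a:theme xmlns:a="{A}"><a:themeElements><a:fontScheme name="Office">
  <a:majorFont><a:latin typeface="Cambria"/><a:ea typeface=""/><a:cs typeface=""/>
    <a:font script="Hebr" typeface="Times New Roman"/></a:majorFont>
  <a:minorFont><a:latin typeface="Calibri"/><a:ea typeface=""/><a:cs typeface=""/>
    <a:font script="Hebr" typeface="Arial"/></a:minorFont>
</a:fontScheme></a:themeElements></a:theme>"""
SETTINGS_XML = f"""<w:settings xmlns:w="{W}">
  <w:themeFontLang w:val="en-US" w:eastAsia="zh-CN" w:bidi="he-IL"/></w:settings>"""


def resolve_theme_font(value, theme, settings):
    """Map a theme value such as 'majorBidi' to a concrete typeface (or None)."""
    group = "majorFont" if value.startswith("major") else "minorFont"
    kind = value[5:]  # 'major'/'minor' are both 5 chars: leaves 'Ascii', 'HAnsi', 'EastAsia' or 'Bidi'
    font_list = theme.find(f".//{{{A}}}{group}")
    slot, lang_attr = {"Ascii": ("latin", "val"), "HAnsi": ("latin", "val"),
                       "EastAsia": ("ea", "eastAsia"), "Bidi": ("cs", "bidi")}[kind]
    typeface = font_list.find(f"{{{A}}}{slot}").get("typeface")
    if typeface:
        return typeface
    # Empty slot: use themeFontLang to pick the per-script entry instead.
    lang = settings.find(f"{{{W}}}themeFontLang").get(f"{{{W}}}{lang_attr}")
    script = LANG_TO_SCRIPT.get(lang)
    entry = font_list.find(f"{{{A}}}font[@script='{script}']")
    return entry.get("typeface") if entry is not None else None


def run_fonts(rfonts_xml, theme, settings):
    """Resolve each rFonts slot; a *Theme attribute wins over the explicit name (assumed)."""
    rfonts = ET.fromstring(rfonts_xml)
    fonts = {}
    for slot in ("ascii", "hAnsi", "eastAsia", "cs"):
        theme_attr = "cstheme" if slot == "cs" else slot + "Theme"
        theme_value = rfonts.get(f"{{{W}}}{theme_attr}")
        explicit = rfonts.get(f"{{{W}}}{slot}")
        if theme_value:
            fonts[slot] = resolve_theme_font(theme_value, theme, settings)
        elif explicit:
            fonts[slot] = explicit
    return fonts


theme = ET.fromstring(THEME_XML)
settings = ET.fromstring(SETTINGS_XML)
sample = (f'<w:rFonts xmlns:w="{W}" w:asciiTheme="majorBidi" '
          'w:hAnsiTheme="majorBidi" w:cstheme="majorBidi"/>')
print(run_fonts(sample, theme, settings))
```

With these excerpts every slot of the sample run resolves to 'Times New Roman'; editing the `Hebr` typeface in `THEME_XML` changes the result accordingly.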


How can you verify this ?

If you open the 'theme1.xml' file and change the 'Hebr' typeface under 'majorFont' to 'Tahoma', Word shows the text in 'Tahoma'.


My conclusion after this (long) comment is that there is probably a problem in the mechanism that imports the theme & settings information from the DOCX.

In addition - it seems that LO sets these attributes to "" (empty string) when it saves back to DOCX, which Word probably doesn't like...

I hope this helps.
Comment 10 Shai Petel 2013-07-18 14:12:11 UTC
I can confirm this happens to me with Office 2013, and I think I know why and what was changed.
Hope this can help:

I cracked open the docx file and noticed this behaviour:
text using any one of the theme fonts (from the theme my Word was currently set to) is rendered incorrectly.

Any words using a font outside my theme were rendered correctly.

I noticed in the document.xml inside the docx file, that fonts outside of my theme had the font name in them, example:
<w:rFonts w:ascii="FrankRuehl" w:hAnsi="FrankRuehl" w:cs="FrankRuehl"/>

Fonts from my theme, on the other hand, had one of these:
<w:rFonts w:asciiTheme="majorBidi" w:hAnsiTheme="majorBidi" w:cstheme="majorBidi"/>
or <w:rFonts w:asciiTheme="minorBidi" w:hAnsiTheme="minorBidi"/>
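The distinction above (runs carrying a font name versus runs pointing at the theme) can be checked mechanically. The hypothetical sketch below walks a trimmed `word/document.xml` excerpt and labels each run as 'theme' (any of `asciiTheme`, `hAnsiTheme`, `eastAsiaTheme`, `cstheme` present), 'named' (explicit font names only) or 'inherited' (no run-level `w:rFonts`). The function name and sample are illustrative assumptions, and the run text is placeholder text rather than the Hebrew words from the attachment.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
THEME_ATTRS = ("asciiTheme", "hAnsiTheme", "eastAsiaTheme", "cstheme")

# Hypothetical excerpt of word/document.xml: one theme-font run, one named-font run.
DOCUMENT_XML = f"""<w:document xmlns:w="{W}"><w:body><w:p>
  <w:r><w:rPr><w:rFonts w:asciiTheme="majorBidi" w:hAnsiTheme="majorBidi"
    w:cstheme="majorBidi"/></w:rPr><w:t>first</w:t></w:r>
  <w:r><w:rPr><w:rFonts w:ascii="FrankRuehl" w:hAnsi="FrankRuehl"
    w:cs="FrankRuehl"/></w:rPr><w:t>second</w:t></w:r>
</w:p></w:body></w:document>"""


def classify_runs(document_xml):
    """Return (text, 'theme' | 'named' | 'inherited') for every w:r element."""
    root = ET.fromstring(document_xml)
    result = []
    for run in root.iter(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        rfonts = run.find(f"{{{W}}}rPr/{{{W}}}rFonts")
        if rfonts is None:
            kind = "inherited"  # no run-level fonts: style/document defaults apply
        elif any(rfonts.get(f"{{{W}}}{attr}") for attr in THEME_ATTRS):
            kind = "theme"
        else:
            kind = "named"
        result.append((text, kind))
    return result


print(classify_runs(DOCUMENT_XML))
```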

Now, when you go to sub folder "theme" and open "theme1.xml" you can find the definition of the major and minor theme to use.

There is a font scheme with a majorFont table and a minorFont table, and this is where it gets complex: the major or minor font is not a single font, but a list of fonts per language. So each language has its own major/minor font, and one of them is set as the default.


I believe this is a new format change in Office 2013, which omits the font name from the document and points to the theme file instead.

I vote for making this high priority: as more and more people start working with Office 2013, this will become a huge disadvantage for LibreOffice.

I can help by providing an Office 2013 test machine if you guys need me to test something.
Comment 11 Shai Petel 2013-07-18 14:17:11 UTC
Created attachment 82600
File with Hebrew text example of the problem. Only one word uses a non theme font that works

Note that in this example file, each word uses a different font from the default theme.
They should be Arial or Times New Roman; instead they appear wrong.

Only the word in the middle of the second row uses a custom non-theme font and this is the only one displayed correctly. It uses FrankRuehl font.
Comment 12 Adam CloudOn 2013-07-18 14:43:26 UTC
(In reply to comment #10)
> I can confirm this happens to me with Office 2013, and I think I know why
> and what was changed.
> Hope this can help:
> 

Your analysis looks exactly like mine, so I guess we are both right ...
LibreOffice does not support 'Themes' for now... 
I agree this should be supported and vote '+1' to mark it as 'major'.
Comment 13 Joel Madero 2013-07-18 19:57:47 UTC
I am relatively sure that the theme issue is being handled by another bug report and thus this is likely a dupe.

Adding Michael as I believe he knows something about it.
Comment 14 Michael Meeks 2013-07-19 15:03:22 UTC
I know nothing, beyond the fact that we should get a model for theming information sorted, imported and exported again :-) [ and for this one, I'm rather interested in adding the feature to ODF too - though no need for Adam to do that ].
Comment 15 Shai Petel 2013-07-22 16:09:21 UTC
Anything I can provide to help?

Any info on theme structure/format or testing?

Let me know.
Comment 16 Kevin Suo 2013-10-17 07:44:51 UTC
Created attachment 87781
A fresh new docx file created in MS Word 2010

This file was created in MS Word 2010. It shows "Calibri" for western font and "宋体" (Simsun) for Chinese font.

But in LibreOffice it shows "Calibri" for the western font and "DejaVu Sans" for the Chinese font (falling back to Wenquanyi Zenhei, as DejaVu Sans is actually not a Chinese font), rather than "宋体" (Simsun). Note that this is consistent with the default LO Writer template, because when a new file is created in LO it sets "Calibri" for western and "DejaVu Sans" for Chinese text by default.

After decompressing the DOCX file, I see the tag "<w:rFonts w:hint="eastAsia"/>" in "/word/document.xml", and I see "<w:font w:name="Calibri">" and "<w:font w:name="宋体"><w:altName w:val="SimSun"/>" in "/word/fontTable.xml".

So, maybe LO has dismissed the default docx fontTable settings and applied the LO default font, when no direct font was set.
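A minimal sketch of reading the `w:font`/`w:altName` pairs quoted above out of `word/fontTable.xml` with Python's `xml.etree.ElementTree`. The `font_aliases` helper and the two-entry excerpt are hypothetical, and real font tables carry further child elements (panose, charset, pitch and so on) that are ignored here. It only shows what the table contains, not how Word picks the default East Asian font.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hypothetical, trimmed excerpt of word/fontTable.xml.
FONT_TABLE_XML = f"""<w:fonts xmlns:w="{W}">
  <w:font w:name="Calibri"/>
  <w:font w:name="宋体"><w:altName w:val="SimSun"/></w:font>
</w:fonts>"""


def font_aliases(font_table_xml):
    """Map each w:font name to its w:altName value (None when there is none)."""
    root = ET.fromstring(font_table_xml)
    aliases = {}
    for font in root.findall(f"{{{W}}}font"):
        alt = font.find(f"{{{W}}}altName")
        aliases[font.get(f"{{{W}}}name")] = alt.get(f"{{{W}}}val") if alt is not None else None
    return aliases


print(font_aliases(FONT_TABLE_XML))
```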

Moreover, setting "DejaVu Sans" as the default Chinese font in the default template is another bug. Maybe I should file another bug report?
Comment 17 Kevin Suo 2013-10-17 07:50:44 UTC
*** Bug 65117 has been marked as a duplicate of this bug. ***
Comment 18 Maxim Monastirsky 2013-10-17 10:46:49 UTC
(In reply to comment #16)
> This file was created in MS Word 2010. It shows "Calibri" for western font
> and "宋体" (Simsun) for Chinese font.
It opens as 'SimSun' font with master build (4.2.0.0.alpha0+ Build ID: cc2a405915e82c4b332dd25457f76704dc536d7f TinderBox: Win-x86@39, Branch:master, Time: 2013-10-15_15:51:52), while the current bug is still reproducible with that build, so not related at all. The reason is that the current bug is about Office *2013* 'Themes', while you talk about Office *2010* file.
Comment 19 Commit Notification 2013-12-04 19:05:59 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=0fa60a7f5c1d3510c4fe1ea3d2a51527baf102bc

fdo#64232: Add font theme info to CharInteropGrabBag



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.
Comment 20 Commit Notification 2013-12-04 19:06:19 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=9e47df8fd7c3cb1dcf556e009cec2d37b928d9b0

fdo#64232: Save font theme attributes back to the docx



Comment 21 Commit Notification 2013-12-04 19:06:36 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=458b89b303145085a1745fe408f0e860686d7220

fdo#64232: Fix and preserve eastAsiaTheme attribute



Comment 22 Commit Notification 2013-12-04 19:06:54 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=983002475fba1879fd00c75417342be55153b797

fdo#64232: Save font theme attributes in rPrDefault



Comment 23 Commit Notification 2013-12-04 19:07:12 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=13ce74fd9b5f7b9ea9d3dab34eed27a63aae5468

fdo#64232: Unit test for font theme attributes preservation



Comment 24 Jacobo Aragunde Pérez 2013-12-04 19:10:45 UTC
The set of patches I've just pushed properly handles the export of theme font information in these cases:

* Direct font format (paragraphs containing a rPr tag with asciiTheme, csTheme, etc. set)
* Default font format (contained at styles.xml, inside rPrDefault tag)
* Style font format (style definitions containing a rPr tag with asciiTheme, csTheme, etc. set)

I think there are no other cases of font theming happening.
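One way to picture the export side is a writer that re-emits the preserved theme attributes and skips empty values, instead of writing `attr=""` as described in comment 9. This is a hypothetical Python sketch, not LibreOffice's C++ exporter: `build_rfonts` and the `preserved` dict are assumptions, the dict merely standing in for whatever import-time storage keeps the values (the commits above mention a `CharInteropGrabBag`).

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# Output order: explicit font names first, then the theme references.
RFONTS_ATTRS = ("ascii", "hAnsi", "eastAsia", "cs",
                "asciiTheme", "hAnsiTheme", "eastAsiaTheme", "cstheme")


def build_rfonts(preserved):
    """Build a w:rFonts element from preserved values, skipping empty ones.

    `preserved` is a hypothetical dict of attribute name -> value captured at import.
    """
    rfonts = ET.Element(f"{{{W}}}rFonts")
    for name in RFONTS_ATTRS:
        value = preserved.get(name)
        if value:  # never write attr="" (the report suspects Word rejects such files)
            rfonts.set(f"{{{W}}}{name}", value)
    return rfonts


out = ET.tostring(build_rfonts({"asciiTheme": "majorBidi", "hAnsiTheme": "majorBidi",
                                "cstheme": "majorBidi", "eastAsiaTheme": ""}),
                  encoding="unicode")
print(out)
```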

Pending issues from the ones mentioned in this bug report:

* Still no font is selected in font combo-box when opening the provided docx sample file.
  * Consequently, the rendered font is not correct either.

* "DejaVu Sans" is still used for Chinese font rather than SimSun. What's worse, this is not preserved on export because SimSun font disappears from fontTable.xml file.

I might open a new bug for the second issue, because it doesn't seem related to theme font attribute preservation.
Comment 25 Jacobo Aragunde Pérez 2013-12-05 11:06:16 UTC
About the problem with the fonts being empty:

When a docx is opened, a theme font map is built in ThemeTable.cxx. This font table is built from the following elements in the theme1.xml file:

<a:fontScheme name="Office">
  <a:majorFont>
    <a:latin typeface="Cambria"/>
    <a:ea typeface=""/>
    <a:cs typeface=""/>
    ...

As you can see, the entries for eastAsia and complex script are empty and that's why there is no font for these types in the table.

I don't know exactly what the purpose of these fields is, because themes in Office seem to work as Adam explained in comment #9. I'll try to mimic that behaviour in LibO.
Comment 26 Jacobo Aragunde Pérez 2013-12-09 09:38:55 UTC
About the problem of the SimSun font being replaced with DejaVu: the cause is LibreOffice not preserving themeFontLang information in settings.xml file:

  <w:themeFontLang w:val="en-US" w:eastAsia="zh-CN"
  w:bidi="he-IL" />

The conclusion is that both pending problems are related and solving one will fix the other too.
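A sketch of preserving `w:themeFontLang` across a round trip: read it from the source `settings.xml` and write it into the outgoing one, so that an empty `ea`/`cs` theme slot can still be resolved to the matching per-script font on the next import. The helpers below are hypothetical Python stand-ins using `xml.etree.ElementTree`, not the LibreOffice implementation.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)
LANG_KEYS = ("val", "eastAsia", "bidi")

SOURCE_SETTINGS = f"""<w:settings xmlns:w="{W}">
  <w:themeFontLang w:val="en-US" w:eastAsia="zh-CN" w:bidi="he-IL"/>
</w:settings>"""


def read_theme_font_lang(settings_xml):
    """Collect the non-empty w:themeFontLang attributes (empty dict if missing)."""
    node = ET.fromstring(settings_xml).find(f"{{{W}}}themeFontLang")
    if node is None:
        return {}
    langs = {}
    for key in LANG_KEYS:
        value = node.get(f"{{{W}}}{key}")
        if value:
            langs[key] = value
    return langs


def write_theme_font_lang(settings_root, langs):
    """Append w:themeFontLang to an outgoing w:settings element if there is data."""
    if langs:
        node = ET.SubElement(settings_root, f"{{{W}}}themeFontLang")
        for key, value in langs.items():
            node.set(f"{{{W}}}{key}", value)
    return settings_root


langs = read_theme_font_lang(SOURCE_SETTINGS)
exported = write_theme_font_lang(ET.Element(f"{{{W}}}settings"), langs)
round_trip = read_theme_font_lang(ET.tostring(exported, encoding="unicode"))
print(round_trip)
```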
Comment 27 Commit Notification 2013-12-11 16:45:45 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1835074d525d12629008f8a6d5ed27402d18f4b3

fdo#64232: fix theme fonts application



Comment 28 Commit Notification 2013-12-11 16:46:02 UTC
Jacobo Aragunde Perez committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=ece66b11bd3d294eb27f185c1513744fe28ca523

fdo#64232: export w:themeFontLang setting to docx



Comment 29 Jacobo Aragunde Pérez 2013-12-18 09:03:40 UTC
All the problems mentioned in this bug report have been fixed :)
Comment 30 Maxim Monastirsky 2013-12-18 16:00:03 UTC
Fix verified with a master build with Build ID: 4c539fac018dfd44cd8db52161a8cb930c627da7. Thanks!