Bug 65353 - FILESAVE: LibreOffice embeds fonts defined in styles (e.g. CJK and CTL fonts) but aren't used in the document
Summary: FILESAVE: LibreOffice embeds fonts defined in styles (e.g. CJK and CTL fonts)...
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 4.1.0.0.beta1 (earliest affected)
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.2.0
Keywords:
Duplicates: 70669 92701 99654 113570 113656
Depends on:
Blocks: Fonts-Embedded
Reported: 2013-06-04 14:14 UTC by pierre-yves samyn
Modified: 2019-10-10 10:43 UTC (History)
CC List: 30 users

See Also:
Crash report or crash signature:


Attachments
sample of embedded fonts in 4.4.0.3 (6.11 MB, application/vnd.oasis.opendocument.text)
2015-02-11 07:28 UTC, putt1ck
empty document, 17MB (17.16 MB, application/vnd.oasis.opendocument.text)
2018-03-21 20:10 UTC, Dmitry

Description pierre-yves samyn 2013-06-04 14:14:45 UTC
Hello

Steps to reproduce
1. File> New> Text document
2. Type "test"
3. Check File> Properties> Font> Embed fonts in the document
4. Save the document (test.odt for instance)
5. Unzip the archive

As expected we have a Fonts subfolder in the archive.

What I do not understand is that, whatever the
fonts used in the document, the fonts Mangal and
Mangal Bold are incorporated.
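
The check in step 5 can also be done without unzipping by hand. A minimal sketch in Python, assuming only that an .odt file is a zip archive and that embedded fonts live under a Fonts/ folder, as described above (the function name is made up for illustration):

```python
import zipfile


def list_embedded_fonts(odt_path):
    """Return the files stored under Fonts/ in an ODF package, sorted by name."""
    with zipfile.ZipFile(odt_path) as package:
        return sorted(
            name
            for name in package.namelist()
            # Skip the directory entry itself, if the archive has one.
            if name.startswith("Fonts/") and not name.endswith("/")
        )


# Usage (test.odt is the file saved in step 4):
# for font in list_embedded_fonts("test.odt"):
#     print(font)
```

With the document from the steps above, this should list the Mangal files even though no text uses them.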

Platform: Windows XP & 7 64bits

Version: 4.1.0.0.beta1+
Build ID: 36b5efd733c68c67bb4a1b9489a775aed1fe98e
TinderBox: Win-x86_9-Voreppe, Branch:libreoffice-4-1, Time: 2013-05-31_17:43:42

Regards
Pierre-Yves
Comment 1 Jacques Guilleron 2013-06-04 15:14:17 UTC
Hello Pierre-Yves,

I confirm with 
LO 4.2.0.0.alpha0+ Build ID: 60790b3f0ccc1779bcff2ddcc278a9027aedabe
Windows 7 Home Premium

Have a nice day,

Jacques Guilleron
Comment 2 Michael Meeks 2013-06-04 17:03:21 UTC
Lubos - any thoughts on this ?
Comment 3 Luboš Luňák 2013-07-24 13:29:46 UTC
Well, they are used, in a way: they are used by certain styles. For example, SwXMLFontAutoStylePool_Impl::SwXMLFontAutoStylePool_Impl() writes out all fonts referenced by the document, even if it's just a fallback font for Asian characters.
The ODF export filters would need to get smarter about this.
Comment 4 pierre-yves samyn 2013-07-25 12:24:18 UTC
Hello

Thank you for replying

(In reply to comment #3)
> Well, they are used, in a way, they are used by certain styles. And e.g.
> SwXMLFontAutoStylePool_Impl::SwXMLFontAutoStylePool_Impl() writes out all
> fonts referenced by the document, even if it's just a fallback font for
> asian characters.

I see...

> The ODF export filters would need to get smarter about this.

The problem is not that annoying but it would be great... :)

The API provides the isInUse method on the XStyle interface; maybe that could be used?
Or at least check whether the "bi-directional writing" option is enabled?
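
To illustrate the isInUse idea, here is a rough PyUNO-style sketch. It is only a sketch under assumptions: `document` is assumed to be a Writer document model obtained from a running LibreOffice (for example via XSCRIPTCONTEXT.getDocument() in a macro), the function name is invented, and whether isInUse reports parent styles or direct formatting the way the exporter would need is not something this checks.

```python
# Font properties from the CharacterProperties service (Western, Asian, CTL).
FONT_PROPERTIES = ("CharFontName", "CharFontNameAsian", "CharFontNameComplex")


def fonts_of_styles_in_use(document, families=("ParagraphStyles", "CharacterStyles")):
    """Collect font names set on styles that report isInUse() == True.

    `document` is assumed to expose getStyleFamilies(), as Writer documents do.
    """
    fonts = set()
    style_families = document.getStyleFamilies()
    for family_name in families:
        family = style_families.getByName(family_name)
        for style_name in family.getElementNames():
            style = family.getByName(style_name)
            if not style.isInUse():
                continue  # unused style: its fonts would not need embedding
            for prop in FONT_PROPERTIES:
                value = style.getPropertyValue(prop)
                if value:
                    fonts.add(value)
    return fonts
```

Note that this still returns the Asian and CTL fonts of any style that is in use, so on its own it would not drop Mangal from a document that uses the default style; it only shows how unused styles could be skipped.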

Regards
Pierre-Yves
Comment 5 Maxim Monastirsky 2013-11-27 10:37:32 UTC
*** Bug 70669 has been marked as a duplicate of this bug. ***
Comment 6 Maxim Monastirsky 2014-02-27 09:36:16 UTC
*** Bug 75553 has been marked as a duplicate of this bug. ***
Comment 7 Yousuf Philips (jay) (retired) 2014-05-31 23:25:12 UTC
Testing on Linux Mint, I've observed that saving the document takes ~10 seconds, even though the document only had 'Hi' in it. So this is a time waster, as LibO is unusable during that time.

And things are getting worse with each version: a 4.1 document was 23.6 MB, a 4.2 document was 24.7 MB and now a 4.3 beta document is a whopping 32 MB. Hopefully once this is dealt with, my request for an easy means to know which fonts are available in a document will be easy to accomplish [bug 78186].
Comment 8 Jean-Baptiste Faure 2014-06-01 19:45:51 UTC
(In reply to comment #7)
> Testing on Linux Mint, I've observed that saving the document takes
> ~10 seconds, even though the document only had 'Hi' in it. So this is a time
> waster, as LibO is unusable during that time.

I just tried with LO 4.3.0.0.beta1+ on Ubuntu 14.04 x86-64 with 3 words in the document. Saving the document takes less than 1 second and the file is 3.2 MB. If I open the odt archive, the Fonts directory takes 6 MB and contains 9 fonts (ttf files).

Best regards. JBF
Comment 9 Michael Meeks 2014-06-02 09:27:06 UTC
Andras - potentially one to track for cp-4.2 ? =)
Comment 10 Yousuf Philips (jay) (retired) 2014-06-02 14:25:44 UTC
(In reply to comment #8)
> I just tried with LO 4.3.0.0.beta1+ on Ubuntu 14.04 x86-64 with 3 words in
> the document. The saving of the document is less than 1 second and the file
> takes 3.2 mb. If I open the odt archive, the fonts directory takes 6 mb and
> contains 9 fonts (ttf files).

No doubt different people will have different experiences with this, depending on what fonts are installed on their system. For me, the 4.1 document saved 14 fonts at 39.1 MB, 4.2 saved 16 fonts at 41.1 MB and 4.3 saved 15 fonts at 56.1 MB. These tests were on the same system with the same fonts installed. I thought it might be due to the user profile, so I reset it in 4.3 and the resulting file had 13 fonts at 54.8 MB with a file size of 32.9 MB. So ultimately the delay in the UI will depend on the number of fonts it's saving.

With a stock Ubuntu 14.04 32-bit, the 'Hi' document is 5.5 MB with 13 fonts in it totalling 10.4 MB, and saving halted the UI for 2 to 4 s. This test was on my laptop, which has an Intel Core 2 @ 1.3 GHz and 2.5 GB of RAM.
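
The gap between the total size of the embedded fonts and the size of the saved file is presumably down to zip compression (that is an assumption, not something measured here). A small Python sketch that reports both figures for the Fonts/ entries of a saved document; the function name and the keys of the returned dict are made up for illustration:

```python
import zipfile


def font_size_report(odt_path):
    """Count the files under Fonts/ and sum their raw and compressed sizes."""
    with zipfile.ZipFile(odt_path) as package:
        fonts = [
            info
            for info in package.infolist()
            if info.filename.startswith("Fonts/") and not info.is_dir()
        ]
    return {
        "count": len(fonts),
        # Size of the font files once extracted.
        "uncompressed_bytes": sum(info.file_size for info in fonts),
        # Space they actually occupy inside the .odt archive.
        "compressed_bytes": sum(info.compress_size for info in fonts),
    }


# Usage:
# print(font_size_report("hi.odt"))
```

Running it on documents saved by different versions on the same machine would make the comparisons above easier to reproduce.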
Comment 11 Owen Genat (retired) 2014-09-14 14:25:40 UTC
There appear to be multiple problems with font embedding:

a) Fonts used in default (common) / automatic / pre-defined styles appear to be embedded in a non-obvious manner.

b) Some fonts appear to be embedded twice (refer to bug 83675). In the example XLSX in that report I did not reproduce this, but under GNU/Linux using v4.3.1.2 a new document containing "a" embeds two copies each of FreeSans and FreeSansOblique e.g.:

<style:font-face style:name="FreeSans1" svg:font-family="FreeSans" style:font-family-generic="swiss">

<style:font-face style:name="FreeSans" svg:font-family="FreeSans" style:font-family-generic="system" style:font-pitch="variable">

c) Some fonts in use are not being embedded at all (again, refer to bug 83675).

Is this particular report only going to focus on (a), or will it be expanded to cover (b) and (c) also? Just after some clarity about how to treat seemingly related reports. Thanks.
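
For (b), the doubled declarations can be spotted mechanically. A minimal Python sketch, assuming the XML text of content.xml or styles.xml has already been read from the package; it groups the style:font-face entries under office:font-face-decls by their svg:font-family and returns families declared more than once (the function name is invented for this example):

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# ODF namespace URIs.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}
STYLE_NAME = "{%s}name" % NS["style"]
SVG_FONT_FAMILY = "{%s}font-family" % NS["svg"]


def duplicate_font_families(xml_text):
    """Map each font family declared more than once to its style:name values."""
    root = ET.fromstring(xml_text)
    names_by_family = defaultdict(list)
    for face in root.iterfind(".//office:font-face-decls/style:font-face", NS):
        # Family names containing spaces are usually wrapped in single quotes.
        family = face.get(SVG_FONT_FAMILY, "").strip("'")
        names_by_family[family].append(face.get(STYLE_NAME))
    return {
        family: names
        for family, names in names_by_family.items()
        if len(names) > 1
    }
```

For the two declarations quoted above, this would report FreeSans with the names FreeSans1 and FreeSans.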
Comment 12 pierre-yves samyn 2014-09-15 07:53:10 UTC
Hi 

(In reply to comment #11)
> There appear to be multiple problems with font embedding:
> ...
> Is this particular report only going to focus on (a) or expanded to cover
> (b) and (c) also? 

Several bugs (not yet assigned) are related to this feature (is there not also the multiplication of files?). I do not know whether a "meta" bug has been created or whether one should be. I do not know our use of BZ well enough to have an opinion...

I would like to hear from someone more competent than me on this matter :)

Regards
Pierre-Yves
Comment 13 putt1ck 2015-02-11 07:28:37 UTC
Created attachment 113302 [details]
sample of embedded fonts in 4.4.0.3

Attached file is from simple template with only custom styles applied throughout, all of which use the same font (Crimson). The resulting file does contain Crimson and its variants, but also FreeSans and its variants (twice), Times New Roman and variants, Arial and variants and Droid Sans Fallback, a total of 17 embedded fonts where only 4 were needed.

These extra fonts are not used in the document, nor referenced in the style hierarchy; opening the file up they are only referenced in content.xml & styles.xml, and that's in both the attached file and in a version without embedded fonts. Therefore I assume font embedding is being done by simply looking for fonts referenced in the file.

However all the fonts referenced but not used are contained in one small part at the beginning of content.xml within the tag office:font-face-decls, while only the font actually used is referenced after that tag - so cannot the embedder exclude fonts referenced within that tag?

Alternatively, while the fonts referenced in that tag are referenced with style:font-face style:name= and svg:font-family=, fonts actually in use seem to be referenced within the document with style:font-name= so perhaps could be used as a reference when looking for fonts to embed.
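
The second idea can be tried out on an existing file. A rough Python sketch, assuming the XML text of content.xml has been read from the package: it compares the names declared in office:font-face-decls with the names referenced through style:font-name attributes. The function name and the `attributes` parameter are invented for illustration, and fonts referenced only from styles.xml are not considered, so treat the result as a hint rather than a definitive list.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
NS = {"office": OFFICE_NS, "style": STYLE_NS}


def unreferenced_fonts(xml_text, attributes=("font-name",)):
    """Return declared font names that no style:<attribute> refers to."""
    root = ET.fromstring(xml_text)

    declared = set()
    for face in root.iterfind(".//office:font-face-decls/style:font-face", NS):
        name = face.get("{%s}name" % STYLE_NS)
        if name is not None:
            declared.add(name)

    wanted = ["{%s}%s" % (STYLE_NS, local) for local in attributes]
    referenced = set()
    for element in root.iter():
        for attribute in wanted:
            value = element.get(attribute)
            if value is not None:
                referenced.add(value)

    return sorted(declared - referenced)
```

Passing attributes=("font-name", "font-name-asian", "font-name-complex") also counts the Asian and CTL references, which is closer to what the exporter appears to do today.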
Comment 14 Maxim Monastirsky 2015-07-13 14:14:39 UTC
*** Bug 92701 has been marked as a duplicate of this bug. ***
Comment 15 geeker 2015-08-25 20:39:19 UTC
This bug is reproduced in Version 5.0.0.5 in Linux and Windows.
Comment 16 martin_hosken 2016-01-07 03:44:53 UTC
Perhaps a way forward is to add a further config option that says: only embed fonts used by text in the document. The aim being that the document is viewable and to a great extent editable. What is gained is that it minimises the fonts embedded while giving users the expected results. If someone really wants everything, they just don't click that option!
Comment 17 Michael Meeks 2016-01-07 09:22:54 UTC
Happy to have another option if it's useful to you Martin; at least for now - I think it's worth focusing on this as a bug; no point in embedding fonts (several times?) for styles etc. that are not used in the text =) looks like a normal bug to me - which we should fix (perhaps with the extra option there).
Comment 18 Yousuf Philips (jay) (retired) 2016-01-19 09:39:45 UTC
@Martin: No, I don't think we need an extra option, as the option to embed fonts is only supposed to embed fonts used in the doc.
Comment 19 Maxim Monastirsky 2016-02-15 12:31:07 UTC
*** Bug 97874 has been marked as a duplicate of this bug. ***
Comment 20 Yousuf Philips (jay) (retired) 2017-10-05 04:43:27 UTC
*** Bug 99654 has been marked as a duplicate of this bug. ***
Comment 21 Yousuf Philips (jay) (retired) 2017-10-05 05:55:33 UTC
So I see that whatever fonts are set for Asian and CTL language support will be saved within the document, even though the fonts aren't actually applied to any text but are only set in various styles like the default paragraph and graphic styles.

On Windows 8.1, for Asian support, Microsoft YaHei and SimSun are selected[1]. It embeds SimSun but strangely not YaHei, and it corrupts the 17.4 MB SimSun file by removing the first 32 bytes from its beginning. For CTL support, Mangal is set and it embeds it, totaling 377 KB. A resulting blank document with these fonts takes up 11 MB.

On Linux Mint 18 (clean installation), for Asian support, WenQuanYi Micro Hei is selected and takes up 4.9 MB, while for CTL support, Lohit Devanagari is selected and takes up 137.5 KB, as it saves two copies of the same file. A resulting blank document with these fonts takes up 3.8 MB.

On Linux Mint 18 (with various MS fonts installed), for Asian support, Microsoft YaHei and SimSun are selected and take up 51.8 MB, while for CTL support, Lucida Sans is selected and takes up 127 KB. A resulting blank document with these fonts takes up 32.9 MB.

Separate from the asian and ctl support, LO saves Liberation Sans and Serif for latin support which take up 2.8mb.

@Olivier: in your duplicate bug, what do you have set for latin, asian and ctl fonts to get the whopping 94mb file you mentioned.

[1] Tools > Options > LibreOffice Writer > Basic Fonts (Asian)
Comment 22 Olivier Hallot 2017-10-05 11:50:08 UTC
(In reply to Yousuf Philips (jay) from comment #21)
(snip)
> @Olivier: in your duplicate bug, what do you have set for latin, asian and
> ctl fonts to get the whopping 94mb file you mentioned.
> 
> [1] Tools > Options > LibreOffice Writer > Basic Fonts (Asian)

Tested again in 5.4.1 (Kubuntu). 
A simple odt file with just "Hi" inside
- No font embedded: 8k
- Fonts embedded: 19M

No Asian, no CTL set for my doc (UI=pt-BR).

Basic (Western) fonts are Liberation Serif for all entries (Default, ..., Index).
Comment 23 Yousuf Philips (jay) (retired) 2017-10-05 15:05:35 UTC
(In reply to Olivier Hallot from comment #22)
> Tested again in 5.4.1 (Kubuntu). 
> A simple odt file with just "Hi" inside
> - No font embedded: 8k
> - Fonts embedded: 19M

Open up the 19mb file and see which fonts are embedded by

1. renaming the .odt to .zip
2. unzipping the file
3. check the 'Fonts' directory

the first set of fonts are normally CTL, then the next set is Liberation fonts and the last set is asian fonts

> No Asian, no CTL set for my doc (UI=pt-BR).

Yes, of course it isn't set, but find out which Asian and CTL fonts LO is using by

1. go into tools > options > language settings > languages and check the asian and complex text layout checkboxes
2. open writer
3. go into tools > options > writer and see what is listed in Basic Fonts (Asian) and Basic Fonts (CTL)

> Basic (Western) fonts are Liberation Serif for all entries (Default, ...,
> Index).

Yes this never changes unless a user changes it, as LO always bundles Liberation fonts.
Comment 24 Adolfo Jayme 2017-11-01 18:58:59 UTC
*** Bug 113570 has been marked as a duplicate of this bug. ***
Comment 25 Maxim Monastirsky 2017-11-06 22:25:16 UTC
*** Bug 113656 has been marked as a duplicate of this bug. ***
Comment 26 James 2017-11-08 11:26:17 UTC
As an author I often have documents of 400 plus pages, and as they evolve some have attracted numerous font references, resulting in very large files with a number of unwanted embedded fonts. I work in Linux and my manual solution is thus.

1. Save the document as odt with the embed fonts option unchecked.
2. Change the file suffix to zip.
3. Open it with an archive manager.
4. Examine the file styles.xml.
5. For each unwanted "<style:font-face style:name=..." line at the top of the file, do a global search & replace, substituting the name of a required font as shown in the list.
6. Remove the duplicate "<style:font-face style:name=..." lines this creates.
7. Save.
8. Change the suffix back to odt.
9. Load in LibreOffice, check embed fonts and save.

You now have a smaller file without the unwanted fonts.
This is a dirty solution and probably leaves a lot of redundant tags in the file, but it works. 

An automated process based on this would be a fair solution.
Comment 27 Bernard Moreton 2018-02-23 17:21:34 UTC
This is a problem for me still in 6.0.0.1, first observed in the last 5.4, but only in one particular document, and only in the PDF export, where the font NanumMyeongjo is embedded.  This is not used in the document, does not appear in an RTF export, and I cannot see it when unzipping the ODT and inspecting content and styles.  It would be good if one of the suggested fixes were enabled.
Comment 28 Bernard Moreton 2018-02-24 12:30:10 UTC
I found the source of my Nanum font - in a badly-imported umlauted character in a footnote on p.108. Writer showed it as in Liberation Serif, but changing the character to remove the umlaut removed the unwanted NanumMyeongjo font from the PDF-1a export
Comment 29 Dmitry 2018-03-21 20:10:12 UTC
Created attachment 140800 [details]
empty document, 17MB

The problem is present in LibreOffice Writer 5.4.6.2 too. (Build ID: 1:5.4.6~rc2-0ubuntu0.16.04.1)
Comment 30 Robert 2018-06-19 23:02:28 UTC
The problem is still present in LibreOffice Writer 5.4.7.2 (x64).
Comment 31 Commit Notification 2018-07-10 14:16:19 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=eb6ff07605a55675e7007ac0cb5604fb13a9ddf9

[API CHANGE] tdf#65353 Add more doc. settings more embedding fonts

It will be available in 6.2.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 32 Commit Notification 2018-07-10 14:16:32 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3bc3ddc11fd94877d9c5d2b8313ab53150818236

tdf#65353 filter fonts when embedding (unused, font script)

It will be available in 6.2.0.

Comment 33 Commit Notification 2018-07-11 06:27:26 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=39e5b2c174c6a27b5c3e2a08b00dd4c26677e07f

tdf#65353 test for font embedding in ODF documents

It will be available in 6.2.0.

Comment 34 Commit Notification 2018-07-12 19:09:14 UTC
Tomaž Vajngerl committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=1a8435a23e84f3ceeee580eb9d4404a738d98888

tdf#65353 separate autostyle collection and export

It will be available in 6.2.0.

Comment 35 Xisco Faulí 2018-08-15 14:40:55 UTC
A polite ping to Tomaž Vajngerl:
Is this bug fixed? If so, could you please close it as RESOLVED FIXED?
Otherwise, could you please explain what's missing?

Thanks
Comment 36 martin_hosken 2018-08-21 02:27:40 UTC
Hmm. I tried a simple, one-font document of some Awami text and chose to embed only the fonts used in the document, and the resulting .odt contained: "Courier New" (4 faces), Liberation Mono (4 faces), Liberation Serif (4 faces), Tahoma (1 face) in addition to the intended Awami (1 face).

It's tricky because the character formatting for that one line of text says: Western Font = Awami Nastaliq, Asian = Courier New and CTL = Awami Nastaliq Beta 3 (which is referenced by the .odt but not included, perhaps because the font has gone away). So perhaps the Courier New inclusion makes sense. Not so sure about the Liberation and Tahoma, though. It does look like these fonts are referenced by the default styles.

The question is whether fonts for unused styles should be included. I would argue that they should not be. Even a basic workflow (new document, select a font, type some text) ends up with quite a bit of cruft. Yes, this is a distinct improvement over including all the system fonts. But it's still quite a big file.
Comment 37 Michael Meeks 2018-08-21 08:52:46 UTC
Ah - is that specific to the fonts used by the default style I wonder ? I wonder if that style is used in some way that is not entirely obvious (?) any thoughts Quikee?