Bug 103944 - FILEOPEN: Bullets imported as open squares in .doc files
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: graphics stack
Version: 5.3.0.0.alpha1+ (earliest affected)
Hardware: All Linux (All)
Importance: medium normal
Assignee: Khaled Hosny
URL:
Whiteboard: target:5.3.0
Keywords: bibisected, bisected, regression
Depends on:
Blocks: HarfBuzz-regressions
Reported: 2016-11-15 19:58 UTC by Luke
Modified: 2017-05-30 06:06 UTC

See Also:
Crash report or crash signature:


Attachments
Example .doc file with bullets (26.00 KB, application/msword)
2016-11-15 19:58 UTC, Luke
Screenshot, no boxes (325.15 KB, image/png)
2016-11-15 23:22 UTC, Khaled Hosny
Screenshot from Arch Linux (172.08 KB, image/png)
2016-11-16 04:01 UTC, Luke

Description Luke 2016-11-15 19:58:57 UTC
Created attachment 128770 [details]
Example .doc file with bullets

Steps to Reproduce:
1. Open a MS .doc file with bullets in a recent build of LO

Expected results:
Bullets

Actual Results:
Square symbol


Build ID: 3287bc2f91438085b7604773d5e0346fc3c3f452 - GOOD
Build ID: c67b55db4adad67a8584b00f88b7ed296ba15846 - BAD
Comment 1 Xisco Faulí 2016-11-15 20:46:10 UTC
Confirmed in

Version: 5.3.0.0.alpha1+
Build ID: 757a60d01dd152aadab2ba3c8224252481ce8a88
CPU Threads: 4; OS Version: Linux 4.8; UI Render: default; VCL: gtk3; Layout Engine: new; 
Locale: ca-ES (ca_ES.UTF-8); Calc: group

This seems to be related to the new layout engine. Introduced by 8f2dd1df1d6cc94ebbc1149de72bc6d6dffa6533

Adding Cc: to Khaled Hosny
Comment 2 V Stuart Foote 2016-11-15 22:09:43 UTC
Can not reproduce. 

Do not get "square" bullet (i.e. bad font fall back to OpenSymbol) with any mix of new Harfbuzz layout or old DirectWrite/GDI+ layout with OpenGL or default rendering.

Version: 5.3.0.0.alpha1+ (x64)
Build ID: c5f5b3e5334c52502c1de28828a44ad469c68850
CPU Threads: 8; OS Version: Windows 6.29; UI Render: GL; Layout Engine: new; 
TinderBox: Win-x86_64@62-TDF, Branch:MASTER, Time: 2016-11-14_06:28:54
Locale: en-US (en_US); Calc: CL

Version: 5.3.0.0.alpha1+
Build ID: bb50b1609abe83265311613db4a18e992dc666c8
CPU Threads: 8; OS Version: Windows 6.2; UI Render: GL; Layout Engine: new; 
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2016-11-14_23:25:25
Locale: en-US (en_US); Calc: CL

Bullets offered in the Bullets and Numbering dialog show their OpenSymbol glyphs.
Comment 3 Luke 2016-11-15 22:18:22 UTC
Under Ubuntu 16.04 
Version: 5.3.0.0.alpha1+
Build ID: c67b55db4adad67a8584b00f88b7ed296ba15846
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk3; Layout Engine: old; 
Locale: en-US (en_US.UTF-8); Calc: group

I can reproduce the issue. 

With: $ SAL_NO_COMMON_LAYOUT=1 soffice.exe

I cannot reproduce the issue.

Seems like a Linux specific bug.
Comment 4 Luke 2016-11-15 22:27:40 UTC
Sorry pasted the 'About' in the wrong section.

Build ID: c67b55db4adad67a8584b00f88b7ed296ba15846
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk3; Layout Engine: old; Locale: en-US (en_US.UTF-8); Calc: group - GOOD


Build ID: c67b55db4adad67a8584b00f88b7ed296ba15846
CPU Threads: 4; OS Version: Linux 4.4; UI Render: default; VCL: gtk3; Layout Engine: new; Locale: en-US (en_US.UTF-8); Calc: group - BAD
Comment 5 Khaled Hosny 2016-11-15 23:22:15 UTC
Created attachment 128773 [details]
Screenshot, no boxes

I can’t reproduce this either. Do you have a font named “Symbol”? For example, when you try to edit the bullet, is the font “Symbol” found, or is it marked as a fallback?
Comment 6 Luke 2016-11-16 01:54:29 UTC
In both the new and old layout, the font “Symbol” is displayed as the font type. 

As far as what's installed, this is what I found

luke@luke-Inspiron-5523:/usr/share/fonts$ find . -name "symbol*"
luke@luke-Inspiron-5523:/usr/share/fonts$ find . -name "Symbol*"
./truetype/ancient-scripts/Symbola_hint.ttf

Only Symbola
Comment 7 Khaled Hosny 2016-11-16 02:16:44 UTC
(In reply to Luke from comment #6)
> In both the new and old layout, the font “Symbol” is displayed as the font
> type.

“font type”?


> 
> As far as what's installed, this is what I found
> 
> luke@luke-Inspiron-5523:/usr/share/fonts$ find . -name "symbol*"
> luke@luke-Inspiron-5523:/usr/share/fonts$ find . -name "Symbol*"
> ./truetype/ancient-scripts/Symbola_hint.ttf
> 
> Only Symbola

What do you get from:
$ fc-match "Symbol"
Comment 8 Luke 2016-11-16 04:01:45 UTC
Created attachment 128778 [details]
Screenshot from Arch Linux

> “font type”?
The drop-down box in the toolbar says "Symbol". In the Character->Font dialog I see
Western Text Font: 'Symbol' and on the line below
"This font has not been installed. The closest available font will be used."

It says this for both the new and old layout engines.

> $ fc-match "Symbol"

In Ubuntu:
$ fc-match "Symbol"
s050000l.pfb: "Standard Symbols L" "Regular"

In Arch:
$ fc-match "Symbol"
DejaVuSans.ttf: "DejaVu Sans" "Book"
Comment 9 Khaled Hosny 2016-11-16 04:30:02 UTC
(In reply to Luke from comment #8)
> Created attachment 128778 [details]
> Screenshot from Arch Linux

OK, it turns out the bullet uses a Private Use Area character, and I happened to have a font that provides a glyph for that exact character that looks like a dash. After removing that font, I get boxes for *both* old and new engines.

> It says this for both the new and old layout engines.
> 
> > $ fc-match "Symbol"
> 
> In Ubuntu:
> $ fc-match "Symbol"
> s050000l.pfb: "Standard Symbols L" "Regular"

This is a Type 1 font; the new engine does not support these.

> In Arch:
> $ fc-match "Symbol"
> DejaVuSans.ttf: "DejaVu Sans" "Book"

This should work, but I doubt it actually has the character. What about:
$ fc-match "Symbol:charset=F0B7"

(F0B7 is the bullet character).
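
For readers unfamiliar with this code point, here is a minimal Python sketch of how U+F0B7 breaks down. It assumes the common convention that symbol-encoded fonts expose their 8-bit character codes at U+F000 plus the byte value; the function names are hypothetical and this is not LibreOffice code.

```python
import unicodedata

# Assumption: symbol-encoded fonts expose byte code N at U+F000 + N.
SYMBOL_PUA_BASE = 0xF000


def is_private_use(code_point: int) -> bool:
    """Return True if the code point has the Unicode category 'Co' (Private Use)."""
    return unicodedata.category(chr(code_point)) == "Co"


def symbol_byte(code_point: int):
    """Return the 8-bit symbol code behind a U+F0xx code point, or None if out of range."""
    if SYMBOL_PUA_BASE <= code_point <= SYMBOL_PUA_BASE + 0xFF:
        return code_point - SYMBOL_PUA_BASE
    return None


if __name__ == "__main__":
    cp = 0xF0B7
    print(f"U+{cp:04X} is private use: {is_private_use(cp)}")
    print(f"underlying symbol byte: 0x{symbol_byte(cp):02X}")
```

Because the character is in the Private Use Area, it has no standard meaning: whether it renders as a bullet depends entirely on which font ends up supplying the glyph.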
Comment 10 Luke 2016-11-16 06:40:07 UTC
Arch, $ fc-match "Symbol:charset=F0B7"
DejaVuSans.ttf: "DejaVu Sans" "Book"

Ubuntu with square symbol, 
$ fc-match "Symbol:charset=F0B7"
s050000l.pfb: "Standard Symbols L" "Regular"

Ubuntu with open box symbol,
$ fc-match "Symbol:charset=F0B7"
Webdings.ttf: "Webdings" "Regular"


We're going to have to reproduce the old behavior at either the OS or layout level. Bullets in MS Word docs are extremely common.
Comment 11 Khaled Hosny 2016-11-16 07:15:52 UTC
(In reply to Luke from comment #10)
> Arch, $ fc-match "Symbol:charset=F0B7"
> DejaVuSans.ttf: "DejaVu Sans" "Book"

So you don’t have a font that supports this character, since DejaVuSans.ttf certainly does not have it. Does it really work with the old layout engine on this machine? If so, please use:

$ fc-match -s "Symbol:charset=F0B7"

This gives a sorted list of all fallback fonts, so we can see which ones have the symbol.

> Ubuntu with square symbol, 
> $ fc-match "Symbol:charset=F0B7"
> s050000l.pfb: "Standard Symbols L" "Regular"

That is still a Type 1 font, which we do not support; if the system does not have any other font that has this symbol, there is nothing we can do here.

> Ubuntu with open box symbol,
> $ fc-match "Symbol:charset=F0B7"
> Webdings.ttf: "Webdings" "Regular"

Does this one work or not? If I install Webdings.ttf I get a clapperboard symbol even with 5.2.
 
> We're going to have to reproduce the old behavior at either the OS or layout
> level. Bullets in MS Word docs are extremely common.

We need to first define what the old behaviour is, because right now I see no difference between 5.2, master old layout and master new layout. They all give me consistently the same result.
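
When comparing results across machines, it can help to parse the `fc-match` lines quoted above and flag Type 1 matches automatically. The following is a rough Python sketch, not part of any LibreOffice or Fontconfig tooling: the function names are hypothetical, the regular expression assumes the default `file: "family" "style"` output shape seen in this report, and judging the format from the file extension (`.pfb`/`.pfa`) is only a heuristic.

```python
import re

# Assumed output shape, as seen in this report:  file: "family" "style"
FC_MATCH_LINE = re.compile(
    r'^(?P<file>[^:]+): "(?P<family>[^"]*)" "(?P<style>[^"]*)"$'
)

# Heuristic: Type 1 fonts are usually shipped as .pfb (binary) or .pfa (ASCII).
TYPE1_EXTENSIONS = (".pfb", ".pfa")


def parse_fc_match_line(line: str):
    """Split one fc-match output line into (file, family, style), or None if it does not match."""
    match = FC_MATCH_LINE.match(line.strip())
    if match is None:
        return None
    return match.group("file"), match.group("family"), match.group("style")


def looks_like_type1(file_name: str) -> bool:
    """Guess from the extension whether the matched file is a Type 1 font."""
    return file_name.lower().endswith(TYPE1_EXTENSIONS)


if __name__ == "__main__":
    samples = [
        's050000l.pfb: "Standard Symbols L" "Regular"',
        'DejaVuSans.ttf: "DejaVu Sans" "Book"',
        'Webdings.ttf: "Webdings" "Regular"',
    ]
    for sample in samples:
        file_name, family, style = parse_fc_match_line(sample)
        print(family, "-> Type 1?", looks_like_type1(file_name))
```

A Type 1 match only tells you the new engine cannot use that font; it does not tell you whether any other installed font actually has a glyph for the character.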
Comment 12 Khaled Hosny 2016-11-16 16:49:23 UTC
Here is my takeaway: the Symbol font on Windows has a symbol cmap table that handles these PUA characters. On Linux, the “Symbol” font is usually mapped to “Standard Symbols L” by FontConfig, which is a Type 1 font, and we no longer support those. “Standard Symbols L” comes from the Ghostscript fonts, and the latest version (“Standard Symbols PS”) is actually not a Type 1 font, so it should work, except that it lacks a symbol cmap and is thus unusable here.

So this is not really a bug in the layout engine per se, but the lack of a suitable font. Adding a symbol cmap to “Standard Symbols PS” might be the easiest fix, and we can either upstream this, bundle the modified font, or both.
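
To illustrate why a missing symbol cmap matters, here is a toy Python model of a font's cmap subtables. It is a sketch under stated assumptions, not how LibreOffice or HarfBuzz performs lookups: the two dictionaries are made-up data rather than dumps of real fonts, and the function names are hypothetical. The one external fact it relies on is that in the OpenType cmap table, platform ID 3 with encoding ID 0 denotes the Microsoft Symbol encoding and platform ID 3 with encoding ID 1 denotes Unicode BMP.

```python
# (platformID, encodingID) pairs from the OpenType cmap table.
MS_SYMBOL = (3, 0)       # Microsoft platform, Symbol encoding
MS_UNICODE_BMP = (3, 1)  # Microsoft platform, Unicode BMP encoding

# Toy data (hypothetical): a font that maps the bullet in a symbol cmap at U+F0B7.
windows_style_symbol = {
    MS_SYMBOL: {0xF0B7: "bullet"},
}

# Toy data (hypothetical): a font that only has a Unicode cmap with U+2022.
unicode_only_symbol = {
    MS_UNICODE_BMP: {0x2022: "bullet"},
}


def has_symbol_cmap(font: dict) -> bool:
    """Return True if the font model contains a Microsoft Symbol cmap subtable."""
    return MS_SYMBOL in font


def find_glyph(font: dict, code_point: int):
    """Look the code point up in every subtable; return the glyph name or None."""
    for subtable in font.values():
        if code_point in subtable:
            return subtable[code_point]
    return None


if __name__ == "__main__":
    for name, font in [("windows_style_symbol", windows_style_symbol),
                       ("unicode_only_symbol", unicode_only_symbol)]:
        print(name, has_symbol_cmap(font), find_glyph(font, 0xF0B7))
```

In this model the second font does contain a bullet glyph, but a document that asks for U+F0B7 never reaches it. That leaves two options: add a symbol cmap to the font, or remap the PUA code point to its Unicode equivalent before the lookup.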
Comment 13 V Stuart Foote 2016-11-16 17:21:41 UTC
(In reply to Khaled Hosny from comment #12)
> Here is my take away: Symbol font on Windows has a symbol cmap table that
> handles these PUA characters. On Linux “Symbol” font is usually mapped to
> “Standard Symbols L” by FontConfig, which is a Type 1 font and we no longer
> support those. “Standard Symbols L” comes from GhostSctipt fonts and the
> latest version actually (“Standard Symbols PS”), so it should work except
> that it lacks a symbol cmap and thus unusable here.
> 
> So this is not really a bug in the layout engine per se, but a lack of
> suitable font. Adding a symbol cmap to “Standard Symbols PS” might be the
> easiest fix, and we can either upstream this, bundle the modified font or
> both.

Didn't Tamás take care of something similar for bug 103664 with

http://cgit.freedesktop.org/libreoffice/core/commit/?id=5ef66db91e87ef84724be22977acf4c9c472ad6b

And isn't Symbol kind of an important mapping, as WMF/EMF use it internally? Should we have more precise control of any "important" PUA mappings and cast them to their Unicode equivalents?

https://bugs.documentfoundation.org/show_bug.cgi?id=103664#c10
Comment 14 Luke 2016-11-16 18:43:18 UTC Comment hidden (off-topic)
Comment 15 Mike Kaganski 2016-11-16 19:56:06 UTC Comment hidden (obsolete)
Comment 16 Luke 2016-11-16 22:32:31 UTC Comment hidden (off-topic)
Comment 17 Commit Notification 2016-11-18 07:44:53 UTC
Khaled Hosny committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=d8c386593e42e1f0cce52d052b1009c59e75afa2

tdf#103944: Fix symbol font remapping

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
Comment 18 Xisco Faulí 2016-11-18 09:25:20 UTC
Other documents I found that can be used for testing the fix:

attachment 65149 [details]
attachment 60329 [details]
Comment 19 krishna [:kr1shna] 2017-05-30 06:06:26 UTC
verified.
version: 5.5.0.0.alpha0+ / build id : ec79f34 / android 5.1