Bug 126111 - FILEOPEN “Square” symbols replaced
Summary: FILEOPEN “Square” symbols replaced
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
(earliest affected)
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
Keywords: bibisected, bisected
Duplicates: 146122
Depends on:
Blocks: Font-Rendering
Reported: 2019-06-26 12:59 UTC by NISZ LibreOffice Team
Modified: 2023-05-18 15:12 UTC
CC List: 8 users

See Also:
Crash report or crash signature:

Screenshot of the original document side by side in Word and Writer (57.90 KB, image/png)
2019-06-26 13:01 UTC, NISZ LibreOffice Team
Example file from Word (26.35 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-06-26 13:01 UTC, NISZ LibreOffice Team
Example document with replacement of invalid PUA codepoints with suitable Unicode glyphs (26.85 KB, application/vnd.openxmlformats-officedocument.wordprocessingml.document)
2019-06-26 20:39 UTC, V Stuart Foote

Description NISZ LibreOffice Team 2019-06-26 12:59:44 UTC
The attached user document uses some square characters (code points without real character symbols assigned to them) as checkable boxes for filling in offline.
These were imported correctly before 5.3; now they are replaced with vastly different-looking characters from a different character set.

Steps to Reproduce:
1.	Open attached document in Word and Writer

Actual Results:
“Square” characters are replaced with a 1/2PI sign and a Windows logo.

Expected Results:
“Squares” look the same.

Reproducible: Always

User Profile Reset: No

Additional Info:
LibreOffice details:
Version: (x86)
Build ID: 4808ae1c73597726c89936f5b9cb3f11c9a4a7bf
CPU threads: 4; OS: Windows 6.3; UI render: GL; VCL: win; 
TinderBox: Win-x86@42, Branch:master, Time: 2019-06-24_23:50:55
Locale: hu-HU (hu_HU); UI-Language: en-US
Calc: CL
Comment 1 NISZ LibreOffice Team 2019-06-26 13:01:27 UTC
Created attachment 152416
Screenshot of the original document side by side in Word and Writer
Comment 2 NISZ LibreOffice Team 2019-06-26 13:01:45 UTC
Created attachment 152417
Example file from Word
Comment 3 V Stuart Foote 2019-06-26 14:09:44 UTC
Sorry, that is clearly NAB. You have used PUA code points that are not defined in the font you have applied to the text run, and then "expect" them to be rendered in some consistent fashion when the document is opened elsewhere?

Your screen shot shows the first set of U+F07F in MS Reference Specialty font, the second set shows U+F0FF in a font I can't identify. Point is the document is in error--not the font handling.

If you need one of the Checkbox glyphs.
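
Whether a code point is in a Private Use Area can be checked mechanically. The following is a minimal Python sketch, independent of LibreOffice; `is_private_use` and `PUA_RANGES` are hypothetical names introduced for illustration, and the ranges are the three Private Use ranges defined by Unicode.

```python
# Private Use ranges defined by the Unicode Standard.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A (plane 15)
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B (plane 16)
)


def is_private_use(code_point: int) -> bool:
    """Return True if code_point lies in any Unicode Private Use range."""
    return any(low <= code_point <= high for low, high in PUA_RANGES)


# The two code points named in this comment, plus U+2610 for contrast.
for cp in (0xF07F, 0xF0FF, 0x2610):
    print(f"U+{cp:04X}: private use = {is_private_use(cp)}")
```

Both code points from the screenshot fall in the BMP range, so their appearance depends entirely on the font applied to the text run.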
Comment 4 Jacques Guilleron 2019-06-26 14:18:36 UTC
Hi NISZ LibreOffice Team,

I don't reproduce in LO from
LO Build ID: 72fee18f394a980128dc111963f2eefb05998eeb
Threads CPU : 2; Version de l'OS :Windows 6.1; UI Render : par défaut; Moteur de mise en page : nouveau; Locale : fr-FR (fr_FR); Calc: CL
LO (x86) Build ID: 719f4a93e46a6b397356dbb605d2867639ca3942
CPU threads: 2; OS: Windows 6.1; UI render: default; VCL: win; 
Locale: fr-FR (fr_FR); UI-Language: en-US Calc: CL
Squares are not replaced for my part.

Comment 5 V Stuart Foote 2019-06-26 14:36:29 UTC
(In reply to V Stuart Foote from comment #3)
> If you need one of the Checkbox glyphs.

U+2610 -- ☐
U+2752 -- ❒ (my personal preference for this use)

U+25A1 -- □
U+25FB -- ◻
U+1F790 -- 🞐 (spotty coverage of glyphs in the SMP)
U+2751 -- ❑

This is _NOT_ a bug!
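
For text that has already been extracted from such a document, swapping the PUA code points for one of the glyphs above could look roughly like this. It is a sketch only: `replace_pua` is a hypothetical helper, it covers just the BMP Private Use Area (where U+F07F and U+F0FF sit), and it does not read or write .docx files.

```python
import re

# Matches any code point in the BMP Private Use Area (U+E000..U+F8FF).
PUA_PATTERN = re.compile("[\uE000-\uF8FF]")


def replace_pua(text: str, replacement: str = "\u2610") -> str:
    """Replace every BMP PUA character with a defined glyph (default U+2610)."""
    return PUA_PATTERN.sub(replacement, text)


sample = "\uF07F Yes  \uF0FF No"
print(replace_pua(sample))            # uses U+2610
print(replace_pua(sample, "\u2752"))  # uses U+2752 instead
```

A blanket replacement like this is only safe when the PUA characters are known to be stand-ins for checkboxes; a font that really defines glyphs at those code points would lose them.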
Comment 6 V Stuart Foote 2019-06-26 14:42:46 UTC
(In reply to Jacques Guilleron from comment #4)
> Squares are not replaced for my part.

The point is those are not "squares" in any font. Rather, that is a placeholder for a Unicode code point with an undefined glyph in the font assigned to the text run. By definition it requires font fallback handling, which LibreOffice correctly applied. It would actually not be correct to show the "undefined" placeholder.

But the result is always going to be arbitrary (dependent on the OS and the fonts present on the system).
Comment 7 Aron Budea 2019-06-26 17:00:34 UTC
Clearly something can be NOTABUG from LibreOffice's perspective and an interoperability issue at the same time. It would be worth considering the user experience and thinking about options.
Comment 8 V Stuart Foote 2019-06-26 20:39:09 UTC
Created attachment 152433
Example document with replacement of invalid PUA codepoints with suitable Unicode glyphs
Comment 9 Gabor Kelemen (allotropia) 2019-06-28 12:54:13 UTC
What was omitted from the report is that this document used to "work" before 5.3, in particular:

URL: https://cgit.freedesktop.org/libreoffice/core/commit/?id=5d39c2013374727b1c8f147b8b99d54402a7ff02 

author	Khaled Hosny <khaledhosny@eglug.org>	2016-11-02 01:37:21 +0200
committer	Khaled Hosny <khaledhosny@eglug.org>	2016-11-02 01:37:21 +0200
summary: tdf#71603: Create a new DC for the font fallback

Maybe that was an anomaly, I don't know.

Also we are aware that this is not the correct way to create empty spaces for an offline form, yet the world is full of such terribly crafted documents and users are happy that such documents are working at all - at least in Word.

For them this situation is not understood as 'I have no idea what I'm doing' but 'I have to redo my document because LibreOffice sucks'.

So: can we be a little more empathetic and constructive here?
Comment 10 Jan-Marek Glogowski 2019-07-06 12:22:12 UTC
Does MS Word support fallback glyph handling at all? Do we need a compatibility option for that? Or is there already one?

LO will also display "the square", if there is no font available, which contains a glyph for a requested unicode code point. But at this point you have to guarantee equal installed fonts for LO and MS Office everywhere.

Normally LO comes with a bunch of compatibility fonts (100+ for me) precisely to prevent these squares. I don't know whether the Windows installer installs them system-wide, but for a normal build they are private to LO, so MS Word won't see them. So you can remove the LO_HOME/share/fonts directory (but I suggest you keep the opensymbol one) and you will probably be back to your expected state.

Obviously that'll only work where you can guarantee this kind of font setup.
Comment 11 Khaled Hosny 2019-07-06 20:47:03 UTC
We probably should prevent font fallback for PUA code points, since by definition these are font-specific and if the main font does not have them any glyph in fallback fonts is probably meaningless in this context.
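
The policy proposed here can be sketched as follows. This is an illustration only, not LibreOffice code: `pick_font`, the coverage table, and the font names "DocFont" and "SymbolFallback" are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def is_private_use(code_point: int) -> bool:
    """True for the BMP PUA and the two supplementary Private Use planes."""
    return (
        0xE000 <= code_point <= 0xF8FF
        or 0xF0000 <= code_point <= 0xFFFFD
        or 0x100000 <= code_point <= 0x10FFFD
    )


def pick_font(
    code_point: int,
    primary: str,
    fallbacks: List[str],
    coverage: Dict[str, Set[int]],
) -> Optional[str]:
    """Choose a font for code_point; None means draw the missing-glyph box."""
    if code_point in coverage.get(primary, set()):
        return primary
    if is_private_use(code_point):
        # PUA meaning is font-specific, so a fallback glyph would be arbitrary.
        return None
    for font in fallbacks:
        if code_point in coverage.get(font, set()):
            return font
    return None


# Hypothetical coverage: the fallback font happens to define U+F07F.
coverage = {
    "DocFont": {0x0041},
    "SymbolFallback": {0xF07F, 0x2610},
}
print(pick_font(0xF07F, "DocFont", ["SymbolFallback"], coverage))  # PUA case
print(pick_font(0x2610, "DocFont", ["SymbolFallback"], coverage))  # normal fallback
```

With this rule the PUA code point yields no font, so the placeholder square is shown, while ordinary code points still go through fallback as before.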
Comment 12 Khaled Hosny 2019-07-06 21:51:27 UTC
(not saying the OP issue is valid, but not doing fallback for PUA is probably what MS Office does and is generally a good idea).
Comment 13 V Stuart Foote 2019-07-08 03:33:13 UTC
(In reply to Khaled Hosny from comment #11)
> We probably should prevent font fallback for PUA code points, since by
> definition these are font-specific and if the main font does not have them
> any glyph in fallback fonts is probably meaningless in this context.

Saw the note in https://gerrit.libreoffice.org/#/c/75187/ regarding bug 33898 and the use of EUDC -- End User Defined Characters -- manipulated directly by users with the 'eudcedit.exe' app.  The font editor defaults to Unicode PUA, but looks to also support locale code pages. Fonts are written to the user profile %APPDATA%/local/Microsoft/Windows/EUDC as EUDC.TTE and .EUF

Common practice seems to be to extend a font with glyphs for use in Unicode, or multi-byte CJK fonts--and then those extended fonts get passed around. What seemed interesting was that a flag could be set to include those additional EUDC glyphs in all registered fonts.  So seems being able to access them in LO font handling could remain important for users dependent on the practice. 

Details are in this Microsoft-provided documentation:
https://docs.microsoft.com/en-us/windows/win32/intl/end-user-defined-characters
Comment 14 Khaled Hosny 2019-07-08 05:57:33 UTC
(In reply to V Stuart Foote from comment #13)
> (In reply to Khaled Hosny from comment #11)
> > We probably should prevent font fallback for PUA code points, since by
> > definition these are font-specific and if the main font does not have them
> > any glyph in fallback fonts is probably meaningless in this context.
> Saw the note in https://gerrit.libreoffice.org/#/c/75187/ regards bug 33898
> and use of EUDC -- End User Defined Characters -- manipulated directly by
> users with the 'eudcedit.exe' app.  The font editor defaults to Unicode PUA,
> but looks to also support locale code pages. Fonts are written to user
> profile %APPDATA%/local/Microsoft/Windows/EUDC as EUDC.TTE and .EUF
> Common practice seems to be to extend a font with glyphs for use in Unicode,
> or multi-byte CJK fonts--and then those extended fonts get passed around.
> What seemed interesting was that a flag could be set to include those
> additional EUDC glyphs in all registered fonts.  So seems being able to
> access them in LO font handling could remain important for users dependent
> on the practice. 
> Details are in this Microsoft provided documentation:
> https://docs.microsoft.com/en-us/windows/win32/intl/end-user-defined-characters

OK, so this is not just a font named EUDC; it is more configurable. What we had in LibreOffice didn't work with this anyway and actually differed from how other Windows applications handle the feature, so I'm not sure how many users are even aware of this LibreOffice-specific special handling of fonts named EUDC, which does not match how the system handles EUDC fonts.
Comment 16 Justin L 2022-09-05 22:35:20 UTC
I don't think this has anything to do specifically with MS formats, so I'm removing that from the subject line.
Comment 17 Justin L 2023-05-18 12:33:48 UTC
*** Bug 146122 has been marked as a duplicate of this bug. ***