I’ll be submitting this problem report to both OpenOffice and LibreOffice.
To perform these steps, you will need the Tahoma and Lucida Sans Unicode fonts (both common) and EITHER the Symbola font OR both the FreeSerif and Segoe UI Symbol fonts. (The latter are less common, but they are available for free.)
1. The six test lines below all begin with a Unicode character followed by text which describes the character. Copy the six lines into a WordPad document and change the font for the lines to Tahoma. Note that only the Unicode character in the second line (sharp sign) is displayed.
2. Next change the font of all six lines to Lucida Sans Unicode. Note that now the first two Unicode characters (flat and sharp) are displayed.
3. Now -either- change the last four lines to Symbola -or- change the middle two lines to FreeSerif and the last two lines to Segoe UI Symbol. Note that now all the Unicode characters are displayed correctly.
The first three steps have demonstrated the correct behavior. I would expect the same behavior in OpenOffice Writer and LibreOffice Writer.
4. But now repeat the same three steps in either version of “Writer.” Note that the Unicode characters in the last four lines are never displayed correctly.
5. Now change the last four lines to the OpenSymbol font. Note that this still doesn’t help.
It seems that this happens with any 5-hex-digit Unicode character (i.e. any code point above U+FFFF) that is supported in WordPad by a font like Symbola, FreeSerif or Segoe UI Symbol. (Related LibreOffice Bug 71603 appears to be just one instance of this general problem with 5-hex-digit Unicode characters.)
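The "5-hex-digit" characters are exactly the code points above U+FFFF, i.e. outside the Basic Multilingual Plane (BMP). Windows and LibreOffice both store text internally as UTF-16, where each such character takes two 16-bit code units (a surrogate pair), so any code path that assumes one code unit per character can mangle them. The following is a minimal stdlib-only Python sketch that classifies the six test characters; the names `utf16_units` and `TEST_CODE_POINTS` are our own, for illustration only.

```python
# Classify the six test characters from this report: is each one inside the
# Basic Multilingual Plane (BMP, U+0000..U+FFFF), and which UTF-16 code units
# encode it? Non-BMP characters need two units (a surrogate pair).


def utf16_units(ch: str) -> list[int]:
    """Return the UTF-16 code units for a single character."""
    data = ch.encode("utf-16-be")  # big-endian, no byte-order mark
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


TEST_CODE_POINTS = [0x266D, 0x266F, 0x1D10B, 0x1D10C, 0x1F3B6, 0x1F3B7]

if __name__ == "__main__":
    for cp in TEST_CODE_POINTS:
        units = utf16_units(chr(cp))
        plane = "BMP" if cp <= 0xFFFF else "non-BMP"
        print(f"U+{cp:04X} {plane:8} UTF-16: " + " ".join(f"{u:04X}" for u in units))
```

Only the flat and sharp signs fit in a single code unit. The last four lines of the test (segno, coda and the two emoji) each become a surrogate pair, e.g. U+1D10B is stored as D834 DD0B.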
I haven’t tried it in MS Word, but I’d assume that if it works in WordPad, it will work in Word.
_ _ _ _ _
the six test lines:
♭ 266D music flat sign
♯ 266F music sharp sign
𝄋 1D10B segno
𝄌 1D10C coda
🎶 1F3B6 multiple musical notes
🎷 1F3B7 saxophone
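Because the clipboard is where these characters get lost, one way to get the test lines into Writer without copy and paste is to write them to a UTF-8 text file and open that file directly. The following is a minimal Python sketch; the file name `test-lines.txt` and the helper names are arbitrary choices of ours, and we have not verified how each Writer version's plain-text import handles the file (it may ask you to choose the encoding, in which case pick UTF-8).

```python
# Write the six test lines from this report to a UTF-8 text file, building
# each character from its code point so no copy/paste is involved.
from pathlib import Path

TEST_LINES = [
    (0x266D, "music flat sign"),
    (0x266F, "music sharp sign"),
    (0x1D10B, "segno"),
    (0x1D10C, "coda"),
    (0x1F3B6, "multiple musical notes"),
    (0x1F3B7, "saxophone"),
]


def format_line(cp: int, description: str) -> str:
    """Format one test line as '<char> <hex code point> <description>'."""
    return f"{chr(cp)} {cp:X} {description}"


def write_test_file(path: Path) -> None:
    """Write all six test lines, one per line, encoded as UTF-8."""
    text = "\n".join(format_line(cp, desc) for cp, desc in TEST_LINES) + "\n"
    path.write_text(text, encoding="utf-8")


if __name__ == "__main__":
    write_test_file(Path("test-lines.txt"))
```

Opening the resulting file bypasses both the regular and the HTML paste paths, which helps separate a font/rendering problem from a clipboard problem.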
I’ll switch to using and recommending whichever “Office” is either [a] quickest to show me how to display 5-hex-digit Unicode characters in the Writer application as it currently is -or- [b] quickest to fix the problem.
I tested this issue under Linux (the operating system I use).
So I copied the test lines from this bug report, and the pasting mechanism threw away the special characters (except ♭ and ♯). But then I tried pasting the test lines by using the “Text Without Formatting” option (from the Paste Special dialog, Ctrl+Shift+V) and all of the special characters were pasted correctly.
Please let me know if using Paste Special > Text Without Formatting works for you under Windows. I’m adjusting this bug’s title a bit.
(In reply to comment #1)
> I tested this issue under Linux (the operating system I use).
> So I copied the test lines from this bug report, and the pasting mechanism
> threw away the special characters (except ♭ and ♯). But then I tried pasting
> the test lines by using the “Text Without Formatting” option (from the Paste
> Special dialog, Ctrl+Shift+V) and all of the special characters were pasted
> correctly.
> Please let me know if using Paste Special > Text Without Formatting works
> for you under Windows. I’m adjusting this bug’s title a bit.
Regular paste produced the same two characters you saw on Linux, but Paste Special offers only 'HTML format' and 'HTML format without comments'. With 'without comments', the first entry sometimes appeared as a blank box and sometimes as the ♭; the second entry always displayed correctly; the remaining four appeared as question marks. This was on 4.2.4 and 4.3.0 on Windows 7.
Changing the title again, since Paste Special does not work. Depending on the font, it can look as though the characters were dropped in the paste when they were merely not displayed. That may even be correct behavior for a font that lacks those glyphs, so it's important to mention the font in which you're trying to render the characters.
WordPad correctly renders all four of the 5-hex-digit Unicode characters (lines 3-6) in the Symbola font. It also correctly renders lines 3 and 4 in FreeSerif, and lines 5 and 6 in Segoe UI Symbol. Both LibreOffice Writer and OpenOffice Writer should be able to do as well, and the characters should also be exportable to PDF.
Since submitting this, I've discovered that KingSoft Writer does render these characters correctly, but they are lost when KingSoft exports them to PDF. (But KingSoft also has some other Unicode-related bugs that LibreOffice/OpenOffice doesn't have.)
The symbols do appear after application restart and reloading the document.
Reproduced on OSX / LO 184.108.40.206 and 4.4 master:
"Paste Special" as "Unformatted text" does the right thing, and all characters are displayed.
Regular paste and paste as HTML appear to replace all the characters outside the Unicode Basic Multilingual Plane (i.e. >0xFFFF) with "?" (a literal question mark, not a placeholder for a non-rendered character). Given that, changing the font afterward makes no difference.
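The literal "?" described here is consistent with a decoder whose character values only hold 16 bits. The following is a hypothetical Python illustration of that failure mode, not LibreOffice's actual HTML import code. It also assumes the characters arrive as numeric character references, which this report does not establish (the clipboard HTML may carry raw UTF-8 instead). `decode_refs_16bit` is our own name for a decoder that substitutes "?" for any reference above U+FFFF; `decode_refs` accepts the full Unicode range, as Python's `html.unescape` does.

```python
import html
import re

# Matches hexadecimal (&#x1D10B;) and decimal (&#119051;) character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def _code_point(match: re.Match) -> int:
    hex_digits, dec_digits = match.groups()
    return int(hex_digits, 16) if hex_digits else int(dec_digits)


def decode_refs_16bit(text: str) -> str:
    """HYPOTHETICAL buggy decoder: only 16-bit values fit, so any reference
    above U+FFFF becomes a literal '?', like the symptom described above."""
    def repl(m: re.Match) -> str:
        cp = _code_point(m)
        return chr(cp) if cp <= 0xFFFF else "?"
    return NUMERIC_REF.sub(repl, text)


def decode_refs(text: str) -> str:
    """Corrected decoder: accepts any code point up to U+10FFFF, replacing
    surrogates and out-of-range values with U+FFFD."""
    def repl(m: re.Match) -> str:
        cp = _code_point(m)
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "\uFFFD"
        return chr(cp)
    return NUMERIC_REF.sub(repl, text)


if __name__ == "__main__":
    sample = "&#x266D; &#x1D10B; &#127926;"  # flat sign, segno, musical notes
    print(decode_refs_16bit(sample))  # the two non-BMP references become '?'
    print(decode_refs(sample) == html.unescape(sample))  # prints True
```

The BMP flat sign survives both decoders, while segno and the musical notes survive only the full-range one, which matches the pattern seen in the paste tests.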
-> Platform: All
*** Bug 85315 has been marked as a duplicate of this bug. ***
On Windows 10 Pro 64-bit en-US with
Build ID: 917d59a84124d1022bd1912874e7a53c674784f1
CPU Threads: 8; OS Version: Windows 6.2; UI Render: GL;
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-12-12_12:17:04
Locale: en-US (en_US)
Confirming the observations of comment 5: Edit -> Paste Special: Unformatted text handles the 5-hex-digit characters correctly, while regular Paste or Paste Special: HTML corrupts the pasted text and loses characters.
Setting the font to a combination of Bravura Text and Segoe UI Symbol correctly shows all glyphs after Paste Special: Unformatted text.
*** Bug 85316 has been marked as a duplicate of this bug. ***
Mark Hung committed a patch related to this issue.
It has been pushed to "master":
tdf#81129 Support reading non-BMP characters in HTML documents.
It will be available in 5.2.0.
The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
Affected users are encouraged to test the fix and report feedback.