Bug 146173 - Non-BMP Unicode characters are imported wrong from UTF-16-encoded HTML
Summary: Non-BMP Unicode characters are imported wrong from UTF-16-encoded HTML
Status: RESOLVED FIXED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: All All
Importance: medium normal
Assignee: Mike Kaganski
URL:
Whiteboard: target:7.4.0 target:7.3.0.0.beta2
Keywords:
Depends on:
Blocks:
 
Reported: 2021-12-11 08:39 UTC by Mike Kaganski
Modified: 2021-12-13 14:59 UTC (History)
0 users

See Also:
Crash report or crash signature:


Attachments
A UTF-16BE-with-BOM-encoded HTML, featuring non-BMP emojis (142 bytes, text/html)
2021-12-11 08:39 UTC, Mike Kaganski

Description Mike Kaganski 2021-12-11 08:39:47 UTC
Created attachment 176860
A UTF-16BE-with-BOM-encoded HTML, featuring non-BMP emojis

When the attached HTML is imported into Writer, the emojis appear as question marks. Importing the same file as plain text shows them correctly.
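
For anyone who wants to build a comparable test file without the attachment, here is a minimal Python sketch. It is not the attached file: the HTML markup, the chosen emoji (U+1F600) and the helper name encode_utf16be_with_bom are assumptions made for illustration only.

```python
import codecs

# Hypothetical sample content; the real attachment's markup is not reproduced here.
EMOJI = "\U0001F600"  # U+1F600, a code point outside the BMP
html = f"<html><body><p>{EMOJI}</p></body></html>"


def encode_utf16be_with_bom(text: str) -> bytes:
    """Encode text as big-endian UTF-16, prefixed with the FE FF byte-order mark."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


data = encode_utf16be_with_bom(html)

# In UTF-16 the emoji takes two 16-bit code units (a surrogate pair).
emoji_units = EMOJI.encode("utf-16-be")
```

Writing data to disk (for example with pathlib.Path("sample.html").write_bytes(data)) should give a file of the same general kind as the attachment, in which each emoji is stored as a high surrogate followed by a low surrogate.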
Comment 1 Mike Kaganski 2021-12-11 09:55:32 UTC
https://gerrit.libreoffice.org/c/core/+/126658
Comment 2 Commit Notification 2021-12-11 11:22:42 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/core/commit/21154ea8c450f9f5568b32123d34a20e498a9290

tdf#146173: combine non-BMP characters' surrogates correctly

It will be available in 7.4.0.

The patch should be included in the daily builds available at
https://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
https://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.
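
As a rough illustration of what "combining surrogates" in the commit title means, the sketch below pairs UTF-16 code units into code points using the standard UTF-16 formula. It is not the code from the patch (see the linked commit for that); the function names combine_surrogates and decode_utf16_units are hypothetical, and replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a high/low UTF-16 surrogate pair into one Unicode code point."""
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits of the value above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def decode_utf16_units(units: list[int]) -> str:
    """Turn 16-bit code units into a str, pairing surrogates where possible."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        has_next = i + 1 < len(units)
        if 0xD800 <= unit <= 0xDBFF and has_next and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Valid pair: consume both units and emit a single non-BMP character.
            out.append(chr(combine_surrogates(unit, units[i + 1])))
            i += 2
        elif 0xD800 <= unit <= 0xDFFF:
            # Unpaired surrogate: emit the replacement character (sketch's choice).
            out.append("\uFFFD")
            i += 1
        else:
            out.append(chr(unit))
            i += 1
    return "".join(out)
```

For example, the code units D83D DE00 combine to U+1F600. An importer that handles each 16-bit unit on its own, instead of pairing them like this, has no valid character to show for either half.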
Comment 3 Commit Notification 2021-12-13 14:59:52 UTC
Mike Kaganski committed a patch related to this issue.
It has been pushed to "libreoffice-7-3":

https://git.libreoffice.org/core/commit/409a0e4ed268c06af924696dbdc29a7edd09df41

tdf#146173: combine non-BMP characters' surrogates correctly

It will be available in 7.3.0.0.beta2.
