Bug 96416 - Symbol for UTF-8 FFF9 character (Interlinear Annotation Anchor) not visible on Linux
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): 5.0.3.2 release
Hardware: x86-64 (AMD64) Linux (All)
Importance: low minor
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-12-11 15:43 UTC by Luke Kendall
Modified: 2021-12-03 07:41 UTC

See Also:
Crash report or crash signature:


Attachments
Zip file of sample .odt file and the .docx produced from it, showing the bug (19.78 KB, application/zip)
2015-12-11 15:43 UTC, Luke Kendall

Description Luke Kendall 2015-12-11 15:43:53 UTC
Created attachment 121226 [details]
Zip file of sample .odt file and the .docx produced from it, showing the bug

If you use pairs of characters <non-breaking space><space> between sentences, then on a small percentage of occasions LO will insert the Unicode character U+FFF9 at the start or the end of the pair of space characters.  I think my manuscript has something like 20,000 sentences, and LO generated 233 of these odd characters, which show up in the Kindle previewer as illegal characters.

I'll attach an example .odt and .docx produced by Save As from LO.

You no doubt know that you can search for such characters in vim via /\%ufff9
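
The vim search above works on plain text, but an .odt file is a zip archive, so the pattern has to be applied to the XML inside it. A minimal Python sketch of the same check, assuming the body text lives in content.xml and is UTF-8 encoded (the usual ODF layout). The function name find_iaa_in_odt and its context parameter are made up for illustration:

```python
import zipfile

IAA = "\ufff9"  # U+FFF9 INTERLINEAR ANNOTATION ANCHOR


def find_iaa_in_odt(odt, context=15):
    """Return (offset, surrounding text) for each U+FFF9 in content.xml.

    `odt` may be a file path or a binary file-like object.
    Assumption: content.xml is UTF-8 encoded.
    """
    with zipfile.ZipFile(odt) as archive:
        xml = archive.read("content.xml").decode("utf-8")

    hits = []
    pos = xml.find(IAA)
    while pos != -1:
        start = max(0, pos - context)
        # Keep a little text on both sides so the spot is easy to locate.
        hits.append((pos, xml[start:pos + context + 1]))
        pos = xml.find(IAA, pos + 1)
    return hits
```

Calling find_iaa_in_odt("sample.odt") (a hypothetical file name) returns one (offset, snippet) pair per occurrence. The offsets index into the raw XML, not the visible document text, so the snippet is the more useful part for finding the place in Writer.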
Comment 1 Buovjaga 2015-12-15 08:24:39 UTC
I see it before "Old bones".
It is displayed as IAA inside a dashed box.
According to this, IAA is short for Interlinear Annotation Anchor: http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%EF%BF%B9&mode=char
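
The %EF%BF%B9 in that URL is the percent-encoded UTF-8 byte sequence for the code point U+FFF9. A short Python sketch to confirm the mapping locally with the standard library. The helper name describe is an assumption for illustration, and the character name comes from whatever Unicode database ships with the Python in use:

```python
import unicodedata
from urllib.parse import unquote


def describe(percent_encoded):
    """Decode a percent-encoded UTF-8 character and describe it."""
    ch = unquote(percent_encoded)  # unquote() decodes as UTF-8 by default
    return (
        f"U+{ord(ch):04X}",        # code point
        unicodedata.name(ch),      # official character name
        unicodedata.category(ch),  # general category, e.g. "Cf" for format
        ch.encode("utf-8").hex(),  # UTF-8 bytes as hex
    )


print(describe("%EF%BF%B9"))
```

The category "Cf" (format character) is why the character has no glyph of its own and only shows up if the application draws a placeholder for it.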

I could not reproduce the appearance of the extra anchor character, either in the attached .odt or from scratch. I tried a dozen times in each.

Win 7 Pro 64-bit, Version: 5.0.3.2 (x64)
Build ID: e5f16313668ac592c1bfb310f4390624e3dbfb75
Locale: fi-FI (fi_FI)

Version: 5.2.0.0.alpha0+
Build ID: 917d59a84124d1022bd1912874e7a53c674784f1
CPU Threads: 4; OS Version: Windows 6.1; UI Render: default; 
TinderBox: Win-x86@62-merge-TDF, Branch:MASTER, Time: 2015-12-12_12:17:04
Locale: fi-FI (fi_FI)
Comment 2 Luke Kendall 2015-12-15 08:52:23 UTC
Interesting that you couldn't reproduce it on Windows.

I've done some more investigation, too.

The UTF-8 character is in fact in the original .odt file!  I don't know how it got in there, though.  Do you have any idea how one might accidentally type it from the keyboard?

But doesn't that make it especially strange that you couldn't reproduce the problem in the .docx conversion under Windows?

I think the safest assumption is that it's possible to somehow generate the character accidentally.  I suspect the way to generate it may be similar somehow to the Ctrl-Shift-<space> needed to generate a non-breaking space.  I suspect I've occasionally mis-keyed that sequence to produce the bad character.

Is there any way in LO to find or report on such unusual characters?  Or, better still, to make them visible (ideally, as some kind of "illegal character" box)?

As things stand, to find such erroneous characters in a novel produced using LO, you need to manually inspect the result in the Kindle previewer page by page, looking for illegal character indicators.  The only way I know to find them in LO itself (if you don't know what may be there) is to arrow past them and notice that the cursor doesn't move.  That's a completely impractical mechanism, though.

May I suggest, though, that this bug be changed from severity "Normal" to "enhancement"?  I think its fundamental cause is user error.
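
On the question of reporting unusual characters when you don't know in advance which ones are present: outside LO, one possible approach is to flag characters by Unicode general category rather than by a fixed list. The sketch below is only an illustration under that assumption. The function name report_unusual and the choice of categories (format, private use, unassigned, plus control characters other than tab and line breaks) are mine, not anything LO provides:

```python
import unicodedata
from collections import Counter

# Assumed definition of "unusual": format (Cf), private use (Co), unassigned (Cn).
SUSPECT_CATEGORIES = {"Cf", "Co", "Cn"}
ALLOWED_CONTROLS = {"\n", "\r", "\t"}


def report_unusual(text):
    """Return (code point, name, count) for each suspicious character."""
    counts = Counter()
    for ch in text:
        cat = unicodedata.category(ch)
        if cat in SUSPECT_CATEGORIES or (cat == "Cc" and ch not in ALLOWED_CONTROLS):
            counts[ch] += 1
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in counts.most_common()
    ]


sample = "One.\u00a0\ufff9 Two.\u00a0 Three.\ufff9\u00a0 Four."
for code_point, name, count in report_unusual(sample):
    print(code_point, name, count)
```

A non-breaking space (U+00A0) is a space separator, so it is not flagged. Legitimate format characters such as a soft hyphen (U+00AD) would be flagged, so the output is a list to review rather than a list of errors.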
Comment 3 Buovjaga 2015-12-15 10:01:58 UTC
(In reply to Luke Kendall from comment #2)
> Is there any way in LO to find or report on such unusual characters?  Or,
> better still, to make them visible (ideally, as some kind of "illegal
> character" box)?

Ah, now I tested on Ubuntu and indeed the IAA box is not visible.
Adjusting the summary.

To answer your other question: no, I did not reproduce the appearance of more IAA characters when saving to .docx after having inserted non-breaking spaces followed by spaces.

Ubuntu 15.10 64-bit 
Version: 5.0.3.2
Build ID: 1:5.0.3~rc2-0ubuntu1
Locale: en-US (en_US.UTF-8)

Version: 5.2.0.0.alpha0+
Build ID: 014633f83e44ae8ba33087b6f38e8e253e281969
CPU Threads: 2; OS Version: Linux 4.2; UI Render: default; 
TinderBox: Linux-rpm_deb-x86_64@70-TDF-dbg, Branch:master, Time: 2015-12-15_06:21:19
Locale: en-US (en_US.UTF-8)
Comment 4 QA Administrators 2017-01-03 19:47:32 UTC Comment hidden (obsolete)
Comment 5 QA Administrators 2019-12-03 14:52:18 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2021-12-03 04:42:13 UTC Comment hidden (obsolete)
Comment 7 Luke Kendall 2021-12-03 07:41:20 UTC
They're visible in the Linux version of LO now, thanks.

Version: 7.2.3.2 / LibreOffice Community
Build ID: d166454616c1632304285822f9c83ce2e660fd92
CPU threads: 8; OS: Linux 5.13; UI render: default; VCL: gtk3
Locale: en-AU (en_AU.UTF-8); UI: en-US
Calc: threaded