Bug 94753 - Supplementary character change via KMfL inserts square boxes
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Writer
Version (earliest affected): unspecified
Hardware: Other All
Importance: medium normal
Assignee: Not Assigned
Reported: 2015-10-04 12:11 UTC by Richard Wordingham
Modified: 2018-03-02 10:05 UTC


Attachments
My correction for the bug (file gtksalframe.cxx) (3.82 KB, patch)
2015-10-08 22:00 UTC, Richard Wordingham

Description Richard Wordingham 2015-10-04 12:11:48 UTC
This is a character input problem using Keyman for Linux via ibus on Ubuntu.

Ubuntu base build: 12.04 (precise)
ibus: 1.4.1-3ubuntu
ibus-kmfl: 1.0.8-2+precise
libkmflcomp0: 0.9.10-2+precise
LibreOffice: 5.0.2.2

The problem occurs in LibreOffice Writer.  It does not occur in LibreOffice Calc, gnome-terminal or Claws.

The relevant pair of lines from the keyboard definition file (tirhuta.kmn) are:

            + [shift K_Y] > U+114C0 c TIRHUTA SIGN ANUSVARA
U+114C0     + [shift K_H] > U+114BF c TIRHUTA SIGN CANDRABINDU

On entering shift/Y, anusvara is displayed as expected.  On then entering shift/H, an oblong box appears with candrabindu displayed above it.  What should happen is that the anusvara is replaced by candrabindu.  Copying and pasting the resulting undisplayable text(?) into Firefox may produce question marks, whereas a fallback font would be used for characters that were merely unrecognised.

An extreme workaround is to save the file, close it and reopen it.  The text then displays as intended!  An easier workaround is to move the cursor backwards and delete the oblong box.  I fear this only works because, for some reason, U+114BF is not recognised as a combining character.

I suspect the problem is that in response to shift/H, ibus sends BS, U+114BF, and in response to BS, LibreOffice deletes one UTF-16 code unit rather than a complete Unicode character, which in this case is two UTF-16 code units long.  This would leave an unpaired surrogate in the backing store.
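
The suspected failure mode can be illustrated outside LibreOffice with a minimal C++ sketch.  It is not LibreOffice code: the helper names backspaceCodeUnit() and backspaceCodePoint() are hypothetical, and std::u16string merely stands in for the UTF-16 backing store.  Both helpers assume the cursor is an index in code units that does not exceed the string length.

```cpp
#include <cstddef>
#include <string>

// True if c is a UTF-16 high (leading) surrogate, U+D800..U+DBFF.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }

// True if c is a UTF-16 low (trailing) surrogate, U+DC00..U+DFFF.
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical naive backspace: removes exactly one UTF-16 code unit
// before the cursor and returns the new cursor position.
std::size_t backspaceCodeUnit(std::u16string& text, std::size_t cursor)
{
    if (cursor == 0)
        return 0;
    text.erase(cursor - 1, 1);
    return cursor - 1;
}

// Hypothetical code-point-aware backspace: removes both code units when
// the cursor directly follows a well-formed surrogate pair, otherwise one.
std::size_t backspaceCodePoint(std::u16string& text, std::size_t cursor)
{
    if (cursor == 0)
        return 0;
    std::size_t n = 1;
    if (cursor >= 2 && isLowSurrogate(text[cursor - 1])
        && isHighSurrogate(text[cursor - 2]))
        n = 2;
    text.erase(cursor - n, n);
    return cursor - n;
}
```

U+114C0 is encoded in UTF-16 as the pair D805 DCC0.  Applying backspaceCodeUnit() after it leaves the lone high surrogate D805 in the string, which would be consistent with the oblong box described above; backspaceCodePoint() removes the whole character.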
Comment 1 Richard Wordingham 2015-10-05 19:08:59 UTC
The bug is also present in Version: 4.4.4.3.0+
Build ID: ec4b4a26b51419ebb60ebb910c2c5b029bd88bd0.
Comment 2 Richard Wordingham 2015-10-08 22:00:53 UTC
Created attachment 119444
My correction for the bug (file gtksalframe.cxx)

My correction is made to Version: 4.4.4.3.0+ in what is recorded as
'Build ID: ec4b4a26b51419ebb60ebb910c2c5b029bd88bd0'

There are a number of stylistic problems with my patch that I do not feel able to resolve:

1) I have copied 4 defines and 2 functions from sal/rtl/surrogates.hxx as I could not see a clean way to share them.

2) I have added a static function advpos().  In terms of code, it shares a great deal with OUString::iterateCodePoints(), or rather the function it calls.  I can't see how to merge the code in a clean way.  In the thread 'Can't track flow of characters in from Input Method Editor', in a post dated 8 October 2015, Caolán McNamara suggested a way to use the latter function directly, at the risk of great inefficiency.

3) If one only requires the code to function according to a strict specification when the strings have not been corrupted by the inclusion of lone surrogates, advpos() can be further simplified.  Those options are commented out by means of '#if 0'.

Additionally:

4) The modified function GtkSalFrame::IMHandler::signalIMDeleteSurrounding() has only been tested for inputs offset = -1, nchars = +1.
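
For readers without the attachment, the general idea behind a helper like advpos() might be sketched as follows.  This is not the attached patch: advanceCodePoints() and deleteRange() are hypothetical names, std::u16string stands in for the real text buffer, and the sketch assumes that offset and nchars count Unicode characters relative to the cursor, as in GTK's delete-surrounding signal.  A lone surrogate is counted as one character so that corrupted strings are still traversed.

```cpp
#include <cstddef>
#include <string>
#include <utility>

inline bool isHigh(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
inline bool isLow(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: moves pos (an index in UTF-16 code units) by
// count code points, forwards if count > 0 and backwards if count < 0,
// clamping at either end of the string.  A well-formed surrogate pair is
// one step of two code units; a lone surrogate is one step of one unit.
std::size_t advanceCodePoints(const std::u16string& text, std::size_t pos,
                              int count)
{
    while (count > 0 && pos < text.size()) {
        bool pair = isHigh(text[pos]) && pos + 1 < text.size()
                    && isLow(text[pos + 1]);
        pos += pair ? 2 : 1;
        --count;
    }
    while (count < 0 && pos > 0) {
        bool pair = pos >= 2 && isLow(text[pos - 1]) && isHigh(text[pos - 2]);
        pos -= pair ? 2 : 1;
        ++count;
    }
    return pos;
}

// Hypothetical helper: converts a request (offset, nchars), both measured
// in code points relative to the cursor, into {start, length} measured in
// UTF-16 code units.  A negative nchars is treated as zero.
std::pair<std::size_t, std::size_t>
deleteRange(const std::u16string& text, std::size_t cursor, int offset,
            int nchars)
{
    if (nchars < 0)
        nchars = 0;
    std::size_t start = advanceCodePoints(text, cursor, offset);
    std::size_t end = advanceCodePoints(text, start, nchars);
    return {start, end - start};
}
```

With the text "a" followed by U+114C0 and the cursor at the end (code-unit index 3), the request offset = -1, nchars = +1 maps to start 1, length 2, so the whole surrogate pair is removed rather than half of it.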
Comment 3 Stephan Bergmann 2015-10-09 08:05:53 UTC
(In reply to Richard Wordingham from comment #2)
> Created attachment 119444
> My correction for the bug (file gtksalframe.cxx)
> 
> My correction is made to Version: 4.4.4.3.0+ in what is recorded as
> 'Build ID: ec4b4a26b51419ebb60ebb910c2c5b029bd88bd0'

Please try to use LO's gerrit to discuss patch proposals (or, at the very least, include full file paths into the LO source tree in your patches).

> 1) I have copied 4 defines and 2 functions from sal/rtl/surrogates.hxx as I
> could not see a clean way to share them.

Note that isHighSurrogate/isLowSurrogate were recently moved to rtl/character.hxx (which you can include from vcl code), though not on the 4.4 branch.
Comment 4 Xisco Faulí 2017-07-12 10:02:10 UTC
Hello Richard,
Are you still working on this?
If so, please submit your patch to gerrit and it will be reviewed by other developers -> https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch
Comment 5 QA Administrators 2018-01-29 10:27:39 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2018-03-02 10:05:52 UTC
Dear Bug Submitter,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-20180302