54486 – debug build crashy (failed assertion) when typing text; icu 4.4.X, locale-specific ordinal autocorrect. Esp notable with fr_LU

Status:	VERIFIED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Writer
Version: (earliest affected)	3.6.1.2 release
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	high critical
Assignee:	Caolán McNamara

Whiteboard:	target:3.7.0 target:3.6.2

Reported:	2012-09-04 08:55 UTC by Lionel Elie Mamane
Modified:	2012-09-11 15:50 UTC

Attachments
backtrace (8.24 KB, text/plain) 2012-09-04 08:55 UTC, Lionel Elie Mamane

Description Lionel Elie Mamane 2012-09-04 08:55:41 UTC

Created attachment 66601
backtrace

I just got several aborts in a row on a failed assertion while typing text in bullet lists in Writer (in an ODT file, not a .doc file) whose language is fr_LU. It seems to happen most often roughly like this:

1) type some text in a bullet point
2) do not save
3) type enter --> crash
4) if not crashed yet, type a few characters (especially a mix of letters and numbers: something like 12345XX, then a space) ---> crash

I attach the backtrace of one of these crashes, and here is some more info


(gdb) up 17
#17 0x00007fe016e28e4b in rtl::OUString::copy (this=this@entry=0x7fff094ad240, beginIndex=beginIndex@entry=5)
    at /home/master/src/libreoffice/workdirs/libreoffice-3.6/solver/unxlngx6/inc/rtl/ustring.hxx:1293
1293	        assert(beginIndex >= 0 && beginIndex <= getLength());
(gdb) print beginIndex
$1 = 5

(gdb) print this->pData->length
$3 = 4

(gdb) up
#18 0x00007fe016e2b307 in com::sun::star::i18n::OrdinalSuffix::getOrdinalSuffix (this=0x31a43b0, nNumber=42221, aLocale=...)
    at /home/master/src/libreoffice/workdirs/libreoffice-3.6/i18npool/source/ordinalsuffix/ordinalsuffix.cxx:102
102	                    retValue[ newLength - 1 ] = sValue.copy( len );
(gdb) print aLocale
$4 = (const com::sun::star::lang::Locale &) @0x20d7a60: {
  Language = "fr", 
  Country = "LU", 
  Variant = ""
}

(gdb) print sValue
$5 = "NaNe"
(gdb) print newLength
$6 = 1
(gdb) print len
$7 = 5
(gdb) print normalized
This sends GDB into an infinite (or very long) memory-consuming loop

From *another* crash (same backtrace, same failed abort, same sValue, same len, same newLength):


(gdb) print nCode 
$3 = U_ZERO_ERROR
(gdb) print i
$4 = 0
(gdb) print ruleSet
again infinite loop
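
For readers following the gdb output above: the assertion fires because len (5) is larger than the length of sValue ("NaNe", 4 characters). A minimal sketch of that precondition and of a guarded alternative follows. It uses std::string in place of rtl::OUString, and the helper names copyFrom and tryCopyFrom are hypothetical, not part of LibreOffice:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for rtl::OUString::copy(beginIndex); it carries the same
// precondition as the assertion quoted from ustring.hxx:1293.
std::string copyFrom(const std::string& s, int beginIndex)
{
    assert(beginIndex >= 0 && beginIndex <= static_cast<int>(s.size()));
    return s.substr(static_cast<std::size_t>(beginIndex));
}

// Guarded variant (hypothetical): reports failure instead of asserting
// when the index lies outside the string.
bool tryCopyFrom(const std::string& s, int beginIndex, std::string& out)
{
    if (beginIndex < 0 || beginIndex > static_cast<int>(s.size()))
        return false;
    out = copyFrom(s, beginIndex);
    return true;
}
```

With sValue = "NaNe" and len = 5, tryCopyFrom returns false, whereas the unguarded copy aborts in a debug build.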

Comment 1 Caolán McNamara 2012-09-05 08:57:38 UTC

So...
a) this is autocorrect to change e.g. "1st" to 1 + superscript st.
b) we use the number formatter thing in icu to add the suffix, use the sal one to find the bare number and subtract the two to get the suffix on its own.
c) we're not returning the values if they are the expected ones, which we should as that's the whole point of the exercise I thought
d) There's at least one bug there already, in that "12345" is converted by icu to "12.345e" for fr for me, while for en it remains 12345th, and we're not taking the added grouping separator into account when finding the suffix
e) that said, I can't reproduce the crash on master by setting LANG=fr_LU.UTF8, the NaNe is presumably a "not a number" error so something in the stack has presumably gotten confused about something like , vs .

Are you using the default built-in icu or a system one? And is the locale just set with LANG=fr_LU.UTF8, letting LibO take its defaults from that?
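
The approach in (b) and the grouping-separator problem in (d) can be sketched as follows. This is an illustration under assumptions, not the code from ordinalsuffix.cxx: it uses std::string rather than OUString, takes the already-formatted icu output as a parameter instead of calling icu, and the function name suffixAfterNumber is hypothetical. It walks the bare digits through the formatted text, skipping grouping separators, and gives up when the formatted text does not begin with the number:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: given the formatter output (e.g. "12.345e") and the
// bare number (e.g. "12345"), return what follows the number.
// Returns an empty string when the formatted text does not start with the
// expected digits (e.g. "NaNe"); note that this sketch cannot tell that case
// apart from a genuinely empty suffix.
std::string suffixAfterNumber(const std::string& formatted,
                              const std::string& bare,
                              char groupSep)
{
    assert(!bare.empty()); // a number always has at least one digit

    std::size_t pos = 0;
    for (char digit : bare)
    {
        // Skip any grouping separators sitting before the next digit.
        while (pos < formatted.size() && formatted[pos] == groupSep)
            ++pos;
        // Unexpected output: bail out rather than index past the end.
        if (pos == formatted.size() || formatted[pos] != digit)
            return std::string();
        ++pos;
    }
    // Everything after the last matched digit is the suffix.
    return formatted.substr(pos);
}
```

Only separators that appear before a digit are skipped, so a suffix that happens to equal the separator character is left alone.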

Comment 2 Caolán McNamara 2012-09-05 09:09:26 UTC

Or, because the ordinal suffix for French is "e", something reads that as an exponent.

Comment 3 Caolán McNamara 2012-09-05 11:49:15 UTC

Even though I can't reproduce this, I can see other non-fatal errors and committed http://cgit.freedesktop.org/libreoffice/core/commit/?id=a05357ab69712bec53c2d8d17efbbf25907ff9b8 to fix them. I bet the side effect is that it now no longer crashes on the (bizarre) "NaNe" output. I'd still like to know the details asked for in comment #1.

Comment 4 Lionel Elie Mamane 2012-09-05 13:59:15 UTC

(In reply to comment #1)

> e) that said, I can't reproduce the crash on master by setting LANG=fr_LU.UTF8,

> Are you using the default built-in icu or a system one ?

My understanding is: System ICU.
My config.log says:

configure:24098: checking which icu to use
configure:24101: result: external
(...)
configure:24136: checking for icu-config
configure:24154: found /usr/bin/icu-config
configure:24166: result: /usr/bin/icu-config
configure:24175: checking ICU version
configure:24183: result: OK, 4.4.1

and my config_host.mk has:
config_host.mk:export ICU_MAJOR=4
config_host.mk:export ICU_MICRO=1
config_host.mk:export ICU_MINOR=4
config_host.mk:export SYSTEM_ICU=YES


Package versions (Debian amd64):

libicu-dev                                           4.4.1-8
libicu44                                             4.4.1-8
libicu48                                             4.8.1.1-7
libicu4j-java                                        4.0.1.1-1

> And is locale is just set with LANG=fr_LU.UTF8 and letting LibO just take its defaults from that.

$ locale
LANG=fr_LU.UTF-8
LANGUAGE=
LC_CTYPE="fr_LU.UTF-8"
LC_NUMERIC="fr_LU.UTF-8"
LC_TIME="fr_LU.UTF-8"
LC_COLLATE="fr_LU.UTF-8"
LC_MONETARY="fr_LU.UTF-8"
LC_MESSAGES=en_GB.UTF-8
LC_PAPER="fr_LU.UTF-8"
LC_NAME="fr_LU.UTF-8"
LC_ADDRESS="fr_LU.UTF-8"
LC_TELEPHONE="fr_LU.UTF-8"
LC_MEASUREMENT="fr_LU.UTF-8"
LC_IDENTIFICATION="fr_LU.UTF-8"
LC_ALL=
$ set | egrep '(LANG|LC_)'
LANG=fr_LU.UTF-8
LC_MESSAGES=en_GB.UTF-8


In LibO, menu Tools / Options / Language Settings / Languages has:

Language of
  User interface:              English (USA)
  Locale setting:               Default - French (Luxembourg)
  Decimal separator key:  (checked) Same as locale setting ( . )
  Default currency:           Default - EUR

Default languages for documents
  Western:                       ABC_V Default - French (Luxembourg)
  Asian:            (greyed out)  Default - Chinese (simplified)
  CTL:              (greyed out)  Default - Hindi

Enhanced language support
 (unchecked) Enabled for Asian languages
 (unchecked) Enabled for complex text layout (CTL)

Note that the locale's decimal separator key is comma, not dot. So the "same as locale setting" is weird, it says it is a dot. If I uncheck that box, I don't see a way to choose between dot or comma.

Comment 5 Lionel Elie Mamane 2012-09-05 14:23:49 UTC

The abort happens if I simply open a new writer document and type "47211 " (without the quotes: 47211 then a space).

Also happens with these numbers followed by space:
20000
12345
123456
99999999999999999
10000

Does not happen with these numbers followed by space:
1
11
211
7211
1234
9999

So it seems to be related to length of the number. If I select Tools / Language / For all text / English (USA), then:

no crash with 12345, but crash with 999999 and 123456. So it seems it crashes, too, but only on longer numbers.

OTOH, 1st is not autocorrected to 1\textsuperscript{st}, nor 2nd to 2\textsuperscript{nd}.


If I select Tools / Language / For all text / None (Do not check spelling), then no crash.

If I select Tools / Language / For all text / Welsh, then no crash, but this might be noise, since I don't always have the Welsh choice; if I just repeatedly "open the menu and then click in the document to dismiss the menu", eventually I get a Welsh choice. Also, with the Welsh language, it accepts this text as valid (no spelling error):

  dfklsdfklmsdfklm lk sdfklsdfkl sdflksdfkl sdfkl sdfklsdfklsdfklsdfkl sdlkfsdkl

I don't speak Welsh, but the probability I randomly hit a string of consonants that are valid Welsh words is quite low :)

With French or English language, these words are underlined in red.


When doing a spell-check (not only in Welsh) my stdout/err fills with the following, not sure if it is related.


warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/ui/dialog/SwSpellDialogChildWindow.cxx:452: ApplyChangedSentence in initial call or after resume
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/ui/dialog/SwSpellDialogChildWindow.cxx:452: ApplyChangedSentence in initial call or after resume
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.osl:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/sw/source/core/edit/edlingu.cxx:1364: TODO: add ignore mark to text node
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:719: !! Grammarchecker failed to provide end of sentence !!
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:719: !! Grammarchecker failed to provide end of sentence !!
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:719: !! Grammarchecker failed to provide end of sentence !!
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:704: nSuggestedEndOfSentencePos calculation failed?
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:719: !! Grammarchecker failed to provide end of sentence !!
warn:legacy.tools:9750:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/linguistic/source/gciterator.cxx:740: end-of-sentence detection failed?

Comment 6 Caolán McNamara 2012-09-06 14:01:19 UTC

a) "Note that the locale's decimal separator key is comma, not dot. So the "same as locale setting" is weird, it says it is a dot. If I uncheck that box, I don't see a way to choose between dot or comma."

That affects input only, IIRC. Basically, I think the history is that in Spain (or someplace or other) the numeric keypad on their keyboards has a . on it, but everyone wants it to produce a , and not a . so it should only affect input using the numeric keypad: it toggles between what's written on the keyboard and what the locale data says is that locale's separator.

b) "with welsh language, it accepts this text as valid (no spelling error:"

You presumably don't have a Welsh spellchecking dictionary installed. Format->character->Welsh will have an abc+blue check if it has Welsh spelling support.

c) icu 4.4.1

I reckon that's the root of the problem: there's apparently a bug in 4.4.1 where the icu number formatter generates not-a-number text when formatting various large numbers (or what it thinks are large numbers), and our original code didn't handle any unexpected output along those lines.

defe079d455ccc958fd0128e8a8cf0e4aeb5cd9c + a05357ab69712bec53c2d8d17efbbf25907ff9b8 very likely fix this. defe079d455ccc958fd0128e8a8cf0e4aeb5cd9c is already in 3-6 now, I believe. If you could test whether a05357ab69712bec53c2d8d17efbbf25907ff9b8 fixes this for you with icu 4.4.1, and cherry-pick it if it does, that'd be good.

Comment 7 Lionel Elie Mamane 2012-09-10 12:14:37 UTC

(In reply to comment #6)

> b) "with welsh language, it accepts this text as valid (no spelling error:"

> You presumably don't have a welsh spellchecking dictionary installed.
> Format->character->welsh will have a abc+blue check if it has welsh spelling
> support.

Can't find any language list in Format / character.

In "Tools / Language / for all text", none of the listed languages has an "abc+blue check". Given the very limited list, I assumed it listed only the languages for which I have an appropriate spellchecking dictionary. I assume you meant Tools / Options / Language settings / Default language for documents / Western? I have the mark for fr_BE, fr_CA, fr_LU, fr_FR, fr_monaco, fr_CH and 16 different en_* entries. But my "Tools / Language / for all text" has, at startup, only en_US and fr_LU?

Today, I started LibO, I had only fr_LU and en_US. After looking in that menu about 5 times, "Irish" (and not anymore "Welsh") appeared. I notice that if I take "more" in "Tools / Language / for all text", Irish is the default for new documents, but "for current document" is checked. Not sure I understand what's happening here.

That might be worth its own bug report.

Comment 8 Lionel Elie Mamane 2012-09-10 12:24:27 UTC

(In reply to comment #6)

> c) icu 4.4.1

> I reckon that's the root of the problem, there's apparently a bug in 4.4.1 (...)

> defe079d455ccc958fd0128e8a8cf0e4aeb5cd9c +
> a05357ab69712bec53c2d8d17efbbf25907ff9b8
> 
> v. likely fixes this. defe079d455ccc958fd0128e8a8cf0e4aeb5cd9c is already in
> 3-6 now I believe. if you could test if
> a05357ab69712bec53c2d8d17efbbf25907ff9b8 fixes this for you with icu 4.4.1 and
> cherry-pick it if does that'd be good.

Found a05357ab69712bec53c2d8d17efbbf25907ff9b8 already pushed to libreoffice-3-6; pulled and rebuilt, verified as fixed. I get on stderr:

warn:i18npool:21859:1:/home/master/src/libreoffice/workdirs/libreoffice-3.6/i18npool/source/ordinalsuffix/ordinalsuffix.cxx:132: ordinal NaNe didn't start with expected 263.283.153 prefix

but my understanding is that this is "normal".

Comment 9 Caolán McNamara 2012-09-11 15:50:38 UTC

we're straying off topic :-)

a) "Can't find any language list in Format / character".
In writer, in the format->character dialog, under the font tab, you must have a "Language" list. But it's the same list as Tools / Options / Language settings / Default language for documents, so looking there will suffice. The dictionary extensions that are built are a subset based off the --with-lang option IIRC, though Linux builds may default to also looking at whatever hunspell dictionaries you happen to have installed in /usr/share/hunspell to use in addition to the dictionary extensions.

b) "Tools / Language / for all text": this is for changing the language that the text claims to be written in. Its trick is that it lists the language the text is set to, the default document language *and* the language that libexttextcat guesses the text might really be in. So if you, in a French-speaking locale, write some German text, it should (if the text is long enough for it to guess) list German as an option to change the language of that selection to. So that feature doesn't care about what spellchecking dictionaries are installed.

c) the toggle defaults to not changing the default language for all new documents. UI is sort of poor there alright.