Bug 126311 - SPELL: af_ZA.aff quietly broken
Summary: SPELL: af_ZA.aff quietly broken
Status: RESOLVED INSUFFICIENTDATA
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Linguistic
Version (earliest affected): 6.3.0.1 rc
Hardware: All All
Importance: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: Dictionaries
Reported: 2019-07-09 15:04 UTC by elmar.braun
Modified: 2021-02-14 04:06 UTC
CC List: 4 users

See Also:
Crash report or crash signature:


Description elmar.braun 2019-07-09 15:04:09 UTC
Description:
The file af_ZA.aff from the Afrikaans dictionary contains lines such as:

SFX J   0  etjie   ^.{1,3}[aeiouyëê]ng

https://cgit.freedesktop.org/libreoffice/dictionaries/tree/af_ZA/af_ZA.aff#n144

As far as I can gather from hunspell's documentation, that last field permits a regex-*like* format, but not a full regex. Specifically, the "^" anchor appears to be unsupported; the "^" character is only recognized for negated classes such as "[^abc]".
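
To make the documented subset concrete, here is a minimal sketch (not hunspell's code) of a matcher that accepts only what the man page describes: ".", a literal character, and "[...]" / "[^...]" classes, checked against the end of the word as for a suffix rule. The names tokenize_condition and suffix_condition_matches are hypothetical, and the matcher works on bytes, so it only handles ASCII conditions correctly. Anything outside that subset, such as a leading "^" or "{1,3}", is rejected rather than guessed at.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One position of a condition: '.', a literal character, or a [...] class.
struct CondToken {
  bool any = false;      // '.' matches any character
  bool negated = false;  // class written as [^...]
  std::string chars;     // the literal, or the class members
};

// Splits a condition into tokens using only the documented syntax.
// Returns false for anything else: '^' outside "[^", '{', '(', '|', '+', ...
bool tokenize_condition(const std::string& cond, std::vector<CondToken>& out) {
  out.clear();
  for (std::size_t i = 0; i < cond.size(); ++i) {
    const char c = cond[i];
    CondToken t;
    if (c == '.') {
      t.any = true;
    } else if (c == '[') {
      std::size_t j = i + 1;
      if (j < cond.size() && cond[j] == '^') {
        t.negated = true;  // circumflex next to the brace: complement set
        ++j;
      }
      const std::size_t close = cond.find(']', j);
      if (close == std::string::npos) return false;  // unterminated class
      t.chars = cond.substr(j, close - j);
      i = close;  // the loop's ++i continues after the closing ']'
    } else if (std::string("^]{}()|+*?$").find(c) != std::string::npos) {
      return false;  // not part of the documented condition syntax
    } else {
      t.chars = std::string(1, c);  // literal ('-' has no special meaning)
    }
    out.push_back(t);
  }
  return true;
}

// For a suffix rule, the last N characters of the word must match the N
// tokens in order. ASCII only: multibyte UTF-8 characters are not handled.
bool suffix_condition_matches(const std::string& word, const std::string& cond) {
  std::vector<CondToken> tokens;
  if (!tokenize_condition(cond, tokens) || tokens.size() > word.size())
    return false;
  const std::size_t start = word.size() - tokens.size();
  for (std::size_t n = 0; n < tokens.size(); ++n) {
    const CondToken& t = tokens[n];
    if (t.any) continue;
    const bool in_set = t.chars.find(word[start + n]) != std::string::npos;
    if (in_set == t.negated) return false;  // literal/class miss, or negated hit
  }
  return true;
}
```

With this reading, "[^a]ng" accepts "ring" but not "rang", while "^.{1,3}[aeiou]ng" is simply not a valid condition.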

I've tried loading that dictionary with hunspell 1.7.0, built with MSVC 2015.3 with STL iterator debugging enabled. Iterator debugging raises an assertion at line 4360 of hunspell's affixmgr.cxx while processing the above SFX statement.

https://github.com/hunspell/hunspell/blob/v1.7.0/src/hunspell/affixmgr.cxx#L4360

Hunspell here uses a reverse_iterator to iterate over an already reversed copy of the string "^.{1,3}[aeiouyëê]ng". In that copy the "^" is the last character, so it is the first one the reverse_iterator visits, and inspecting the character "preceding" it dereferences the invalid iterator string.rbegin()-1.
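
The failure mode can be reproduced in isolation. The hypothetical helper below is an illustration, not hunspell's actual code: it reverses the condition as described above, walks the reversed copy with a reverse_iterator, and reports when a "^" is met while k is still rbegin(), which is exactly the case where reading *(k - 1) would step out of bounds. Checking k == rbegin() before touching *(k - 1) is the kind of guard that would avoid the invalid access.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical illustration, not hunspell's code: reverse the condition,
// then walk the reversed copy with a reverse_iterator. Returns true if a '^'
// is met while k is still rbegin(), where *(k - 1) would be out of bounds.
bool caret_lookback_out_of_bounds(std::string cond) {
  std::reverse(cond.begin(), cond.end());
  for (auto k = cond.rbegin(); k != cond.rend(); ++k) {
    if (*k != '^') continue;
    if (k == cond.rbegin()) {
      // k - 1 would be rbegin() - 1: invalid. A guarded version stops here.
      return true;
    }
    // Here k - 1 is a valid iterator, so *(k - 1) may be inspected safely.
  }
  return false;
}
```

A leading "^" (as in the af_ZA condition) trips the check, while the documented "[^abc]" form does not, because after reversal its "^" is no longer the first character visited.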

In a release build, the out-of-bounds access would of course happen silently. I wasn't able to provoke any misbehavior in 6.3.0.1 (which, unlike 6.2.5, contains the broken dictionary). But I don't speak Afrikaans, so I can't judge to what degree the dictionary actually does what it's supposed to do.

Steps to Reproduce:
1. build hunspell with iterator debugging
2. load the af_ZA dictionary

Actual Results:
iterator debugging reports an out-of-bounds access

Expected Results:
the dictionary loads without errors


Reproducible: Always


User Profile Reset: No



Additional Info:
Comment 1 Julien Nabet 2019-07-21 08:17:47 UTC
Elmar: taking a look at https://www.systutorials.com/docs/linux/man/4-hunspell/, I don't see anything that would prevent "^" from being used in a regexp.
Since I only build on Linux at home, I can't reproduce this: _ITERATOR_DEBUG_LEVEL (an MSVC setting) doesn't seem to exist for Linux toolchains.

Andras: thought you might be interested in this one.
Comment 2 elmar.braun 2019-07-22 13:57:06 UTC
The relevant section of the documentation reads: "Condition is a simplified, regular expression-like pattern, which must be met before the affix can be applied. (Dot signs an arbitrary character. Characters in braces sign an arbitrary character from the character subset. Dash hasn't got special meaning, but circumflex (^) next the first brace sets the complementer character set.)"

So, no mention of a leading circumflex in an expressly "simplified" pattern. Nor is there mention of the following regex-like patterns that are currently used in SFX statements in af_ZA.aff: ".{1,3}"; ".+"; "(d|t)"

Furthermore, I've grepped my way through all .aff files currently in the repository and found no other file that uses any of these in SFX/PFX conditions.
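
The grep described above can be approximated with a small checker. This sketch assumes the rule layout "SFX flag stripping affix [condition [morphology...]]"; lines with fewer than five fields, such as the header "SFX J Y 12", have no condition to check. It flags conditions that use constructs the man page does not describe: a "^" anchor outside "[^", "{m,n}", "+", "*", "?", "(", ")", "|" and "$". The function names rule_condition and uses_undocumented_syntax are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Returns the condition field of an SFX/PFX rule line, or "" when the line
// has no condition (e.g. the header "SFX J Y 12", comments, other keywords).
std::string rule_condition(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> fields;
  for (std::string f; in >> f;) fields.push_back(f);
  if (fields.size() < 5 || (fields[0] != "SFX" && fields[0] != "PFX"))
    return "";
  return fields[4];
}

// True if the condition uses regex syntax the hunspell man page does not
// describe. Inside [...] every character is treated as literal, except a
// '^' directly after '[', which is the documented negation.
bool uses_undocumented_syntax(const std::string& cond) {
  bool in_class = false;
  for (std::size_t i = 0; i < cond.size(); ++i) {
    const char c = cond[i];
    if (in_class) {
      if (c == ']') in_class = false;
      continue;
    }
    if (c == '[') {
      in_class = true;
      if (i + 1 < cond.size() && cond[i + 1] == '^') ++i;  // skip "[^"
      continue;
    }
    if (std::string("^{}()|+*?$").find(c) != std::string::npos) return true;
  }
  return false;
}
```

Feeding every line of af_ZA.aff (read with std::getline) through rule_condition and uses_undocumented_syntax would list the affected rules, including the ".{1,3}", ".+" and "(d|t)" cases mentioned above.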

Not sure if this helps, or catches this instance, but for libstdc++ there appears to be a define "_GLIBCXX_DEBUG", which enables some kind of iterator debugging. Likewise for libc++: http://releases.llvm.org/8.0.0/projects/libcxx/docs/DesignDocs/DebugMode.html#iterator-debugging-checks
Comment 3 Julien Nabet 2019-07-26 18:43:04 UTC
Sorry, I built hunspell with:
./configure CXXFLAGS='-g -O0 -Wall -Wextra -D_GLIBCXX_DEBUG' --with-warnings --with-ui --with-readline

then I went into:
<local LO root>/dictionaries/af_ZA
and ran:
<local hunspell root>/./src/tools/hunspell -d af_ZA
I just see:
Hunspell 1.7.0

So I don't know how to reproduce this.
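
One way to check whether a set of CXXFLAGS like the ones above actually reaches the standard library headers is to compile a tiny translation unit with the same flags and see which mode it reports. This is a sketch, not from the thread: the function name iterator_debug_mode is hypothetical, while the macros are the ones each library uses (_GLIBCXX_DEBUG for libstdc++, _LIBCPP_DEBUG for the libc++ 8 debug mode linked in comment 2, _ITERATOR_DEBUG_LEVEL for MSVC). Newer libc++ releases have since changed their debug mechanism, so the _LIBCPP_DEBUG branch may not apply there.

```cpp
#include <cassert>
#include <string>  // any standard header; MSVC's headers define _ITERATOR_DEBUG_LEVEL

// Hypothetical helper: reports which checked-iterator macro, if any, is
// active for the standard library under the current compiler flags.
const char* iterator_debug_mode() {
#if defined(_GLIBCXX_DEBUG)
  return "libstdc++ debug mode (_GLIBCXX_DEBUG)";
#elif defined(_LIBCPP_DEBUG) && _LIBCPP_DEBUG >= 1
  return "libc++ debug mode (_LIBCPP_DEBUG >= 1)";
#elif defined(_ITERATOR_DEBUG_LEVEL) && _ITERATOR_DEBUG_LEVEL > 0
  return "MSVC iterator debugging (_ITERATOR_DEBUG_LEVEL > 0)";
#else
  return "no iterator debugging macro detected";
#endif
}
```

This only shows that the define was seen; it does not prove that every container or string type used by hunspell is actually checked in that mode.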
Comment 4 Xisco Faulí 2020-07-17 10:58:33 UTC
A new major release of LibreOffice is available since this bug was reported.
Could you please try to reproduce it with the latest version of LibreOffice
from https://www.libreoffice.org/download/libreoffice-fresh/ ?
I have set the bug's status to 'NEEDINFO'. Please change it back to
'UNCONFIRMED' if the bug is still present in the latest version.
Comment 5 QA Administrators 2021-01-14 04:07:32 UTC Comment hidden (obsolete)
Comment 6 QA Administrators 2021-02-14 04:05:56 UTC
Dear elmar.braun,

Please read this message in its entirety before proceeding.

Your bug report is being closed as INSUFFICIENTDATA due to inactivity and
a lack of information which is needed in order to accurately
reproduce and confirm the problem. We encourage you to retest
your bug against the latest release. If the issue is still
present in the latest stable release, we need the following
information (please ignore any that you've already provided):

a) Provide details of your system including your operating
   system and the latest version of LibreOffice that you have
   confirmed the bug to be present

b) Provide easy to reproduce steps – the simpler the better

c) Provide any test case(s) which will help us confirm the problem

d) Provide screenshots of the problem if you think it might help

e) Read all comments and provide any requested information

Once all of this is done, please set the bug back to UNCONFIRMED
and we will attempt to reproduce the issue. Please do not:

a) respond via email 

b) update the version field in the bug or any of the other details
   on the top section of our bug tracker

Warm Regards,
QA Team

MassPing-NeedInfo-FollowUp