41166 – EDITING [IT]: Italian Date containing martedì not parsed as Date

Bug 41166 - EDITING [IT]: Italian Date containing martedì not parsed as Date

Summary: EDITING [IT]: Italian Date containing martedì not parsed as Date

Status:	VERIFIED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	framework (show other bugs)
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Eike Rathke (retired, only occasionally showing up)

URL:
Whiteboard:	BSA target:4.3.0 target:4.2.5
Keywords:

Depends on:
Blocks:

Reported:	2011-09-24 04:12 UTC by Vittorio Mandelli
Modified:	2014-05-19 14:38 UTC (History)
CC List:	4 users (show)

See Also:
Crash report or crash signature:

Attachments
create in Base with HSQL (but the problem persist also with MySQL, MariaDB ecc. (3.54 KB, application/vnd.oasis.opendocument.database) 2012-02-20 12:52 UTC, Vittorio Mandelli	Details
calc document to reproduce (8.88 KB, application/vnd.oasis.opendocument.spreadsheet) 2012-02-21 10:04 UTC, Lionel Elie Mamane	Details
in-progress patch (8.12 KB, text/plain) 2012-02-21 10:08 UTC, Lionel Elie Mamane	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Vittorio Mandelli 2011-09-24 04:12:53 UTC

Problem description: ITALIAN LANGUAGE ONLY. Use complete date format (id. martedì 27 settembre. The date is not accepted and the default date is used.

Steps to reproduce:
1. ....use complete date format (martedì 27 settembre 2011)
2. ....select and copy the complete date
3. ....paste the selected date to another date field

Current behavior: The last date field doesn't accept the selection and turn to the default date (sabato 30 dicembre 1899).

Expected behavior: the pasted date field is the same of the former (martedì 27 settembre 2011). Everything works ok if you try with the other days of the week (lunedì, mercoledì, giovedì venerdì sabato, domenica). Everything works ok if write the date (id. martedì 27 settembre 2011) directly in the field typing using the keyboard.

Comment 1 Jean-Baptiste Faure 2012-01-22 07:17:00 UTC

Please could you attach a test file which the problem?

Thank you for your help. JBF

Comment 2 Rainer Bielefeld Retired 2012-02-20 10:37:33 UTC

NOT reproducible with "LibreOffice 3.5.0 German UI/ Italian Locale [Build-ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735] on German WIN7 Home Premium (64bit). Copy - paste of date as reported works fine in my sample document with Italian as default language.

Indeed, a sample document will be required

Comment 3 Vittorio Mandelli 2012-02-20 12:52:04 UTC

Created attachment 57364 [details]
create in Base with HSQL (but the problem persist also with MySQL, MariaDB ecc.

please, write "martedì 21 febbraio 2012" or another date with "martedì" (italian for Tuesday) in date field 1 and copy it in the date field 2: the output will be the default date "sabato 30 dicembre 1899".

Comment 4 Rainer Bielefeld Retired 2012-02-20 22:36:48 UTC

[Reproducible] with reporters sample and "LibreOffice 3.5.0 German UI/Locale [Build-ID: 7e68ba2-a744ebf-1f241b7-c506db1-7d53735] on German WIN7 Home Premium (64bit)

With my own sample I see the Same problelm. Steps to reproduce:
1. open newsamplebase.odb
2. select date field "domenica 01 gennaio 2012" with <control+c>
3. with <tab> go to target field and paste with  <control+c>
   As expected due to column formatting you will see "01/01/12"
4 redo for following rows
  all works fine, only date containing "martedì" will not be recognized and 
  pasted as "30/12/99"

Old problem, also with LibO 3.3.0 Portable and OOo 3.1.1, seems inherited from OOo

No idea what I did yesterday where everything seemed to work fine

@Vittorio Mandelli:
What's your OS?

@Lionel:
Please feel free to reassign (or reset Assignee to default) if it’s not your area or if provided information is not sufficient. Please set Status to ASSIGNED if you accept this Bug.

Comment 5 Lionel Elie Mamane 2012-02-21 10:03:09 UTC

As explained in more detail below, my head is exploding looking at this code, and not my area, so I'm dumping that bug on Eike. Eike, feel free to reassign to someone else in turn.

This bug is not specific to Base. It can be reproduced in Calc in the attached document:
- type (or paste) "lunedì 11 aprile 2011" at position B2 or B3: it is properly formatted as a Date, and column C shows the numer of days between that date and today.
- type (or paste) "martedì 12 aprile 2011" at position B2 or B3; stays as string, not as date, and column C gives error

Don't copy the whole cell from column A to column B, as this will copy the format with it. Either type it manually or copy-paste it e.g. from here (your browser or mail client).

Analysis: the root of the problem is that "martedì" starts with "mar" which is the abbreviation for "marzo", and the Date parsing code thus tries to match "tedì 27 settembre 2011" as a date-without-month, which fails because "tedì" is not a Date component.

Now, a very dirty and quick fix would be to switch the priority of "recognise month" and "recognise day of week" in function ImpSvNumberInputScan::ScanStartString in svl/source/numbers/zforfind.cxx. It might create the symmetric problem in another language, though (a month is not recognised as such because a prefix of it is a valid day-of-week).

I tried preparing a more proper patch, which would be to change GetMonth() (and most of the other callers of StringContains()) to match only a whole word, not a prefix. This is made more complicated by languages that don't separate words by non-letter characters (such as space or punctuation). I abandoned the idea of "quickly" doing that when I realised that our internal APIs don't allow to check for status of a single character in isolation: we have to stick it in a String! See http://api.libreoffice.org/common/ref/com/sun/star/i18n/XCharacterClassification.html

That's inconvenient with the current implementation of StringContains through StringPtrContains. However, StringPtrContains has only one direct caller, namely skipthousands. If this one (and thus all its callers) can be converted to using String-and-offset instead of pointer-to-buffer, things become sane again. Or maybe really we should have http://api.libreoffice.org/common/ref/com/sun/star/i18n/XCharacterClassification2.html
to fix the "flaw" in XCharacterClassification of insisting on a string ;-)

As a consequence of this messy situation, this goes above the effort / decision power I want to exert outside of my area.

Another "proper" (but maybe less so in some ways, and more so in other ways) fix would be to make that parser backtracking instead of eager. That is, if an option fails, it still tries to go along the others. Also a lot of work :)

Comment 6 Lionel Elie Mamane 2012-02-21 10:04:30 UTC

Created attachment 57408 [details]
calc document to reproduce

Comment 7 Lionel Elie Mamane 2012-02-21 10:08:06 UTC

Created attachment 57410 [details]
in-progress patch

Comment 8 Lionel Elie Mamane 2012-02-21 23:20:43 UTC

(In reply to comment #5)

> Another "proper" fix would be to make that parser backtracking instead of
> eager. That is, if an option fails, it still tries to go along the others.
> Also a lot of work :)

Actually, not as much as I thought yesterday, by far. I started dreaming that if only we were using a higher level programming language, we could just use a dynamic (at-runtime) parser generator for this parsing job, and then we just "plug in" the internationalised strings into the grammar definition, generate it, and use that parser.

Some googling later, not only do dynamic parser generators exist for C++, but Boost (which we already use/bundle) has one, namely Boost::Spirit. From a cursory glance, it looks quite decent.

So the idea of replacing that *whole* parsing code by a Boost::Spirit grammar is growing on me. Comments on that?

Comment 9 Eike Rathke (retired, only occasionally showing up) 2012-02-22 06:54:28 UTC

(In reply to comment #8)
> So the idea of replacing that *whole* parsing code by a Boost::Spirit grammar
> is growing on me. Comments on that?

Yeah, don't ;-)
Honestly, there are too many side conditions to consider, and with a Spirit parser we'd also loose the coupling with i18n I guess.

Btw, your addition of isAlpha( sal_Unicode c ) et al to CharClass in in-progress.patch wouldn't work as the underlying CharacterClassification doesn't have a corresponding method on purpose, it only works on OUString and position. Actually using sal_Unicode for characters limits them to UCS2 and does not take UTF-16 e.g. surrogates into account, which the i18n methods handle using OUString::iterateCodePoints()

Current code (not only in the number formatter) doesn't account for full UTF-16 in many places where it should iterate over code points instead of sal_Unicode, but that's a different story..

Comment 10 Lionel Elie Mamane 2012-02-22 09:19:03 UTC

(In reply to comment #9)
> (In reply to comment #8)

>> So the idea of replacing that *whole* parsing code by a Boost::Spirit grammar
>> is growing on me. Comments on that?

> Yeah, don't ;-)
> Honestly, there are too many side conditions to consider, and with a Spirit
> parser we'd also loose the coupling with i18n I guess.

I understand "coupling with i18n" as "using the isAlphaNumeric, etc from pFormatter->GetCharClass()", right?

Well, today I've been toying with doing just that coupling, actually. I haven't *quite* got it working, but maybe with some upstream help?

What kind of "side conditions" do you have in mind?

> Btw, your addition of isAlpha( sal_Unicode c ) et al to CharClass in
> in-progress.patch wouldn't work as the underlying CharacterClassification
> doesn't have a corresponding method on purpose, it only works on OUString and
> position. Actually using sal_Unicode for characters limits them to UCS2 and
> does not take UTF-16 e.g. surrogates into account, which the i18n methods
> handle using OUString::iterateCodePoints()

Ah yes, our punishment for not using 32 bit characters (UCS-4). I see.

Comment 11 Marina Latini (SUSE) 2014-05-15 08:57:13 UTC

Confirmed on LibreOffice downloaded from http://dev-builds.libreoffice.org/pre-releases/deb/x86_64/

LibreOffice version: 4.3.0.0.alpha1
Build ID: 46cfcd5a05aa1d13fecd73f5a25b64b8d8dd6781
OS: Ubuntu 13.04 x86_64

Steps to reproduce:
1) Start LibreOffice Calc
2) Open the attached file https://bugs.freedesktop.org/attachment.cgi?id=57408 "calc document to reproduce"
3) enter "lunedì 9 maggio 2014" at position B2: properly formatted as a Date, and column C shows the numer of days between that date and today.
4) enter "martedì 10 maggio 2014" at position B3: shown as String, not formatted as Date. Column C gives error.

Comment 12 Commit Notification 2014-05-17 01:13:01 UTC

Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=8e71f81f47c20320c4de7a7aadb7d524f0d8ea76

resolved fdo#41166 match month and day name word instead of substring



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 13 Commit Notification 2014-05-17 02:11:50 UTC

Eike Rathke committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=3885b5d4b00ebb31adabc36c507abd642c03d0d4

resolved fdo#41166 match month and day name word instead of substring



The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 14 Eike Rathke (retired, only occasionally showing up) 2014-05-17 02:22:53 UTC

Pending review for 4-2 at https://gerrit.libreoffice.org/9387

Comment 15 Commit Notification 2014-05-18 16:53:01 UTC

Eike Rathke committed a patch related to this issue.
It has been pushed to "libreoffice-4-2":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=41b9ab0375f455f504c66b4870e817aba0f990d2&h=libreoffice-4-2

resolved fdo#41166 match month and day name word instead of substring


It will be available in LibreOffice 4.2.5.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds
Affected users are encouraged to test the fix and report feedback.

Comment 16 Marina Latini (SUSE) 2014-05-19 14:38:48 UTC

Verified as fixed for LibreOffice 

Version: 4.3.0.0.alpha1+
Build ID: b5f45a51638901ea679bf238ab460283e41bc622
TinderBox: Linux-rpm_deb-x86_64@46-TDF, Branch:master, Time: 2014-05-19_06:16:16

Best,
Marina