Description: When parsing date-like strings, Calc exhibits inconsistent behavior for large year values due to an integer overflow issue. Specifically, year recognition alternates between valid (Date) and invalid (Text) periodically as the input year number increases. This occurs because of how large year numbers are handled during conversion from sal_Int32 to sal_uInt16 and the subsequent interpretation of that sal_uInt16 as a sal_Int16. Enter the following strings into Calc cells: "32767-1-1" -> Correctly recognized as Date(32767, 1, 1) "32768-1-1" -> Incorrectly recognized as Text("32768-1-1") "80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464) "100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767) The issue stems from the ImpSvNumberInputScan::ImplGetYear function in svl/source/numbers/zforfind.cxx. 1. The input year (as digits) is parsed into a sal_Int32. 2. This sal_Int32 is then cast to a sal_uInt16 (nYear). When the input year value exceeds 65535, this conversion results in uint16 overflow (modulo 65536 arithmetic). This explains the observed periodicity of 65536. 3. Downstream, this sal_uInt16 nYear value is effectively treated as a sal_Int16 when determining validity or setting the cell value (e.g., via Calc.setValue or similar mechanisms). 4. When the sal_uInt16 value falls within the range 32768 to 65535, its interpretation as a sal_Int16 results in a negative value. 5. Calc apparently treats these negative years as invalid in this context, causing the entire input string to be fallback-recognized as Text. Therefore: - Years resulting in an effective sal_uInt16 value between 0 and 32767 (inclusive) are considered valid Date years. - Years resulting in an effective sal_uInt16 value between 32768 and 65535 (inclusive) are considered invalid, and the input becomes Text. While this is unlikely to affect users entering typical, smaller year values, it can lead to unexpected data interpretation if large numbers are entered, potentially due to typos. This inconsistent behavior could compromise data integrity in specific scenarios. A straightforward mitigation is to enforce a maximum valid year limit, effectively treating any year input that would result in a value > 32767 as invalid (Text). This can be achieved with a small modification to the sal_Int32 to sal_uInt16 conversion: The following patch implements the proposed solution. It has been tested locally and resolves the described inconsistent behavior. diff --git a/svl/source/numbers/zforfind.cxx b/svl/source/numbers/zforfind.cxx index 2b91fa662..648c1007f 100644 --- a/svl/source/numbers/zforfind.cxx +++ b/svl/source/numbers/zforfind.cxx @@ -1111,7 +1111,11 @@ sal_uInt16 ImpSvNumberInputScan::ImplGetYear( sal_uInt16 nIndex ) // leading zero as convention. if (nLen <= 6) { - nYear = static_cast<sal_uInt16>(sStrArray[nNums[nIndex]].toInt32()); + sal_Int32 temp = sStrArray[nNums[nIndex]].toInt32(); + // safely convert int32 to uint16 by capping values at 65535 + // downstream code will cast uint16 to int16(where values >32767 become negative and are considerd invalid for year) + nYear = temp < 65536 ? temp : 65535; + // A year in another, not Gregorian CE era is never expanded. // A year < 100 entered with at least 3 digits with leading 0 is taken // as is without expansion. I have implemented and tested this fix. However, I am currently unfamiliar with the process for contributing code changes to the Calc/LibreOffice project. Guidance on submitting this patch would be appreciated. Steps to Reproduce: Enter the following strings into Calc cells: "32767-1-1" -> Correctly recognized as Date(32767, 1, 1) "32768-1-1" -> Incorrectly recognized as Text("32768-1-1") "80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464) "100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767) Actual Results: "32767-1-1" -> Date(32767, 1, 1) "32768-1-1" -> Text("32768-1-1") "80000-1-1" -> Date(14464, 1, 1) "100000-1-1" -> Text("100000-1-1") Expected Results: "32767-1-1" -> Date(32767, 1, 1) "32768-1-1" -> Text("32768-1-1") "80000-1-1" -> Text("80000-1-1") "100000-1-1" -> Text("100000-1-1") Reproducible: Always User Profile Reset: No Additional Info: Version: 24.8.3.2 (AARCH64) / LibreOffice Community Build ID: 48a6bac9e7e268aeb4c3483fcf825c94556d9f92 CPU threads: 8; OS: macOS 15.0; UI render: Skia/Metal; VCL: osx Locale: en-US (en_CN.UTF-8); UI: en-US Calc: threaded
(In reply to gplhust955 from comment #0) > ... > "32768-1-1" -> Incorrectly recognized as Text("32768-1-1") > ... > "100000-1-1" -> Incorrectly recognized as Text("100000-1-1") ... Please clarify, why do you claim *these* handlings incorrect?
Thanks for catching that. I shouldn't have used "correctly" or "incorrectly". Let me restate the current behavior: "32767-1-1" recognized as Date(32767, 1, 1) "32768-1-1" recognized as Text("32768-1-1") "80000-1-1" recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464) "100000-1-1" recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767) To achieve simpler and safer behavior, my suggestion is to implement a clear threshold: only values up to and including 32767 should be interpreted as valid years for date parsing.
Definitely. This, in fact, the expected behavior: the fundamental limitations are e.g. documented in include/tools/date.hxx.
(In reply to Mike Kaganski from comment #3) Sigh, I was unclear. I meant, that your proposal, what you suggest, is the expected behavior; and so the issue is set to NEW.
(In reply to gplhust955 from comment #0) > The following patch implements the proposed solution. It has been tested > locally and resolves the described inconsistent behavior. > > diff --git a/svl/source/numbers/zforfind.cxx > b/svl/source/numbers/zforfind.cxx > index 2b91fa662..648c1007f 100644 > --- a/svl/source/numbers/zforfind.cxx > +++ b/svl/source/numbers/zforfind.cxx > @@ -1111,7 +1111,11 @@ sal_uInt16 ImpSvNumberInputScan::ImplGetYear( > sal_uInt16 nIndex ) > // leading zero as convention. > if (nLen <= 6) > { > - nYear = static_cast<sal_uInt16>(sStrArray[nNums[nIndex]].toInt32()); > + sal_Int32 temp = sStrArray[nNums[nIndex]].toInt32(); > + // safely convert int32 to uint16 by capping values at 65535 > + // downstream code will cast uint16 to int16(where values >32767 > become negative and are considerd invalid for year) > + nYear = temp < 65536 ? temp : 65535; > + > // A year in another, not Gregorian CE era is never expanded. > // A year < 100 entered with at least 3 digits with leading 0 is > taken > // as is without expansion. > > > I have implemented and tested this fix. However, I am currently unfamiliar > with the process for contributing code changes to the Calc/LibreOffice > project. Guidance on submitting this patch would be appreciated. Create a Gerrit account and set up the SSH connection: https://wiki.documentfoundation.org/Development/gerrit Submit the patch: https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch Send a license statement to the mailing list: https://wiki.documentfoundation.org/Development/GetInvolved#License_statement
Fwiw, the "80000-1-1" recognized as Date(14464, 1, 1) (or any ((value > 32767) % 65536) <= 32767 for that matter) of course _is_ a bug. Instead of nYear = temp < 65536 ? temp : 65535; though I'd rather check for the actual range -32768 <= temp <= 32767 and not rely on an implicit undocumented behaviour of values being casted.
That’s a better solution. I noticed that a value of 0 indicates the invalidation of nYear. So, we can set a value for nYear only when temp is less than 32,768 (since n <= 6 ensures that temp will always be positive). Does that sound good?
No, nLen<=6 does not ensure that temp will be positive, the number string could be "-99999". But actually it's more complicated, the -32768 <= year <= 32767 constraint is for the underlying proleptic Gregorian calendar, but the ImpSvNumberInputScan::ImplGetYear() function is used with all possible calendars. A, for example, Buddhist or Islamic or Jewish calendar would have a 5000 or 1600 or 3000 something years offset. I'm not sure how to handle this, or if the check should rather be applied only to a Gregorian calendar, which currently seems best to me.
(In reply to Eike Rathke from comment #8) The result of the function (strangely enough, a sal_uInt16) is always passed to CalendarWrapper::setValue - and then to i18n::XCalendar::setValue, taking sal_Int16. It doesn't seem to make sense to allow out-of-signed-16-bits values returned from ImpSvNumberInputScan::ImplGetYear - or maybe I misunderstood? Thanks!
(In reply to Mike Kaganski from comment #9) I’m not familiar with specific calendar types, but as I understand it, XCalendar is an abstraction interface implemented by concrete calendar systems. ImplGetYear appears to be a generic method designed for all calendar types, meaning it should only handle general functionality. During gdb debugging, when I input a string like "-8000-1-1", the "-" is treated as a sign, making nLen equal to 4, and it doesn’t count as part of nYear. In this case, temp will always remain positive. However, I’m unsure if this behavior applies to all cases. If all cases behave similarly, there may be no need to change the method’s signature—hopefully, the original developer accounted for this when choosing u16. To address potential issues, the best we can do is prevent u16 overflow by capping nYear (e.g., setting it to 0 in cases of invalid positive overflow).
(In reply to Mike Kaganski from comment #9) > (In reply to Eike Rathke from comment #8) > The result of the function (strangely enough, a sal_uInt16) A second change maybe could address that, it's just confusing. > is always passed > to CalendarWrapper::setValue - and then to i18n::XCalendar::setValue, taking > sal_Int16. It doesn't seem to make sense to allow out-of-signed-16-bits > values returned from ImpSvNumberInputScan::ImplGetYear - or maybe I > misunderstood? Thanks! Ah darn, you are right, I didn't take i18n::XCalendar::setValue taking only sal_Int16 anyway (not sal_Int32) into account.