166195 – Integer Overflow in Year Recognition Leads to Periodic Date/Text Interpretation

Bug 166195 - Integer Overflow in Year Recognition Leads to Periodic Date/Text Interpretation

Summary: Integer Overflow in Year Recognition Leads to Periodic Date/Text Interpretation

Status:	NEW

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	unspecified
Hardware:	All All

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2025-04-15 12:21 UTC by gplhust955
Modified:	2025-04-17 13:44 UTC (History)
CC List:	2 users (show)

See Also:
Crash report or crash signature:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description gplhust955 2025-04-15 12:21:04 UTC

Description:
When parsing date-like strings, Calc exhibits inconsistent behavior for large year values due to an integer overflow issue. Specifically, year recognition alternates between valid (Date) and invalid (Text) periodically as the input year number increases. This occurs because of how large year numbers are handled during conversion from sal_Int32 to sal_uInt16 and the subsequent interpretation of that sal_uInt16 as a sal_Int16.


Enter the following strings into Calc cells:
"32767-1-1" -> Correctly recognized as Date(32767, 1, 1)
"32768-1-1" -> Incorrectly recognized as Text("32768-1-1")
"80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464)
"100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767)

The issue stems from the ImpSvNumberInputScan::ImplGetYear function in svl/source/numbers/zforfind.cxx.

1. The input year (as digits) is parsed into a sal_Int32.
2. This sal_Int32 is then cast to a sal_uInt16 (nYear). When the input year value exceeds 65535, this conversion results in uint16 overflow (modulo 65536 arithmetic). This explains the observed periodicity of 65536.
3. Downstream, this sal_uInt16 nYear value is effectively treated as a sal_Int16 when determining validity or setting the cell value (e.g., via Calc.setValue or similar mechanisms).
4. When the sal_uInt16 value falls within the range 32768 to 65535, its interpretation as a sal_Int16 results in a negative value.
5. Calc apparently treats these negative years as invalid in this context, causing the entire input string to be fallback-recognized as Text.

Therefore:
- Years resulting in an effective sal_uInt16 value between 0 and 32767 (inclusive) are considered valid Date years.
- Years resulting in an effective sal_uInt16 value between 32768 and 65535 (inclusive) are considered invalid, and the input becomes Text.

While this is unlikely to affect users entering typical, smaller year values, it can lead to unexpected data interpretation if large numbers are entered, potentially due to typos. This inconsistent behavior could compromise data integrity in specific scenarios. 

A straightforward mitigation is to enforce a maximum valid year limit, effectively treating any year input that would result in a value > 32767 as invalid (Text). This can be achieved with a small modification to the sal_Int32 to sal_uInt16 conversion:

The following patch implements the proposed solution. It has been tested locally and resolves the described inconsistent behavior.

diff --git a/svl/source/numbers/zforfind.cxx b/svl/source/numbers/zforfind.cxx
index 2b91fa662..648c1007f 100644
--- a/svl/source/numbers/zforfind.cxx
+++ b/svl/source/numbers/zforfind.cxx
@@ -1111,7 +1111,11 @@ sal_uInt16 ImpSvNumberInputScan::ImplGetYear( sal_uInt16 nIndex )
     // leading zero as convention.
     if (nLen <= 6)
     {
-        nYear = static_cast<sal_uInt16>(sStrArray[nNums[nIndex]].toInt32());
+        sal_Int32 temp = sStrArray[nNums[nIndex]].toInt32();
+        // safely convert int32 to uint16 by capping values at 65535
+        // downstream code will cast uint16 to int16(where values >32767 become negative and are considerd invalid for year)
+        nYear = temp < 65536 ? temp : 65535;
+
         // A year in another, not Gregorian CE era is never expanded.
         // A year < 100 entered with at least 3 digits with leading 0 is taken
         // as is without expansion.


I have implemented and tested this fix. However, I am currently unfamiliar with the process for contributing code changes to the Calc/LibreOffice project. Guidance on submitting this patch would be appreciated.

Steps to Reproduce:
Enter the following strings into Calc cells:
"32767-1-1" -> Correctly recognized as Date(32767, 1, 1)
"32768-1-1" -> Incorrectly recognized as Text("32768-1-1")
"80000-1-1" -> Incorrectly recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464)
"100000-1-1" -> Incorrectly recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767)

Actual Results:
"32767-1-1" -> Date(32767, 1, 1)
"32768-1-1" -> Text("32768-1-1")
"80000-1-1" -> Date(14464, 1, 1)
"100000-1-1" -> Text("100000-1-1")

Expected Results:
"32767-1-1" -> Date(32767, 1, 1)
"32768-1-1" -> Text("32768-1-1")
"80000-1-1" -> Text("80000-1-1")
"100000-1-1" -> Text("100000-1-1")


Reproducible: Always


User Profile Reset: No

Additional Info:
Version: 24.8.3.2 (AARCH64) / LibreOffice Community
Build ID: 48a6bac9e7e268aeb4c3483fcf825c94556d9f92
CPU threads: 8; OS: macOS 15.0; UI render: Skia/Metal; VCL: osx
Locale: en-US (en_CN.UTF-8); UI: en-US
Calc: threaded

Comment 1 Mike Kaganski 2025-04-15 12:42:36 UTC

(In reply to gplhust955 from comment #0)
> ...
> "32768-1-1" -> Incorrectly recognized as Text("32768-1-1")
> ...
> "100000-1-1" -> Incorrectly recognized as Text("100000-1-1") ...

Please clarify, why do you claim *these* handlings incorrect?

Comment 2 gplhust955 2025-04-15 12:57:02 UTC

Thanks for catching that. I shouldn't have used "correctly" or "incorrectly".

Let me restate the current behavior: 
"32767-1-1"  recognized as Date(32767, 1, 1)
"32768-1-1"  recognized as Text("32768-1-1")
"80000-1-1"  recognized as Date(14464, 1, 1) (due to 80000 % 65536 = 14464)
"100000-1-1" recognized as Text("100000-1-1") (due to 100000 % 65536 = 34464, which is > 32767)

To achieve simpler and safer behavior, my suggestion is to implement a clear threshold: only values up to and including 32767 should be interpreted as valid years for date parsing.

Comment 3 Mike Kaganski 2025-04-15 13:00:51 UTC

Definitely. This, in fact, the expected behavior: the fundamental limitations are e.g. documented in include/tools/date.hxx.

Comment 4 Mike Kaganski 2025-04-15 13:01:49 UTC

(In reply to Mike Kaganski from comment #3)

Sigh, I was unclear. I meant, that your proposal, what you suggest, is the expected behavior; and so the issue is set to NEW.

Comment 5 Buovjaga 2025-04-15 15:15:31 UTC

(In reply to gplhust955 from comment #0)
> The following patch implements the proposed solution. It has been tested
> locally and resolves the described inconsistent behavior.
> 
> diff --git a/svl/source/numbers/zforfind.cxx
> b/svl/source/numbers/zforfind.cxx
> index 2b91fa662..648c1007f 100644
> --- a/svl/source/numbers/zforfind.cxx
> +++ b/svl/source/numbers/zforfind.cxx
> @@ -1111,7 +1111,11 @@ sal_uInt16 ImpSvNumberInputScan::ImplGetYear(
> sal_uInt16 nIndex )
>      // leading zero as convention.
>      if (nLen <= 6)
>      {
> -        nYear = static_cast<sal_uInt16>(sStrArray[nNums[nIndex]].toInt32());
> +        sal_Int32 temp = sStrArray[nNums[nIndex]].toInt32();
> +        // safely convert int32 to uint16 by capping values at 65535
> +        // downstream code will cast uint16 to int16(where values >32767
> become negative and are considerd invalid for year)
> +        nYear = temp < 65536 ? temp : 65535;
> +
>          // A year in another, not Gregorian CE era is never expanded.
>          // A year < 100 entered with at least 3 digits with leading 0 is
> taken
>          // as is without expansion.
> 
> 
> I have implemented and tested this fix. However, I am currently unfamiliar
> with the process for contributing code changes to the Calc/LibreOffice
> project. Guidance on submitting this patch would be appreciated.

Create a Gerrit account and set up the SSH connection: https://wiki.documentfoundation.org/Development/gerrit

Submit the patch: https://wiki.documentfoundation.org/Development/gerrit/SubmitPatch

Send a license statement to the mailing list: https://wiki.documentfoundation.org/Development/GetInvolved#License_statement

Comment 6 Eike Rathke 2025-04-15 16:59:01 UTC

Fwiw, the "80000-1-1"  recognized as Date(14464, 1, 1) (or any ((value > 32767) % 65536) <= 32767 for that matter) of course _is_ a bug.

Instead of
nYear = temp < 65536 ? temp : 65535;
though I'd rather check for the actual range -32768 <= temp <= 32767 and not rely on an implicit undocumented behaviour of values being casted.

Comment 7 gplhust955 2025-04-16 03:45:12 UTC

That’s a better solution. I noticed that a value of 0 indicates the invalidation of nYear. So, we can set a value for nYear only when temp is less than 32,768 (since n <= 6 ensures that temp will always be positive). Does that sound good?

Comment 8 Eike Rathke 2025-04-16 11:15:17 UTC

No, nLen<=6 does not ensure that temp will be positive, the number string could be "-99999". But actually it's more complicated, the -32768 <= year <= 32767 constraint is for the underlying proleptic Gregorian calendar, but the ImpSvNumberInputScan::ImplGetYear() function is used with all possible calendars. A, for example, Buddhist or Islamic or Jewish calendar would have a 5000 or 1600 or 3000 something years offset. I'm not sure how to handle this, or if the check should rather be applied only to a Gregorian calendar, which currently seems best to me.

Comment 9 Mike Kaganski 2025-04-16 11:55:03 UTC

(In reply to Eike Rathke from comment #8)

The result of the function (strangely enough, a sal_uInt16) is always passed to CalendarWrapper::setValue - and then to i18n::XCalendar::setValue, taking sal_Int16. It doesn't seem to make sense to allow out-of-signed-16-bits values returned from ImpSvNumberInputScan::ImplGetYear - or maybe I misunderstood? Thanks!

Comment 10 gplhust955 2025-04-16 13:26:05 UTC

(In reply to Mike Kaganski from comment #9)

I’m not familiar with specific calendar types, but as I understand it, XCalendar is an abstraction interface implemented by concrete calendar systems.

ImplGetYear appears to be a generic method designed for all calendar types, meaning it should only handle general functionality. During gdb debugging, when I input a string like "-8000-1-1", the "-" is treated as a sign, making nLen equal to 4, and it doesn’t count as part of nYear. In this case, temp will always remain positive. However, I’m unsure if this behavior applies to all cases.

If all cases behave similarly, there may be no need to change the method’s signature—hopefully, the original developer accounted for this when choosing u16. To address potential issues, the best we can do is prevent u16 overflow by capping nYear (e.g., setting it to 0 in cases of invalid positive overflow).

Comment 11 Eike Rathke 2025-04-17 13:44:51 UTC

(In reply to Mike Kaganski from comment #9)
> (In reply to Eike Rathke from comment #8)
> The result of the function (strangely enough, a sal_uInt16)
A second change maybe could address that, it's just confusing.

> is always passed
> to CalendarWrapper::setValue - and then to i18n::XCalendar::setValue, taking
> sal_Int16. It doesn't seem to make sense to allow out-of-signed-16-bits
> values returned from ImpSvNumberInputScan::ImplGetYear - or maybe I
> misunderstood? Thanks!
Ah darn, you are right, I didn't take i18n::XCalendar::setValue taking only sal_Int16 anyway (not sal_Int32) into account.