Bug 127334 - Regression: Incompatible changes in date/time arithmetic introduced between Version: 6.0.4.2 (x64) and version 6.2.6.2 (ubuntu)
Summary: Regression: Incompatible changes in date/time arithmetic introduced between V...
Status: UNCONFIRMED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
Version:
(earliest affected)
6.2.6.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-09-04 13:13 UTC by Albrecht Müller
Modified: 2020-11-18 22:05 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:


Attachments
Different versions of Calc show different results of a few trivial date/time calculations (20.19 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-09-04 13:13 UTC, Albrecht Müller
Details
Test file with added floating-point calculations to demonstrate why this is the case. (16.94 KB, application/vnd.oasis.opendocument.spreadsheet)
2019-09-04 13:54 UTC, Eike Rathke
Details
<spam> (deleted)
2020-06-02 17:53 UTC, THE_N4R4NT
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Albrecht Müller 2019-09-04 13:13:34 UTC
Created attachment 153864 [details]
Different versions of Calc show different results of a few trivial date/time calculations

Description:
To my knowledge no agreed upon specification exists which describes how LibreOffice shall handle time and date calculations (see bug 127170). Unexpected behaviour seem to get fixed on an "ad hoc" basis, see bug 125099 which should be "RESOLVED FIXED" from version 6.2.5 on. However, I observed this problem with version 6.2.6.2. 

The attached example file reflects a situation where you have some kind of log which contains date and time as separate entities. Time resolution is one minute. In order to calculate durations you have to combine (= add) these date and time values. The file contains 20 entries whose times differ by one minute giving 19 time spans of one minute. These durations are formatted as time values. An additional column uses the minutes function to calculate the minutes count compares these values with the expected value which is always one. A cell (D25) labeled "Errors:" sums up the differences between expected an calculated values. This value should be zero.

How to reproduce:
Open attachment with different versions of Calc and observe the error count.

Actual results:
I observed that version 6.0.4.2 (x64) of Calc showed an error count of zero while version 6.2.6.2 (ubuntu) gave 10.

Expected results:
Simple date/time calculations should show the same and correct results no matter which version is used to open the spreadsheet file.
Comment 1 Eike Rathke 2019-09-04 13:54:07 UTC
Created attachment 153865 [details]
Test file with added floating-point calculations to demonstrate why this is the case.

The MINUTE() function was changed to return the integer wall clock minute not rounded, as specified by ODF OpenFormula https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#MINUTE

Applying MINUTE() to a calculated duration is problematic if rounded values are expected. For this rather use ROUND(duration*24*60). In the attached document I added column E with the underlying duration floating-point duration values, column F with calculated minutes, and column G with the MINUTE() value calculated from the duration, which illustrates the unexpected but correct results. Column H uses ROUND() for the here expected calculation.
Comment 2 Albrecht Müller 2019-09-04 18:57:20 UTC
The help information that explains the minute function (https://help.libreoffice.org/6.3/en-US/text/scalc/01/func_minute.html?&DbPAR=CALC&System=WIN ) simply states  "MINUTE Calculates the minute for an internal time value. The minute is returned as a number between 0 and 59." and specifies "=MINUTE(8.999) returns 58" and "=MINUTE(8.9999) returns 59". Especially it does not refer to the document https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#MINUTE that provides a some more information about the minute function. It does not even mention things like wall clock time or durations. This gives you a lot of freedom to declare some behaviour as correct and you are free to change the behaviour from version to version. Therefore closing this report as "NOTABUG" is completly ok.

On the other hand: Doing so has some pretty nasty consequences from a users point of view, and that's why I filed bug 127170.

First: I did this kind of trivial time calculations with different spreadsheet programs (including LibreOffice Calc up to version 6.0.4.2) and they all were able to come up with correct minute values. As long as this worked as expected I did not care how this was achieved. I assume that they internally rounded float values to some internal time resolution before they interpreted them as time and/or date. If this resolution is choosen such that it is some orders of magnitude above the floating point round-off errors these errors will affect the result in extreme cases only. The programming language Java does date/time calculations based on integral numbers of milliseconds to avoid round-off problems altogether. Closing this bug as "NOTABUG" tells me that LibreOffices quality standards allow time calculation algorithms to deliver zero or one minute with equal probability when subtracting e.g. two minutes from three minutes.

Second: This problem affects all functions of this kind. A round-off error in the sub-milliseconds range may cause the year function to return the wrong year. The key problem is that usually integral numbers of time units (years, month, days, hours, minutes and seconds) are considered and therefore the calculations could be exact. However, LibreOffice uses a time representation that cannot guarantee an exact representation of the time units hours, minutes and seconds.

Third: I cannot trust the results of LibreOffices date/time calculations any more. Any legacy spreadsheets containing this kind of simple time calculations will deliver wrong results if opened with recent versions of Calc. What makes the matters worse is that the user documentation does not specify how Calc is intended to handle the problems rooted in its time representation and obviously this intention changes from time to time. Therefore  I cannot know how to use the date/time calculation functions in a way that future version of Calc will return the same results even if I assume that these functions are implemented correctly.
Comment 3 Eike Rathke 2019-09-05 10:19:33 UTC
(In reply to Albrecht Müller from comment #2)
> Closing this
> bug as "NOTABUG" tells me that LibreOffices quality standards allow time
> calculation algorithms to deliver zero or one minute with equal probability
> when subtracting e.g. two minutes from three minutes.

You're mixing different things. Using a date+time floating point representation where time is fractions of days with 0.5 == 12h is a de-facto standard in the spreadsheet world implemented by all major spreadsheet applications. This is how date+time input is stored as cell content and in fact you can format *any* floating point value with a date+time display format and get *some* result. Repeated addition and/or subtraction of floating point values like in the sample document *will* lead to accumulated round-off errors. You'll get similar results for column F of the sample document if you do the same calculations in other spreadsheet applications.

However, here the difference is in the MINUTE() function, which is a wall clock time function, i.e. returns values 0..59 and does not round anymore, same as a clock does not display 12:35 if time is 12:34:56

Whether before and after calculations with values formatted as times the underlying floating-point value could be rounded to, for example, milliseconds is an implementation detail that maybe could be treated differently, but again may also yield unexpected results if they then are rounded and should not.


> Second: This problem affects all functions of this kind. A round-off error
> in the sub-milliseconds range may cause the year function to return the
> wrong year. The key problem is that usually integral numbers of time units
> (years, month, days, hours, minutes and seconds) are considered and
> therefore the calculations could be exact. However, LibreOffice uses a time
> representation that cannot guarantee an exact representation of the time
> units hours, minutes and seconds.

In an ideal world we could use nanoseconds as date+time representation and have a distinct datetime type that's only converted when needed, however, that's not how spreadsheet applications work due to their legacy of historically grown data types and procedures.

> Third: I cannot trust the results of LibreOffices date/time calculations any
> more. Any legacy spreadsheets containing this kind of simple time
> calculations will deliver wrong results if opened with recent versions of
> Calc.

What did it help that in older versions a time value of 23:59:59.99999 was displayed as 24:00:00.00 and the MINUTE() function returned 0 and if it was displayed as date+time it was displayed as the next day?
Comment 4 Albrecht Müller 2019-09-05 12:31:18 UTC
(In reply to Eike Rathke from comment #3)
> 
> Whether before and after calculations with values formatted as times the
> underlying floating-point value could be rounded to, for example,
> milliseconds is an implementation detail that maybe could be treated
> differently, but again may also yield unexpected results if they then are
> rounded and should not.
> 
I did this kind of calculations for years and Calc version 6.2.6.2 was the first spreadsheet application where I noticed this problem. I always got exact results despite the fact that I used floating point calculations that are inherently inexact. There is an obvious way how this may work: The spreadsheet applications limit the time resolution to some fixed value. Just for reference I tried a twenty year old MS-Excel spreadsheet and discovered that this application limits formatting of time values to a millisecond precision. If the accumulated round-off errors of the floating point calculations remain below half of the maximal resolution then you can identify the exact result. Based on this exact result you can derive the exact values for the other units (seconds, minutes etc.) too. There is no need to distinguish between concepts such as duration or wall clock time. This distinction may be useful for formatting purposes, however. If such a distinction is made this needs to be documented in the help function.

The maximal resolution is _not_ an implementation detail but an essential property of the time calculation mechanism. I think it is necessary to choose this value carefully with respect to the underlying floating point arithmetic. A coarse time resolution is robust against round-off errors. If you choose a more fine grained resolution then the results will be more susceptible to round-off errors. I think that the behaviour of Version 6.2.6.2 is roughly equivalent to pushing this resolution to the precision of the underlying floating point arithmetic. As a consequence the calculations return values that are - from a users perspective - random values that are correct in about 50% of the cases and one off in the other 50%.

> 
> What did it help that in older versions a time value of 23:59:59.99999 was
> displayed as 24:00:00.00 and the MINUTE() function returned 0 and if it was
> displayed as date+time it was displayed as the next day?

This was never a problem as I always got correct and consistent results even when converting between different spreadsheet applications. I assume that these applications round a value corresponding to 23:59:59.99999 to the nearest legal value (probably exactly 1 in this case) before they do any date/time related calculations.
Comment 5 Albrecht Müller 2019-09-06 16:03:00 UTC
Eike Rathke confirmed the behaviour that I consider a bug. We differ in the assessment of the consequences, however. That's why I reopened the Bug.

(In the history I saw that at 2019-09-04 18:57:20 UTC I had changed the resolution from NOTABUG to FIXED - I was not aware that I did this and this change was not intended)

After consulting the "Bug Priority Triage Flowchart Suggestion(s)" flowchart (https://wiki.documentfoundation.org/images/0/06/Prioritizing_Bugs_Flowchart.jpg) 
I think its priority should be set to "Critical" for the following reasons:

- Loss of data:  Opening existing spreadsheets with the new version of Calc will not reproduce the previous results (which were correct) any more. So these results are lost.

- Broken core functions: I think doing correct calculations is a core function of a spreadsheet. There seems to be a de facto standard how spreadsheet programs deal with date/time calculations. This standard seems to work quite well in practice so usually you don't have to worry about its limitations. Previous versions of Calc adhered to this standard. This has been changed such that you now get results that are wrong in about 50% of the cases. (Think of a calculator that gives 0 or 1 with equal probability if you subtract two numbers that differ by one).

It seems to be intended behaviour so I assume that it will appear always and on clean installs. It affects all people that do floating point calculations on time values and try to extract date/time-specific information like year, month, day, hour, minute or second. Trivial calculations such as adding a date and a time value and subtracting this from another combination of date and time are sufficient to demonstrate the problem.
Comment 6 Albrecht Müller 2019-09-17 18:39:12 UTC
(In reply to Eike Rathke from comment #1)

> The MINUTE() function was changed to return the integer wall clock minute
> not rounded, as specified by ODF OpenFormula
> https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#MINUTE

Some further analysis made mit think that this specification is really broken. I doubt that the creators of this spec were aware of the consequences.

Some observations:
- https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#SECOND This specification states "This function presumes that leap seconds never exist." I did not find a reference where this is explicitly stated but I think that the specification also ignores time zones and daylight saving time. The combined effects of this omissions is that time and date calculations may be wrong by more than 25 hours by design. Given those limitations I think it makes sense that the de-facto standard you mentioned does not consider a time resolution better than one second. Thus this specification of the SECOND function specifies that the result is rounded to the nearest second.

- https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#DateTime Here DateTime is defined: "DateTime is a subtype of Number. It is a Date plus Time." So obviously adding date and time values is an intended and common use. The problem here is that adding and subtracting date and time values may lead to round-off errors. For example: On my machine the formula

= ((("2019-09-17" + "0:0:1") - "2019-09-17") - "0:0:1")/"0:0:1"

has the result 2,33761966278142E-07. Combining a date and a time into a DateTime value and separating the components again leads to a loss of significance which is in the order of magnitude of a fraction of a microsecond. Given the general limitations of the date and time arithmetic this error should not matter. 

-   https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#MINUTE The specification of the MINUTE function does not require any rounding to the nearest second. So consider some result that should be exactly the start of some day but which is a little bit off. So the time fraction may be something like "23:59:59.999" or "00:00:00,001". The seconds part will round to zero in both cases (actually in the first case it should round to 60 seconds but I assume that the modulo is taken. The specification is contradictory is this point: ROUND(59.999) gives 60 but the function is supposed to return values between 0 and 59 ). Rounding down the minutes part gives 59 or 0. The combined result corresponds to "...:59:00" or "...:00:00" respectively. The little error of two microseconds is amplified to an error of one minute (or 59 minutes depending on the point of view). This problem is not restricted to the MINUTES functions. See bug 127476 for an example where this effect may give you the wrong millennium.

A key problem is that time and date values that use integral values for year, month, day, hour, minute and seconds are located exactly at the boundaries of  time intervals. Tiny round-off errors cause the values to be located on either side of the boundary. The specified rounding method amplifies this effect so the users see a kind of random results. I cannot believe that this is the behaviour the authors of the specification had in mind.

I think there is a way to avoid the pseudo random behaviour and to get consistent results: All functions that calculate properties of date and time values such as YEAR, MONTH, DAY, HOUR, MINUTE or SECOND (there may be a couple of other such functions) should operate on a value that is rounded to the nearest second. For background information see  https://bugs.documentfoundation.org/show_bug.cgi?id=127477#c1 .
Comment 7 Xisco Faulí 2019-09-26 09:48:13 UTC
You can't confirm your own bugs. Moving it back to UNCONFIRMED until someone
else confirms it.
Comment 8 Mike Kaganski 2020-01-07 22:08:59 UTC
I suppose that some kind of rounding is required to accompany the change of MINUTE etc. to wall clock, and changes like in bug 125099. Since those changes are targeted at correctness, but operate on inherently imprecise doubles, the rounding would define the limit of the precision.

Excel, while also operating on doubles, and naturally accumulating errors at operations on times, hides that by using rounding. Interestingly, it uses two different rounding modes: one for cell formatting with fractions of second, and another for HOUR/MINUTE/SECOND and cell formatting without fractions of second. Rounding happens in both cases, but in first case it's to 1/1000 of a second, and in second case (pun not intended) to 1 second.

Given that in our current range of datetimes, the arithmetic precision of + and - is inside 1/100000 s, whole-seconds precision gives large margin - it would need at least tens of thousands +- operations for the error to surface. For the millisecond precision, the margin is also significant.

Possibly implementing the two different modes trying to mimic Excel is not necessary, but limiting ourselves to ms seems correct, or else seeming correctness of wall clock hits with issues like this one.
Comment 9 THE_N4R4NT 2020-06-02 17:53:02 UTC Comment hidden (spam)
Comment 10 Buovjaga 2020-06-02 18:19:25 UTC Comment hidden (obsolete)