Bug 136615 - Re-consider date/time parts calculation for functions and formatting
Summary: Re-consider date/time parts calculation for functions and formatting
Status: NEW
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Calc (show other bugs)
(earliest affected)
Inherited From OOo
Hardware: All All
: medium normal
Assignee: Not Assigned
URL: https://issues.oasis-open.org/browse/...
: 98443 (view as bug list)
Depends on:
Blocks: ODF-spec
  Show dependency treegraph
Reported: 2020-09-09 17:06 UTC by Mike Kaganski
Modified: 2021-12-15 23:39 UTC (History)
7 users (show)

See Also:
Crash report or crash signature:
Regression By:

A sample of wrong round-tripping of date/time value (10.02 KB, application/vnd.oasis.opendocument.spreadsheet)
2020-09-09 17:06 UTC, Mike Kaganski

Note You need to log in before you can comment on or make changes to this bug.
Description Mike Kaganski 2020-09-09 17:06:08 UTC
Created attachment 165328 [details]
A sample of wrong round-tripping of date/time value

As discussed in Bug 127334 comment 8, bug 127170 (attachment 165304 [details]; bug 127170 comment 26) etc., there are some serious problems around date and time parts as calculated using spreadsheet functions YEAR/MONTH/DAY/HOUR/MINUTE/SECOND, and displayed using date/time format strings like "YYYY-MM-DD HH:MM:SS" (and any variations, including decimal parts of second); both in old versions of Calc, and in current versions, that had numerous tweaks related to how dates/times and durations are displayed.

Current implementation of YEAR/MONTH/DAY/HOUR/MINUTE/SECOND spreadsheet functions is consistent with ODF [1], but the standard itself is obviously wrong in its wording regarding these functions. E.g., MINUTE is defined there to be calculated discarding minute fraction from (hour fraction multiplied by 60), while SECOND is defined as rounding (minute fraction multiplied by 60) to nearest whole number, but at the same time, being in [0..59]. This necessarily produces wrong set of YEAR/MONTH/DAY/HOUR/MINUTE/SECOND values for any datetime with seconds between 59.5 and 60.0: result of MINUTE would be as if the next minute is not yet started, while result of SECOND would be as if it has already started. See attachment for an example of the problem.

At the same time, there is much confusion and disagreement around how date/times are displayed, where users don't realize that underlying data format for datetime is double, or where users disagree to account for the tiny floating-point calculation errors. Given the current distinction between display of points in time vs duration formats, the inherent imprecision of floating point operations may result in showing e.g. 9 minutes where users expect 10 (just because the closest double happened to be slightly less than exact value). It also looks natural to expect that a date string (using a date/time format string) represents the same date parts as returned from corresponding spreadsheet functions on the same value.

To create a consistent and robust set of rules for calculation of date/time parts from a serial date (or a duration), the proposal is the following:

1. In OpenFormula specification, change definition of SECOND to mention explicitly that the result is taken modulo 60, like this: Second=MOD(ROUND(T*86400);60).

2. In OpenFormula specification, change definition of YEAR/MONTH/DAY/HOUR/MINUTE (and others that might be affected) to account for the seconds rounding. Example for MINUTE: Minute=(MOD(ROUND(T*86400);3600)-SECOND(T))/60

3. Make date/time format strings follow the same rules, *unless* the format string includes decimal seconds part.

4. In case a date/time format string includes decimal seconds part, it should follow similar rules, with the exception that rounding should be performed not to seconds, but to 1/1000th of a second. This should give enough precision to have robustness against displaying rounding errors after tens/hundreds of arithmetical operations on times/durations in the representable date/time range. For the excess decimal second digits, zeroes must be output. The idea is that normally rounding to a second is enough for widest range of cases; only when explicitly requested, the sub-second precision should be used, still with precautions to provide robust results, not easily affected by inherent imprecision of floating-point operations and representation.

5. Duration formats should follow same or similar set of rules; whether they should round least significant displayed part other than second (as opposed to truncation) is not covered by the proposal, only that the values should be calculated based on original double converted to whole numbers of seconds or milliseconds (using rounding), as required by the format string and rules above.

This proposal is aimed to be compatible with rules used for the same operations in MS Excel, be predictable and sane, address most expectations, and allow creating clear documentation about implementation.

[1] https://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#SECOND
Comment 1 Albrecht Müller 2020-09-09 19:42:43 UTC
I think it is pretty clear that there is a need to correct the specification and that the defects in the specification block quite a few bugs.

Just for information: After pressing Shift+Control+F9 the round-tripping example shows correct values with LibreOffice Version Did not try with a current version yet.
Comment 2 Regina Henschel 2020-09-14 19:33:19 UTC
@Mike: Please write a proposal to the ODF TC:
Comment 3 Mike Kaganski 2020-09-16 12:10:23 UTC
(In reply to Regina Henschel from comment #2)

Comment 4 Regina Henschel 2021-12-09 17:10:16 UTC
The ODF TC has improved the specification for ODF 1.4, see OFFICE-4094.
Comment 5 Regina Henschel 2021-12-09 17:11:50 UTC
*** Bug 98443 has been marked as a duplicate of this bug. ***
Comment 6 Albrecht Müller 2021-12-10 21:46:39 UTC
Some remarks to OFFICE-4094

a) Backwards compatibility. OFFICE-4094 introduces substantial changes to a couple of date and time functions. The previous specification has been around for about 10 years. According to the German Wikipedia LibreOffice has about 200 million users. Therefore I think there are many spreadsheet documents that were built according to the previous specification. They may deliver quite different results if the changes in the specification will be implemented. Therefore it should be specified how these functions should work on legacy spreadsheets.

b) Consistency of various LibreOffice date and time systems. Did someone check if the specifications for date and time calculations used in LibreOffice are consistent? I think the problem addressed in OFFICE-4094 is a special case of a more general problem: LibreOffice uses different representations of date and time. Some of them are able to represent common values exactly, such as 1/24 of a day corresponding to 1 hour. A date and time value formatted as a string is able to represent such a value exactly. On the other hand, the floating point number representation of date and time cannot represent 1/24 exactly and therefore has to use a value that best approximates 1/24. Problems arise if it is necessary to convert between different representations. This may affect:

Formatting of date and time values. I think the current version of LibreOffice uses different rounding strategies for e.g. [HH]:MM:SS and HH:MM:SS which may or may not correspond to the rounding strategy used for SECOND, MINUTE etc. functions. This is due to a distinction made between durations and wall clock time.


StarBasic-Functions related to date and time: Format, CDate, DateSerial, DateValue, Day, Month, WeekDay, Year, Hour, Minute, Second, TimeSerial, TimeValue, CDateToIso, CDateFromIso, CDateToUnoDate, CDateFromUnoDate, CDateFromUnoTime, CDateToUnoTime, CDateToUnoDateTime, CDateFromUnoDateTime, DateAdd, DateDiff, DatePart, Date, Now, Time, Timer

UNO-Interface: Interface com.sun.star.i18n.XCalendar (especially the functions setDateTime, getDateTime, setValue, getValue ) and functions that deal with struct com.sun.star.util.DateTime, struct com.sun.star.util.Date, struct com.sun.star.util.Time, struct com.sun.star.util.DateTimeRange , struct com.sun.star.util.DateTimeWithTimezone , struct com.sun.star.util.TimeWithTimezone and struct com.sun.star.util.Duration

Maybe there are other dependencies which I did not notice yet.

c) Compliance with other standards and interoperability:
The Standard ISO 8601 (see https://en.wikipedia.org/wiki/ISO_8601) allows fractional parts on any lowest order time element, not just on seconds. This standard also has a duration format that you cannot represent by floating point numbers alone, e.g. a duration of one month. Where does LibreOffice follow or deviate from the standard? Does the specification of LibreOffice date and time arithmetic make sure that they are compatible with other Programs, e.g. Excel? How should LibreOffice's date and time system interact with date and time representations used in other programming languages such as Python, Java and others?
Comment 7 Albrecht Müller 2021-12-12 22:31:30 UTC
In short: The proposed changes to spreadsheet functions handle the "YEAR, MONTH, DAY, HOUR, MINUTE, SECOND" but not the "and other related" part.
Comment 8 Albrecht Müller 2021-12-15 13:17:39 UTC
I think the problem is definitely not fixed yet as - without stating this explicitly - the specification now uses two different conventions that contradict each other.

There is a truncating convention where the transition from one day to the next occurs exactly at midnight. This convention is used for example in 4.11.7 Basis (https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/OpenDocument-v1.3-os-part4-formula.html#__RefHeading__1017922_715980110)

The other convention is the rounding convention where this transition occurs half a second earlier. The proposed definitions of the YEAR, MONTH, DAY, HOUR, MINUTE and SECOND functions use that convention.

So there is half a second each day where the conventions disagree to which day this time belongs. The bad thing is that the serial numbers that specify dates represent points in time that are exactly at boundaries of these disagreement regions.

There may be use cases where the serial numbers are the results of some calculations that include round off errors, e.g. because they use time values that don't have exact representations. Due to these round off errors up to 50% of the results may fall into the region where the conventions disagree.

Users may not be aware that they mix functions that use different conventions. Therefore they may see results that seem to be clearly wrong at random in up to 50% of the cases. Actually they observe a behaviour that follows from the specification. There is nothing an implementer of the functions can do about. The user has to use clumsy workarounds to get the expected results.

I tried to find out how OOXML (https://www.ecma-international.org/publications-and-standards/standards/ecma-376/) handles this rounding problem. As I was not not able to find relevant rules I think the OOXML specification left this point unspecified. This way it does not prescribe a behaviour that users may consider buggy.
Comment 9 Mike Kaganski 2021-12-15 13:35:19 UTC
(In reply to Albrecht Müller from comment #8)
> There is a truncating convention where the transition from one day to the
> next occurs exactly at midnight. This convention is used for example in
> 4.11.7 Basis
> (https://docs.oasis-open.org/office/OpenDocument/v1.3/os/part4-formula/
> OpenDocument-v1.3-os-part4-formula.html#__RefHeading__1017922_715980110)
> The other convention is the rounding convention where this transition occurs
> half a second earlier. The proposed definitions of the YEAR, MONTH, DAY,
> HOUR, MINUTE and SECOND functions use that convention.

You put it not as the standard puts it. Specifically, there is a possible ambiguity, but not a clear conflict.

Namely, the "truncate" at 4.11.7 is defined as:

> truncate(date) truncates any fractional (hours, minutes, seconds) of a date
> value and returns the whole date portion.

It is not defined in terms of floating-point number, but in terms of fractional part of a date, so the question remains, how the fractions of a date are calculated here, to be discarded.

Anyway, everything in comment 8 is unrelated to a specific problem that this issue is dedicated: inconsistency in the *related* set of functions. The interrelations with other functions, if documented, may be arbitrarily complex and still be acceptable. The change in standard is the step toward consistency in one specific part, and improvements must be done step by step. Trying to create noise in each and every issue which mentions "Calc" and "time", just because a fix to that issue does not make the world the ideal place, is not constructive.

But yes, this problem is definitely not fixed yet - as the change was only at standard level, not in LibreOffice code.
Comment 10 Albrecht Müller 2021-12-15 23:02:32 UTC
(In reply to Mike Kaganski from comment #9)
> … so the question remains, how the fractions of a
> date are calculated here, to be discarded.

You just discovered another bug in the specification: The sentence you refer to should have a clear meaning. By the way, take it together with 

> 4.3.3 Date
> Date is a subtype of Number.
> Date is represented by an integer value.

This means a date value by definition has no hours, minutes or seconds to discard. So what is the intended meaning of "truncate(date)" anyway?

What are *related* functions?

Assume T = 1999-12-31 23:59:59,600. LibreOffice Version returns 1999 for YEAR(T) and 0 (i.e. no) for ISLEAPYEAR(T). What will happen after you implemented the proposed changes? YEAR(T) = 2000 and ISLEAPYEAR(T) = no?

Or Assume T1 = 2021-12-15 23:59:59,600 and T2 = 2021-12-16 00:00:00,000. 
LibreOffice delivers the following results:

DAY(T1) = 15, DAY(T2) = 16, NETWORKDAYS(T1;T2) = 2 and NETWORKDAYS(T2;T2) = 1.

After the changes I would expect DAY(T1) = DAY(T2) = 16, but what NETWORKDAYS should return? The same results as before despite the fact that - according to the YEAR, MONTH, DAY … functions  - the arguments appear to be identical?

The headline of OFFICE-4094 contains the clause "… and other related" but I did not see a list that names these functions. That's why I looked for things that I think are related to the intended changes to get some starting point. Unfortunately the resulting list became pretty long.

I also thought that having names for different semantics of date and time values makes it easier to talk about the differences and their consequences, e.g. when you have to trade consistency in one part for inconsistencies somewhere else. One example is interoperability: I tried NETWORKDAYS on Exel97 and got the impression, that NETWORKDAYS uses truncating semantics, while the YEAR, MONTH … functions use rounding semantics. As I don't have a current version available I cannot check if Microsoft changed this behaviour later. If they did not, you have to choose if you want to have internal consistency or compatibility with Excel.

> Trying to create noise … is not constructive.
OK, with these remarks I conclude the noise.
Comment 11 Regina Henschel 2021-12-15 23:39:54 UTC
The argument for functions DATE, MONTH and YEAR (and some others) has type 'DateParam', see line 'Syntax:'. The type 'DateParam' is specified in section 4.11.3 in part 4. The type 'DateParam' is different from type 'Date'.