I am affected by bug 125099 when I used Libre Office Calc version 6.2.5 on Ubuntu. According to comment #12 this problem should be fixed already. Comment #8 in Bug 125099 by Eike Rathke states that the correct way to avoid the problem is to use a duration format instead of a wall clock time format. I was not aware that there is such kind of distinction as I did not find a corresponding hint in the German documentation of the time format. The spread sheets affected by this bug still display wrong numbers using Version 220.127.116.11 Ubuntu Build Id 1:6.2.6-0ubuntu0.19.04.1, they show correct numbers with Calc version 18.104.22.168 (x64) Build-ID: 9b0d9b32d5dcda91d2f1a96dc04c645c450872bf (Windows).
Steps to Reproduce:
See help information: https://help.libreoffice.org/6.3/de/text/shared/01/05020301.html?DbPAR=SHARED#hd_id3155870
The explanation states that the format “HH” stands for hours between 0 … 23, and “[HH]” allows hours greater than 23 hours. I did not notice any hint that mentions durations, wall clock time or implications on the rounding behaviour.
Expected result: The format description should mention that there are different rounding strategies: A duration of 0:59 should display as 1 hour if rounded to a precision of hours. Interpreted as wall clock time this should round to 0 hours. There should also a reference to a more thorough specification that describes how Calc handles the problem of having no exact representation of most time values which is inherent in its time representation.
The documentation should also contain a clear specification of the meaning of a format string which does not change between versions as changing this specification breaks existing ods-Files. I observed some quite strange features of the time formatting mechanism, e.g. the time 23:57:00 gets formatted to “1437” using the format string “[MM]” but to “Minutes:  Month: 12” ( Version 22.214.171.124 (x64) on Windows) or to “Minutes:  Month: 12” (Version 126.96.36.199 Ubuntu) by the format string “"Minutes: "[MM]" Month: "M”. What is the correct interpretation of the format string parts “[MM]” or “MM”? How does it depend on the context?
The lack of a clear specification causes serious problems when upgrading to a new version. An existing speadsheet may show month values where you used to see minutes. The result the you see in an old spreadsheet may be a complete day off despite the fact that the actual calculation differs by a tiny fraction of a millisecond only.
User Profile Reset: No
Created attachment 153670 [details]
The file contains a few examples of time calculation and formatting problems
The documentation problem affects also the time functions (see following links) as their documenation contains no hints at the pecularities of date and time calculations (e.g. rounding behaviour, handling of inexact representation of time values, handling of leap seconds https://en.wikipedia.org/wiki/Leap_second etc.) either.
The problem seems to be inherited from the English version:
Without a clear specification there is no "expected behaviour" of date/time calculations and results may vary considerably between different versions of Calc.
(In reply to Albrecht Müller from comment #0)
> I observed some quite strange
> features of the time formatting mechanism, e.g. the time 23:57:00 gets
> formatted to “1437” using the format string “[MM]” but to “Minutes: 
> Month: 12” ( Version 188.8.131.52 (x64) on Windows) or to “Minutes:  Month:
> 12” (Version 184.108.40.206 Ubuntu) by the format string “"Minutes: "[MM]" Month:
> "M”. What is the correct interpretation of the format string parts “[MM]” or
> “MM”? How does it depend on the context?
This is a bit nasty as using both the [MM] and the M codes in one format seems to trick out the type detection, i.e. after defining it the format at the end is classified as Date format (see category where it's then assigned to). Now datetime formats do not know anything about duration hence the first MM is taken as a code for minute of time and the second M as a code for month, where the wall clock minute value is 57. Note that also the brackets around [MM] are displayed literally because in a datetime format they don't have a special meaning. Then as a consequence when saving the document the format is saved as
<number:text>] Month: </number:text>
which when loaded again results in the format code string
"Minutes: ["MM"] Month: "M
but then both the MM and the M denote a month keyword because there isn't an unquoted [MM]. This appears to be a bug (because in the file format there's the number:minutes element) but I think it's not easily solvable due to the ambiguity of MM.
I doubt there's a difference between Linux and Windows, but you simply tried to load the saved document on the other platform, where it then behaves different as explained.
(In reply to Eike Rathke from comment #3)
> This is a bit nasty as using both the [MM] and the M codes in one format
> seems to trick out the type detection, i.e. after defining it the format at
I don't really care if this behaviour is a bug or not as this is a bug report against the documentation and not against a particular behaviour of Calc. The key problem here is that the help function does not specify how to deal with ambiguous format strings: Should Calc consider these strings as illegal and treat them as an error? Or should it use some default interpretation? How to deal with a situation where you want minutes values at position where Calc would interpret it as month value? What are the exact rules to resolve the ambiguities?
I stumbled over this problem when I found your comment https://bugs.documentfoundation.org/show_bug.cgi?id=125099#c8 and tried to find out the difference between "[HH]" and "HH" in the context of duration and wall clock time. The help information https://help.libreoffice.org/6.3/en-US/text/shared/01/05020301.html?DbPAR=SHARED#hd_id3155870 states that "HH" are between 00 and 23 while "[HH]" may deliver values above 23. It says nothing about durations or wall clock time, and nothing about possibly different rounding behaviour. I think this is a defect in the documentation - users cannot know the kind of difference you mention in your comment.
Date/time calculations are tricky. A special problem is that these calculations normally use integral quantities of time units and therefore exact calculations are possible. Calc represents date/time values as float numbers. One day corresponds to the value of one. Hours, minutes and seconds correspond to values less than one. In general, points in time that correspond to a combination of integral numbers of hours, minutes and seconds have no exact representation as floating point values. Therefore date/time calculations usually contain round-off errors. Nevertheless all spreadsheet programs I used so far - including LibreOffice up to version 220.127.116.11 - delivered exact values when I did some simple date/time calculations. There is a defect in the help function as it does not specify how Calc is expected to handle date/time calculations, i.e. the behaviour of date/time calculations is undefined. A nasty consequence of this fact is a recent change the date/time calculation algorithm. This algorithm used to deliver a difference of one minute if you subtracted two timestamps that were one minute apart. The new algorithm returns essentially a random value which is 0 or 1 minute with about 50% probability each (see bug 127334) which breaks all legacy spreadsheets that contain this kind of calculations. As the user documentation does not specify a correct behaviour the behaviour of the new algorithm can be classified as "NOTABUG".
*** Bug 132083 has been marked as a duplicate of this bug. ***
The discussion of bug 132083 ended declaring it as a duplicate of this one here and referring to the CLOCK and DURATION time formats. So I try to contribute to the clarification of this issue. Unfortunately I do not really understand what CLOCK and DURATION time formats means. So I first try to explain what I understood:
A CLOCK time format is just an alias for a format that uses rounding based on the well known "Round down" principle, while a DURATION time format is just another word for a rounding based on "Rounding to the nearest". So if you round 3 hours 45 minutes to an hours precision you will get 3 hours when using a CLOCK time format and 4 hours when using a DURATION time format. There is an easy way to get "Rounding to the nearest" when you have a "Round down" mechanism: Add one half of the unit you want to round to and round the result down. This gives rounding to the nearest. Example: 3 hours 45 minutes + 30 minutes are 4 hours 15 minutes. Round this down and you get 4 hours which is the same result I get when I use a DURATION time kind of rounding on the original value.
There seems to be a well established convention that the representation of numbers in decimal format uses the "rounding to the nearest" principle, while the presentation of time values is expected to use a "round down" mechanism. I assume the reasons for this difference are like this:
Decimal numbers are often used in a context where you expect some errors, be it measurement errors, round off errors or maybe other things. That’s why you need some rounding that is robust against these errors. "Rounding to the nearest" provides this feature: you may have a measured a value of 1.9 or 2.1. This does not matter, both are roughly two, and that’s also the result of the rounding.
This is different when it comes to time values. Here you usually have some kind of ticks of a clock and you count these ticks. So you deal with integral values and there is no room for imprecise math like round off errors. We may start counting seconds with zero, the 60th tick starts the first minute and resets the seconds to zero again. If you are interested in the minutes value only you may just discard the seconds. This results in a kind of "round down" behaviour.
Both rounding methods have points where the result of the rounding depends on very small changes in the value to round. Rounding to the nearest is sensible to changes if the value to round is near to the middle of two adjacent possible rounding results, e.g. 2.499999 gets rounded to 2 while 2.499999 + 0.000002 = 2.500001 rounds to 3. The corresponding points of the round down methods are the possible rounding results themselves: 0.999999 gets rounded to 0, while 0.999999 + 0.000002 = 1.000001 gets rounded to 1.
This property of the round down mechanism is what makes the wall clock time kind of rounding so problematic. If used together with LibreOffice’s inherently imprecise time representation and without some mechanism that mimics some kind of countable clock ticks (see Mike Kaganski’s proposal in Bug 127334 comment 8) you will always see a lot of results that are plain wrong despite the fact that the underlying values are pretty close to those that are exact and correct. Bug 127334, bug 131151 and bug 132083 demonstrate this.
Based on this understanding of the problem I think that it is necessary to decide about the way Mike Kaganski’s proposal should be used and to document the results of this decision in the help information. This requires clarifying details such as: Should LibreOffice use a single basic time unit or more than one like Excel, what basic unit to use (1 second, 1 millisecond, 1 microsecond, something else, e.g. a parameter of the spreadheet)? What are the tradeoffs of those different basic units? How to achieve compatibility/interoperability with older spreadsheets and with Excel? This would be easier if someone is able to find some specification that describes how this old de facto industry standard is supposed to work.
I don’t think that it is a good idea associating the [HH] and HH constructs with different formatting strategies:
1) The CLOCK and DURATION formatting methods have not been documented yet. So probably a lot of existing spreadsheets rely on the assumption that the same rounding is applied no matter if you use [HH] or HH. A change would break these spreadsheets.
2) As there is an easy way to get DURATION kind of rounding if you have CLOCK type rounding there is no need to implement a DURATION kind of rounding. Maybe a hint in the help information is enough.
3) The [HH] vs. HH construct is already used to control if the result should show the total hours or the remainder of a division by 24. I think there is no natural connection between this feature and the kind of rounding: All possible four combinations of these features make sense. You may want to have wall clock time values formatted using a "round to the nearest" format - semantics being something like "about 3 o’clock". On the other hand age is a duration but this is usually given in a "round down" way. Therefore I fear that bundling the two concepts together will produce a lot of bug reports as users do not expect this connection. So please keep separate things separate.
Albrecht Müller: de facto my reporting bug 132083 occured when I opened an old, worked properly (Apache) OpenOffice v4.01 spreadsheet in LibreOffice.
(In reply to Albrecht Müller from comment #6)
> 2) As there is an easy way to get DURATION kind of rounding if you have
> CLOCK type rounding there is no need to implement a DURATION kind of
> rounding. Maybe a hint in the help information is enough.
This is totally wrong. That "easy way" meant for *display purposes* breaks the actual data. Modifying the data just to make display correct is plain wrong. Just forget about this, and accept the correct difference in the definition of wall clock and time duration. They are inherently different.
The decision on minimal precision is useful though IMO.
@jarko: I have several spreadsheets I use with different versions of LibreOffice. If I open them with an old version, I see correct values. Using a current version, I see wrong values. I use both versions to change the spreadsheets. It does not matter from which version the spreadsheet comes from: A current version displays the wrong values and I see the right results when I open it with an old version.
This is consistent with my assumption that the rounding mechanism has been changed such that it may display unexpected values even cases where the value to display is the best possible approximation of the exact value. If this value happens to be less than the exact value the round down mechanism will round it to the next rounding point below. A user will experience a kind of pseudo random behaviour showing about 50% correct and 50% wrong values. I don’t accept the wall clock time argument here as the time difference in question is so small that I never ever will notice this difference on any wall clock.
@Mike Kaganski: I fear explaining why I disagree with your short statement "This is totally wrong" requires some lengthy explanation and that we are discussing how to reinvent the wheel. There is a de facto industry standard that is decades old and works in practice. It works despite the fact that it uses floating point numbers which in general cannot represent time values exactly. This standard has a lot of deficiencies but I think there is no room for substantial improvement. Any change will break a lot of spreadsheets that rely on the very features of this standard, e.g. the way rounding is done. So we can only document how we understand the features of the standard but should not do any substantial change.
Assume there is some user who wrote a spreadsheet some years ago. The results should be in a format what you now call DURATION format but the standard only offered what you now call CLOCK format. Knowing the rounding trick the user will add 0.5 of the relevant unit before a value is displayed and will get the expected result.
Would you do this user a favour if you change the rounding behaviour to the DURATION rounding this person actually wants? No: After the change the spreadsheet feeds the wrong values into the rounding mechanism and the user will see wrong values and will probably spend a lot of time to analyse and to fix the problem. Or worse: will not notice the problem and run into trouble due to the use of this wrong information. How to handle a situation where some users want to exchange spreadsheets but use different versions of LibreOffice? Should they write macros that check the version of LibreOffice and change the contents of the cells according to the rounding that the current version offers?
Having to manipulate numbers to prepare them for display is a common problem that is not limited to the formatting of time values: The available formatting mechanisms may not provide the format you need. Therefore you manipulate the values before you display them. You may multiply a number by 100 if you don’t have a percent format. In other situations you may use functions like CEILING, FLOOR and ROUND and a couple of others. For time calculations you might think of using functions such as SECOND, MINUTE and the like. But I think their semantics is so ill defined/documented that I don’t use them.
There may be a conceptual difference between duration and wall clock time. But I think it is both a strength and a weakness of this industry standard that it does not distinguish between these two concepts: Anything is a float and it is up to you if you interpret this as a duration or as a point in time.
In my previous post I forgot to mention a few assumptions that I think should be documented in the help information, provided I am correct: The time representation is based on the assumptions that one day is 24 hours, one hour is 60 minutes and one minute is 60 seconds. If I am not - the correct information should be documented.
Due to these assumptions there is no need to distinguish between wall clock time and duration. Unfortunately these basic assumptions are wrong - that’s why they should be documented. There are days that are 25 hours, 23 hours or 24 hours + 1 second long, and there are minutes with a length of 61 seconds. This happens on days when there is a switch between daylight saving time and standard time or on days containing leap seconds. Now the distinction between points in time and durations becomes relevant: It is not more the same if you add one day or 24 hours.
Just a few hints how this kind of problems is solved elsewhere: I think Java has what I would consider three generations of date and time systems. They started with a java.util.Date class and some related classes such as java.sql.Date. I think the basic concept is roughly the same as the date and time system used in LibreOffice with the exception that they base their representation on integral instead of floating point values. This way they avoid the rounding issues inherent to a floating point representation.
They quickly discovered that this concept cannot support many time and date features. So they developed a second generation of classes such as java.util.Calendar and java.util.GregorianCalendar. These classes allow date and time calculations that support time zones and daylight saving times. The definitions of times zones and daylight saving times may change due to political events. Therefore these classes need some kind of database and there is a need to keep it current.
The package java.time contains what I consider a third generation of date and time classes. Here they have classes such as java.time.Duration or java.time.Instant which implement the distinction between durations and points in time.
You may locate and study the javadocs of these and related classes and packages to get an impression what is ahead of you if you really intend to implement correct date and time calculations at a high level of precision. Note: They seem not to change the semantics of existing classes as this would break existing programs. They implemented new classes and declared old features deprecated.
I always wonder, why people complicate simple things, e.g. using floating point arithmetic for values, which are natural by default. This is simply "breaking" good data on your own. But even so, why on earth multiply beings, instead follow Ockham's razor?
In the considered topic in my opinion there is no need to maintain two time formats, one for duration, and the other for time spot - which easy leads to confuse them and make a risk of "improper use". Not saying about compatibility across spreadsheets. Just to simply assume that when there is simple, single date value in cell we compute it traditionally, i.e. like (let it be) time fraction, but when some kind of performing action takes place (subtracting, adding, multiplying, etc.) - rounding the result in the [HH]:MM flavor. All this with single format "HH:MM:SS".
@Jaromir: As the de facto standard is quite old the floating point approach may have some historical roots. Maybe there were a calendar first which used days as basic unit. Later there was a need to add time but it was not possible to change the basic unit. This might have broken too much of existing software. So time was added as a fraction of days.
I recently discovered another bunch of bug reports related to date and time arithmetic and found Winfried Donkers’ attachment #147433 [details] (datetime parts comparison) in Bug 118800 comment 14 (discussed also in Bug 121978 comment 10 and the following comment) which compares how different programs handle date and time functions as well as formatting. From this attachment I learned an additional variant of how LibreOffice might interpret date and time handling. But I also saw that Excel 2016 seems to handle date and time values in the same way as Excel 97 did about 20 years ago (see Bug 127476 comment 6). So why there was no need to change Microsofts date and time handling? Why LibreOffice needs so many changes that cause a lot of unnecessary work and irritation both at the user and developer side?
My opinion is that Microsoft pretty early had a clear concept of how an intuitive date and time arithmetic should work. In contrast, LibreOffice development seems to work on a one-bug-at-a-time approach that is based on an inconsistent specification that implies counter-intuitive behaviour. This approach is doomed: You may have a clever idea how to fix one bug but this creates another problem at some different location.
So I see a need to have a clear description that explicitly states the key features of how LibreOffice’s date and time arithmetic should work. Therefore I try to reverse engineer the key features of the de facto standard:
1) A value of 1 corresponds to one day.
2) One day is 24 hours of equal length
3) One hour is 60 minutes of equal length
4) One minute is 60 seconds of equal length
5) There are no exceptions to rules 1) … 4) as long as you stay in the defined calendar range, which is something from ??? to 31.12.9999 (?) (this should be specified)
6) Points in time and durations are represented by floating point values. (of sufficient precision; it is necessary to specify the details to allow the user to assess the size of the errors that will occur, I assume are double values are used).
7) Date and time calculations are done using the features of the underlying floating point arithmetic only. This floating point representation cannot in general represent the date and time values exactly. Due to the properties of floating point arithmetic and the inexact representation the user should expect errors both in representation and in the calculations. (Note: This is a common problem inherent to any floating point calculation.) The size of the errors should be expected to vary by several orders of magnitude depending on the kind of calculations that is done.
8)The standard uses an error recovery mechanism that tries to recover the exact values from their inexact representation. This mechanism is used in all date and time functions as well as for formatting. It cannot guarantee to recover correct values but it has to return consistent values. Example: The recovery may consider a value of 59.5 seconds as a representation of 59 or of 60 seconds. The SECOND and MINUTE functions should return either 0 minutes and 59 second or 1 minute 0 seconds, but not 0 minutes and 0 seconds. See bug 127476 for an example where this kind of problems affects which millennium you get.
9) There is a reference point in time that is represented by the floating point value zero. (I think there are several possible reference points that are already specified in the help information)
10) The standard does not support a conceptual distinction between points in time and durations. Points in time are defined by the time difference between this point and the reference point, i.e. by a duration. As long as the resulting values stay within the supported range durations always can be represented by points in time and vice versa.
11) Formatting of time values uses a "round down" strategy, for example 10 minutes and 55 seconds rounded to minutes are 10 minutes. Open questions: What kind of rounding is used in the fractional part of seconds, such as 3:10:55.954? Does the error recovery mechanism use different time units that depend on the number of digits in the fractional part?
12) Realisation of the error recovery mechanism: In order to reconstruct an exact date and time value from an inexact representation some basic unit of time is used. This is an integer fraction of a second, e.g. a second or a millisecond (the exact value has to be defined and documented, maybe the existing industry standard leaves no choice what value to use). Using a "rounding to the nearest" strategy the time value is mapped to some multiple of this basic unit. All further calculations are based on the resulting integer value. Based on the recovered exact value all date and time function can be implemented in a way such that they return consistent values.
I think the "Recalculated Formula (OpenFormula) Format" specification of the time and date mechanism is broken as it does not consider any kind of error recovery as described in #8 and #12. Error recovery is necessary to avoid issues caused by the combination of the inherent inexact floating point representation and "round down" strategies required for formatting. These issues are documented in a couple of bug reports. This specification also disregards that the users expect to see consistent results, such as 1+1=2.
As a consequence of #10 and #11 no DURATION kind of rounding is possible. The distinction between CLOCK and DURATION rounding seems not to solve problems of incorrect math anyway. Bug 131151 seems to provide an example that produces wrong results despite a correct use of CLOCK formatting.
Maybe I understand what motivates the introduction of the rounding to the middle alias DURATION rounding. Points #7, #8 and #12 imply that you may have to sacrifice several orders of magnitude of the precision the underlying floating point representation provides. Using a "rounding to the next" strategy should avoid the need of an error recovery mechanism and thus allow to use the full precision of the underlying representation.
So there is a point to have this kind of rounding. Unfortunately it is incompatible with the requirements of the established industry standard despite the fact that in many cases it will make no or little difference which kind of rounding you use. Therefore offering a DURATION kind of rounding may be useful, but only if it is offered as an additional feature but not, if it is tied to the use of [HH] or HH. A description is necessary that informs clearly about the differences between the different rounding schemes. Thus some mechanism should be offered that allows to select the DURATION kind of rounding no matter if the [HH] or HH format is used. This has the further advantage that the formatting mechanism allows more variations, e.g. combining HH formatting with DURATION type rounding. In addition all date and time functions such as SECOND, MINUTE etc. need additional parameters that provide them with the information that is necessary to return values that are compatible with the formatting. If you consider the common decimal notation for floating point number you will find a similar situation: The FLOOR, CEILING and ROUND functions all have additional parameters.
Maybe it is possible to combine error recovery and long fractional parts. Idea: The calculation of the fractional part could be used to calculate the exact second as recovered value. Thus this value is consistent with the rest of the fractional part. The calculation of the other time units will not introduce inconsistencies as they are based on an exact recovered value.
In general I am somewhat sceptical about trying to enhance this old standard. I think it is a pretty simple approximation of date and time calculations which is quite useful if you can live with its limitations. If you try to add features you may run quickly into the monsters Mike Kaganski mentioned in bug 132083 comment 13 and into a lot of weird situations. Increasing time resolution may suffice. An example: Assume a current version of LibreOffice that has a slightly more sophisticated date and time arithmetic. You can ask it to calculate the number of days or hours between the first day of January 2020 0:00 and the first day of January 2025 0:00. This arithmetic returns a correct number of days but an error value for the number of hours. Why?
@Albrecht - very, very complicated. I am convinced there is big time to change internal representation from floats to integers.
From now we already solve two problems: accuracy and "improper" formats. We gain 100% precision and consistent presentation of date/time values in any case.
What remains to be done? Compatibility with older spreadsheets. I think this could be solved by preparing wrappers - functions providing "bridge" between older and a new system. The only problem would be if someone hardcoded data using constants etc.
A key point of the error recovery mechanism is to do most of the internal calculations on integer values. It may be surprising simple:
It uses two parameters:
rv: This is the raw floating point value that is feed into some date and time function or into the formatter.
n: This is the number of digits in the fractional part of the seconds representation. This value is zero if used for date and time functions such as SECOND, MINUTE etc.
At first we calculate another floating point value that prepares the rest of the calculations. It is essentially the value rv expressed as multiples of seconds instead of days and rounded to the appropriate number of decimal digits. Assuming we have a function round(v,m) that rounds v to m decimal digits after the decimal separator the prepared value pv is:
pv = round (rv * 24 * 60 * 60, n)
We use this value to get the recovered number of seconds rs. This is simply the greatest integer that is less than or equal to pv. Thus assuming there is a function floor that achieves this:
rs = floor (pv)
We also calculate a fractional part fp. This is a floating point value with the property 0 <= fp < 1. This value is used to format the fractional part of the seconds:
fp = pv - rs
If you use rs as the basis for all date and time functions (including such functions that calculate the year, month etc, doing integer math only) then the results should be consistent and exact.
Due to the rounding the results should be pretty robust against round off errors. The round off errors usually will be well below the milliseconds range. So you have to accumulate a lot of them to get the second wrong. I think that is how Excel used to work for more than 20 years and old versions of LibreOffice did. Unfortunately this does not conform with the specification which is the basis of the current LibreOffice implementation. According to this specification round off errors that I guess are in the range of 0.0000001 seconds decide which minute you see in your spreadsheet. The developers have no chance to implement something compliant which shows reasonable behaviour.
I got a new insight with respect to the CLOCK vs. DURATION formatting debate. Such a distinction is necessary in the context of negative values, and here it is reasonable to tie this distinction to the use of [HH] or HH. But it is irrelevant in the context of the arithmetic errors where it was used.
Example: Take the value -1/32. This corresponds to -45 minutes and it should have an exact floating point representation. Formatting this as HH:MM gives "23:15" while [HH]:MM gives "-00:45". Both results make sense. The first format interprets the value as "1899-12-29 23:15:00" and returns the hours and minutes. The second format interprets the very same value a 45 minutes that go back in time. OK: "1899-12-29 23:15:00" is actually 45 minutes before the reference point "1899-12-30 00:00:00".
I think the cause of your problem in bug 132083 is that due to the missing error recovery mechanism tiny round off errors will surface as arithmetic errors. I also saw an example ( Bug 131151) that shows the same kind of errors at a place where the use of CLOCK rounding is semantically correct. In a common use case where the time values are between 0:00:00 and 23:59:59 it should not matter if you use [HH] or HH.
So I think there is no need to change the internal representation. What is necessary is to develop a clear concept of how the date and time arithmetic is supposed to work, get the specification right and change help information and implementation accordingly.
In my opinion -45 minutes should be always treated as "-00:45" (that is [HH]:MM). Involving reference points makes a mess: you always have to remember
Yes, it is possible to maintain double (float/integer) operations for each time/date value. But for the price of unnecessary overloading. Let's use it only in very situations: when backward compatibility is required.
(In reply to Jaromir / email@example.com from comment #14)
> In my opinion -45 minutes should be always treated as "-00:45" (that is
How would you then represent something like "1899-12-29 23:15:00" or - a more common case - like "2020-04-23 19:33:05"?
> Involving reference points makes a mess: you always have to
> remember your "startpoint".
This was never a problem for me: Not I have to remember the reference point but the spreadsheet program has to know where it is. But I understand that this may cause trouble as there are different conventions. How would you define points in time without some reference point?
> Yes, it is possible to maintain double (float/integer) operations for each
> time/date value. But for the price of unnecessary overloading.
I don't understand this argument: Depending on the context time may be seen as continuous process or a sequence of countable ticks. So both a float and an integer representation may be adequate. In the context of date and time calculations the error recovery mechanism is not very complicated.
> Let's use it only in very situations: when backward compatibility is required.
I consider this a pretty important use case. If written on paper you can read calculations that are hundreds of years old. How long should it be possible to access contents of spreadsheets?
(In reply to Albrecht Müller from comment #15)
> (In reply to Jaromir / firstname.lastname@example.org from comment #14)
> > In my opinion -45 minutes should be always treated as "-00:45" (that is
> > [HH]:MM).
> How would you then represent something like "1899-12-29 23:15:00" or - a
> more common case - like "2020-04-23 19:33:05"?
By setting zero point to "-9999.01.01 00:00:00" (or "-99999999.01.01....." - e.g. for professionals wanted "extended" calendar). Anyway, internally use only positive values.
> > Involving reference points makes a mess: you always have to
> > remember your "startpoint".
> This was never a problem for me: Not I have to remember the reference point
> but the spreadsheet program has to know where it is. But I understand that
> this may cause trouble as there are different conventions. How would you
> define points in time without some reference point?
See above. Universal reference point.
> > Yes, it is possible to maintain double (float/integer) operations for each
> > time/date value. But for the price of unnecessary overloading.
> I don't understand this argument: Depending on the context time may be seen
> as continuous process or a sequence of countable ticks. So both a float and
> an integer representation may be adequate. In the context of date and time
> calculations the error recovery mechanism is not very complicated.
Everything is possible. But - according to notes of Mr. Mike Kaganski - the speed is crucial for modern spreadsheets. That's why my objection.
> > Let's use it only in very situations: when backward compatibility is required.
> I consider this a pretty important use case. If written on paper you can
> read calculations that are hundreds of years old. How long should it be
> possible to access contents of spreadsheets?
I divide users into roughly two groups: those, who would maintain their calculations made many years ago, and those, who just start work with new data. The first group needs extra math processing, but the latter is free of that charge.
Created attachment 160356 [details]
Trying to clarify clock vs duration time
Having seen a lot of bugs related to LibreOffice’s date and time arithmetic I think many of them have a common cause: Major defects in the document that specifies how this feature should work. According to my analysis it is not possible to implement this specification and there is no easy fix. Therefore I tried describe how various properties of the date and time arithmetic depend on each other, and explain an idea that should allow to implement a date and time arithmetic that reliably gets simple calculations right.
The analysis in attachment 160356 [details] suggests a workaround that allows LibreOffice to display correct results for points in time. I don’t know if this workaround works for durations as a different rounding mechanism may be used.
The workaround is based the assumption that the date and time functions (including formatting) will deliver correct values for a subset of their domain. The idea is to map any values to that subset where the functions behave correctly.
Assume that t is some floating point number that is used to represent some point in time. Then this value is mapped to (round(t*86400) + 0.1)/86400
This is expected to have the following effects:
a) The situation where the behaviour of the SECOND function is undefined cannot occur. The round function which is called within SECOND function should get values such as 0.1, 1.1, … 59.1 which it will round to 0, 1, … 59. A value such as 59.7 which should be rounded to 60 cannot occur. Thus this problem should be gone.
b) The round down mechanism used in the other date and time functions will always get a value that is a fraction of a second above the correct value. Therefore rounding down will round to this correct value. This should solve the problem that the round down mechanism may produce results that are perceived as arithmetic errors.
Remarks: The mapped value contains an intentional error and therefore should not be used for further calculations! The workaround cannot easily extended to respect fractional parts as the SECOND function is specified to round to the nearest second.
First, you may have overlooked that this now since some time (last year) is a Documentation request to clarify the current implementation and difference of wall clock / timestamp formats and duration formats. Talking about other means of how date+time could or could not be implemented in this bug thus is not helpful at all as it just confuses everyone, especially not so technically inclined, even more who actually wanted to write a documentation / help section. Your previous overly lengthy comments from April probably already scared everyone away.
(In reply to Albrecht Müller from comment #18)
> Assume that t is some floating point number that is used to represent some
> point in time. Then this value is mapped to (round(t*86400) + 0.1)/86400
You don't gain anything by that, it would just introduce errors at other places. For examples: (output using YYYY-MM-DD HH:MM:SS.00 wall clock format that does not round)
Date serial number: 43863
Your calculation applied: 43863.0000011574
Formatted output: 2020-02-02 00:00:00.10
So far, almost good, *if* ignoring the fractional second *if* no such number format or calculation asking for it is involved.
Input: 2020-02-02 02:02:02.99
Date serial number: 43863.0847568287
Your calculation applied: 43863.0847581019
Formatted output: 2020-02-02 02:02:03.10
=> bad second value and bad 100th value.
Input: 2020-02-02 02:02:59.99
Date serial number: 43863.0854165509
Your calculation applied: 43863.0854178241
Formatted output: 2020-02-02 02:03:00.10
=> bad minute value and bad second value and bad 100th value.
... and so on.
The last sentence in the description of my workaround: The workaround cannot easily extended to respect fractional parts as the SECOND function is specified to round to the nearest second.
Thus this workaround is intended for cases where you deal with whole numbers of seconds that may contain round-off errors. This seems to be a pretty common case. What I expect to gain is to get consistent results from the date and time functions such as SECOND, MINUTE etc. This workaround should also fix situation where you see calculations such as 18 minutes - 8 minutes = 9 minutes without the need to care about the wall clock vs. duration distinction (see bug 132083). The workaround is just a by-product of the analysis I tried in attachment 160356 [details] (Trying to clarify clock vs duration time).
In my opinion date and time functions are closely related to formatting. Assume you format some date and time value with no fractional part at the seconds position. Then the value at the seconds position should be exactly the value returned by the SECOND function for the same value. This should be true for the other functions such as MINUTE, HOUR, DAY, MONTH and YEAR too.
In this context (ignoring the fractions of seconds) your examples show the correct behaviour:
2020-02-02 02:02:02.99 rounded to the nearest second is 2020-02-02 02:02:03
2020-02-02 02:02:59.99 rounded to the nearest second is 2020-02-02 02:03:00
A few words to the results of the analysis in the attachment:
The specification of the date and time functions contains serious flaws. Therefore I think it is impossible to implement a reasonable date and time arithmetic that is based on this specification (see also bug 127476 comment 10). If you try you will need a scaring long help information explaining a lot of unexpected behaviour.
Using different rounding strategies for wall clock time and durations may solve problems of calculation such as
wall clock time - wall clock time → duration
wall clock time + duration → wall clock time
as in the latter case the round down mechanism for wall clock time will transform tiny round-off errors into clearly visible arithmetic errors. (See bug 131151, I could not try my workaround in this case as the ROUND function seems not to work in text tables)
There is a rounding strategy that should work in the same way for wall clock time as well as for durations:
1. Multiply the date and time value by the number or seconds in a day, i.e. 86400.
2. Round this value to the required number of digits in the fractional part of the second. This step provides the best possible approximation and makes the calculation stable against round-off errors.
3. Separate the number into an integer and a fractional part. The fractional part is used to format the fractional part in the seconds value. The integer part is used to calculate all date and time information such as seconds, minutes, … year. As these calculations are based on exact integer values there are no rounding problems in this part.
Note: You may get different integer values depending on the number of digits in the fractional part. As a consequence _all_ date and time functions need a parameter that tells them this number of digits. This parameter should default to zero.
(In reply to Eike Rathke from comment #19)
Comment 18 shows just how incompetent Albrecht Müller is, all that wall of text being incorrect attempt to standardize on 1/10 of a second precision. And all the distraction he caused in his endless comments resulted in that, being a poor variation of the approach I suggested elsewhere to standardize on millisecond precision for displaying times, i.e. round times to nearest millisecond (as opposed to durations shown rounded to last shown digit), allowing to get incorrect s/m/h/d/m/y on that boundary, and accepting it as the trade-off. With that aggressive graphomany poured to others in multiple time-related issues, he only consistently makes fixing everything harder by hiding the essense (e.g., what needs documenting here) and scaring people away. I suggest hiding his comments using spam tag.
(In reply to Mike Kaganski from comment #21)
> on 1/10 of a second precision
A thinko: on whole seconds, of course.
(In reply to Mike Kaganski from comment #21)
> the approach I suggested elsewhere to standardize
> on millisecond precision for displaying times
I think you refer to Bug 127334 comment 8 where you propose that some kind of rounding is necessary. In attachment 160356 [details] I needed a few pages to describe how Excel might do this "some kind of rounding". Based on this I think:
• The user can choose whatever precision he or she considers reasonable. It is not necessary do standardize to some precision at all, neither seconds, milliseconds or whatever.
• There is no need to use different rounding strategies for points in time (wall clock time) or durations.
• The specification describes precisely the rounding method that has to be applied. Thus your approach as well as my proposal are incompatible with it.
> incorrect attempt to standardize on 1/10 of a second precision.
There are users who expect correct handling integer arithmetic when using integer multiples of time units without having to worry about roundoff errors or distinctions between wall clock times or durations. When using the SECOND and MINUTE functions other users may expect that a carry goes from seconds to minutes when seconds are rounded up to the next minute. The workaround is intended to make those persons happy. The trick is to inject a carefully crafted error into the input of the current implementation which makes it return the expected results. Its a kind of proof of concept for the ideas in the attachment. I chose rounding to seconds due to this specific use case. It is not a standardization attempt as I think standardization is unnecessary.
> the essense (e.g., what needs documenting here)
If it is unclear if the date and time functions should follow the specification or should do some kind of rounding that deviates from this specification then it is pretty difficult to decide what needs documenting.
Just one example how this works: If the date and time functions handle the carry from seconds to minutes it is not necessary to document this. People would expect this anyway.
But there is a decision that Calc will will not do this: Bug 127476 had been closed using the WONTFIX tag. In this case a lot of things need documentation:
• The fact as such: The implementation will lose a carry sometimes.
• What are the situations when this happens?
• From a user’s point of view: What are the benefits of Calc’s way of handling the carry?
• How date and time calculations are affected? The user should know if the value is wrong or the formatting.
• Issues with compatibility to Excel, legacy spreadsheets, StarBasic’s date and time functions, … I stop here to avoid TLDR effects.
Created attachment 165304 [details]
Trying to clarify clock vs duration time, version 0.0.1
Example in section "Rounding seconds" was broken.
(In reply to Albrecht Müller from comment #24)
> Remark: It has been claimed (comment 1 on bug 132083) that the arithmetic
> errors in the date and time arithmetic are due to a wrong use of formatting.
OMG. Is that a deliberate misinterpretation?
(In reply to Albrecht Müller from comment #24)
The document is interesting, thanks (I hope the misinterpretation mentioned in comment 2 is not intentional). However, it seems to either accidentally mix two things, or tries to do that on purpose: it doesn't discuss the problem as formatting problem, but rather as computational problem when performing spreadsheet operations with date/time values.
It should be stated upfront, that this is not a useful approach. I don't think it's possible that Calc starts processing "Date/Time" cells/values somehow differently when performing calculations. The only useful discussion is about how the results of normal floating-point calculations, using the unit of day, without any intermediate conversion into any other kind of unit or data type like integral number of small units, and without rounding (or different post-processing) of the stored result, be represented on display. Summing, multiplying, subtracting durations and/or times should continue to be the same summation/multiplication/subtraction of underlying doubles. Which means that the end result would still be in general off by some value...
But it's the display (date/time format) that we need to change. And so this is the question about the final calculations when the result is converted to the date/time format, when we should settle to some kind of rounding to some fraction of time. So I refer to Bug 127334 comment 8, and this time, I suggest to follow the Excel's sane decision:
1. To avoid problem with SECOND/MINUTE/HOUR/... that you rightfully mentioned in the attachment 165304 [details], given that there's no function giving the time parts smaller than seconds, all the mentioned functions must round to nearest integral seconds, and then calculate the resulting value of each of them. So that for any given date/time value, the whole set of YEAR(val), MONTH(val), DAY(val), HOUR(val), MINUTE(val), and SECOND(val) would form a reasonable set of data, without a possibility of error of almost (-1 min).
2. Then, since these functions are reasonably expected to give the same data as shown using "YYYY-MM-DD HH:MM:SS", the date/time format that does not include fractions of seconds should be processed following the same rules on display: rounding to nearest integral seconds, then calculating all the format components from that value.
3. Only when displaying the date/time formats including fractions of seconds, use some smaller fraction to round to. Please let's settle on fixed 1/1000 of a second - it has reasonable robustness to allow tens (hundreds) of arithmetic operations before floating-point operation errors might affect the display value, and it consistent with other spreadsheet software.
4. Document all this.
The misinterpretation may come from taking a users point of view: In the bug report Jaromir essentially subtracted 8 minutes from 18 minutes, expected 10 minutes, saw 9 minutes. Clearly not consistent with elementary integer arithmetic. Then he learns that he will see the correct 10 minutes if he uses (right) duration format instead of (wrong) wall clock format. No intention to hurt anyone. Sorry.
> it doesn't discuss the problem as
> formatting problem, but rather as computational problem when
> performing spreadsheet operations with date/time values.
Seems that I did not distinguish between the two things clearly enough: The computational problem is that Calc represents date and time values as floating point values only and processes these values in the same way as any other floating point values. So I have no problem with a calculation like "1900-01-01 * 1900-01-02 = 1900-01-05" as this is - using a different number format and the right date base - the same as 2 * 3 = 6.
Being able to use floating point arithmetic for performing date and time calculations is a key design feature which you cannot take for granted. That's why I listed the seemingly trivial restrictions that make feature possible.
Of course, these floating point values in general cannot represent date and time values exactly. Therefore you need some way to deal with these inaccuracies in some reasonable way. This problem appears both as a formatting problem as well as a computational problem in date and time functions like SECOND/MINUTE etc.
ad 1) Don't forget that besides SECOND, MINUTE, HOUR, DAY, MONTH and YEAR there are some more date and time functions that also should use the same rounding strategy. Otherwise they may return inconsistent results.
ad 2) Agree
ad 3) I don't understand why you want to settle on a fixed 1/1000 of a second. The math I tried to develop does not distinguish between representations where seconds have or don't have a fractional part. Not having a fractional part is equivalent to having a fractional part containing zero digits. Therefore my idea is to have a set of elementary functions that use the number of digits in the fractional part as a parameter. This way you need only one implementation which is the basis of all the other functionality. In 1) you argue that the SECOND/MINUTE etc. functions should always round to the nearest second. This would be equivalent to use a parameter that specifies zero digits in the fractional part. You can use these functions for formatting too. In this case the parameter is the number of digits required by the format, which may be zero (no fractional part) or more (if there is a fractional part). I think this results in a behaviour which is equivalent to the common representation of floating point numbers where the same number may be represented by "2", "1.9", "1.94" or "1.936" depending on the number of digits required. If there is a need to use a fixed resolution of 1/1000 of a second simply use 3 for this parameter.
ad 4) That's probably a lot of work.