Bug 127477 - Incomplete description of date & time functions in the help information
Summary: Incomplete description of date & time functions in the help information
Status: REOPENED
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: Documentation (show other bugs)
Version:
(earliest affected)
6.0.4.2 release
Hardware: All All
: medium normal
Assignee: Not Assigned
URL:
Whiteboard: target:6.5.0
Keywords:
Depends on: 127170
Blocks: Help-Changes-Features
  Show dependency treegraph
 
Reported: 2019-09-10 14:37 UTC by Albrecht Müller
Modified: 2020-11-18 22:12 UTC (History)
5 users (show)

See Also:
Crash report or crash signature:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Albrecht Müller 2019-09-10 14:37:07 UTC
Description: A well established de-facto standard seems to exist which specifies how spreadsheet application do date & time calculations. This standard seems to be more than 20 years old. It seems to include the convention to round floating point arguments to the nearest second before extracting date or time related information such as year, month, day, hour etc. This usually avoids the need to care about round-off errors or to distinguish between points in time and durations. The help information does not mention this convention which may be the source of a serious regression error (bug 127334) and inconsistent behaviour (bug 127476) and various other bugs: I discovered Bug 18800, and Bug 121978 (this one contains links to additional problems in the „see also“ part) and bug 122158. The key problem: Not knowing what the intended behaviour is makes it impossible to distinguish between bugs and correct behaviour. So you cannot fix related bugs.

How to reproduce: Look up the description of the functions like YEAR (https://help.libreoffice.org/6.3/en-US/text/scalc/01/func_year.html), MONTH, DAY, HOUR, MINUTE and SECOND (https://help.libreoffice.org/6.3/en-US/text/scalc/01/func_second.html). The section „Date & Time Function“ (https://help.libreoffice.org/6.3/en-US/text/scalc/01/04060102.html ) contains a list with links to those functions. There may be some more functions not mentioned here that are affected by this problem.

Actual behaviour: You will see statements like „Returns the year as a number according to the internal calculation rules.“ or „Returns the second for the given time value. The second is given as an integer between 0 and 59.“ The descriptions says nothing about the way rounding is is intended to be done which is important as the arguments of these functions are float values which may contain round-off errors. So the description leaves open what YEAR("1999-12-31 23:25:59,6") should return. If you round to the next second the result should be 2000. If you use some kind of wall clock rounding you get 1999 and are in the previous millennium. Of course the results of various applications differ: More than 20 year old MS Excel97 returns 2000 while LibreOffice returns 1999 (observed with versions VS. 6.2.6.2 and 6.0.4.2)

Expected behaviour: The help information should clearly and completely describe the intended semantics of these functions. Preferably this description is consistent with itself and with the de-facto standard. If the help function refers to „results based on international standards“ it should name these standards and give at least some hints where to find them.
Comment 1 Albrecht Müller 2019-09-11 18:03:05 UTC
Here are some ideas how the behaviour of the various date & time functions could be specified.

The first thing is to accept that you cannot know if some floating point number is intended to represent some point in time or a some duration. Thus there should be no attempt do distinguish between those concepts. Date & time functions deal with just another representation of floating point numbers. So I have no problem accepting that (assuming a proper value for the date base) 1900-1-1 times 1900-1-2 gives 1900-1-5 as this - using another representation - just states that 2 * 3 = 6. If you know what you are doing this is not a problem.  They start if you don't because the help information does not tell you what you need to know. 

You can easily see that the source of many bug reports are probably some general problems of floating point number representation that are not specific to data & time calculations.

Consider the common decimal notation. Instead of date & time functions like SECOND, MINUTE etc. we define a function DIGIT(x, n) which is intended to return an integer between 0 and 9 which is the nth digit of floating point value x. Let x=314.2 then you would expect the following results:

DIGIT(x,2) = 3
DIGIT(x,1) = 1
DIGIT(x,0) = 4
DIGIT(x,-1) = 2

You easily get fooled into believing you can implement this function. 

The problem is that you might not even be aware that any such implementation is based on assumptions about the way what the "right" representation of this number is. This becomes clear if you add parameters that specify the underlying representation. You can for example add two parameters so you have a function DIGIT(x,n,m,r). The meaning of x and n is the same as before. m specifies the position of the least significant digit of the representation and r specifies the rounding method applied to calculate this digit. Values could be N, D and U to specify rounding to the nearest value, rounding down or up respectively.

Just a few examples for the effect of these parameters:

DIGIT(x,2,2,D) = DIGIT(x,2,2,N) =  DIGIT(x,2,1,N) = DIGIT(x,2,1,U) = DIGIT(x,2,1,D) = 3 
as this corresponds to the first digit of 300, 300, 310, 320 and 310 respectively.

DIGIT(x,2,2,U) = 4 as rounding up 314,2 up at the most significant position gives 400.

If you assume that m is less than -2 then the exact values of these additional parameters don't matter. You always get the same results. It is this very property which makes things difficult as you might assume that these parameters don't matter if you are precise enough. Unfortunately you are right in most cases so it may take a long time until you stumble on values like 0.999999999999993 where even the most significant digit depends on the position of the least significant digit and what kind of rounding you applied to get this digit.

On the other hand this can become a common problem if you do mathematical operations with floating point numbers that don't have an exact representation and you apply D or U rounding. Under these circumstances values like 0.999999999999993 and 1.000000000000007 are quite common and you will encounter a kind of pseudo random behaviour of your calculations. This behaviour is caused by minimal round-off errors. Add for example 0 and 1, which actually are the these two numbers rounded down, you will get 2, but 1 if the second number actually is 1.000000000000006.

I think it does not require additional explanation why you may get funny results if you calculate the digits of some number with different values for m and r and use these digits to compose a representation of this number.

I hope that this analysis of the DIGIT function helps to understand the common root of many problems that plague date & time functions. There is no "right" precision or rounding behaviour that fits all floating point calculations and that's true for date & time calculations too. Therefore the date & time functions should use some information that plays the role of the m and r parameters of this DIGIT function. Of course it should be documented what this information is and how to control the relevant values to give both developers and users a chance to know what they are doing.
Comment 2 Olivier Hallot 2019-12-27 19:55:47 UTC
Trying to sumarize.

1) The YEAR, MONTH... help pages are short in description and navigation between similar topics. Help does not help... -> OK to improve.

2) YEAR("1999-12-31 23:25:59,6") is not a valid argument for the function, but 
YEAR(DATEVALUE("1999-12-31 23:25:59,6")) is. Besides we are 35 minutes before the hour in your example and it can't be the reason to round the year. Assuming it is a typo...

=YEAR(DATEVALUE("1999-12-31 23:59:59.99999")) is 1999
=YEAR(DATEVALUE("1999-12-31 23:59:59.9999999")) is 2000

Versão: 6.3.4.2
ID de compilação: 60da17e045e08f1793c57c00ba83cdfce946d0aa
Threads da CPU:8; SO:Linux 5.3; Realizador da interface: GL; VCL: kde5; 
Local: en-US (pt_BR.UTF-8); Idioma de IU: pt-BR
Calc: threaded

3) We suppose the function(s) are compliant with the ISO-IEC 23600 standard (aka Open Document Format from OASYS), therefore the Help pages contents can be based on (e.g. YEAR)
http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018216_715980110

If they are not compliant, we must bring to the development for fixing.

4) as result of 3) we either flag the difference with MSExcel 97 or we just ignore Excel 97 as not a standard in which we rely. Many Calc functions were introduced to accommodate the "de-facto" excel standard and deviates from ODFF standard.

5) If the standard ISO-IEC 23600  is not accurate on how rounding error are handled... -> development.

However I doubt the samples above can be considered a bug.
Comment 3 Albrecht Müller 2019-12-28 11:22:16 UTC
(In reply to Olivier Hallot from comment #2)
> 
> 2) YEAR("1999-12-31 23:25:59,6") is not a valid argument for the function,
> but 
> YEAR(DATEVALUE("1999-12-31 23:25:59,6")) is. Besides we are 35 minutes
> before the hour in your example and it can't be the reason to round the
> year. Assuming it is a typo...

YEAR("1999-12-31 23:25:59,6") should have been YEAR("1999-12-31 23:59:59,6"). Sorry for this typo. This syntax works but it requires to set a switch in the Calc options and the "de" locale.

> 
> =YEAR(DATEVALUE("1999-12-31 23:59:59.99999")) is 1999
> =YEAR(DATEVALUE("1999-12-31 23:59:59.9999999")) is 2000

I wanted to point to different thing: I think there is a bug in the specification:
The specification of the SECOND function (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018202_715980110) states: "Note also that this rounds to the nearest second, instead of returning the integer part of the seconds". There is a problem in the last half second of a minute. If you round this value, e.g. 59.6 seconds to to nearest second, you will get 60 but the SECOND function is specified to return integer values in the range 0 ... 59. Therefore something is wrong with this specification: If SECOND return 60 this violates the requirement that the return value is in the range 0...59. To ensure that the result is in the proper range you could take the remainder of a division by 60. The specification does not contain this operation but I think that this is tacitly assumed. Thus SECOND returns 0 in this case.

The MINUTE function (http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018194_715980110) does not round to the next second. Thus a value corresponding to 59.6 seconds is 0 minutes. Therefore 59.6 seconds corresponds to zero minutes and zero seconds.

As all the other functions (YEAR, MONTH, DAY, HOUR, etc.) also do not round to the nearest second it is possible to create this example where the problem propagates up to the digit in the millennium position: So according to the specification a value of "1999-12-31 23:59:59.6" should result in:
YEAR 1999, MONTH 12, DAY 31, HOUR 23, MINUTE 59, SECOND 0. A consistent value would be YEAR 2000, MONTH 1, DAY 1, HOUR 0, MINUTE 0, SECOND 0. 

This example is the content of bug 127476 which was intended to demonstrate of what I called "funny results" in comment  #1. In short: In order to avoid this kind of inconsistencies all related date and time functions should use the same rounding method.

> If they are not compliant, we must bring to the development for fixing.

I fear the problem appeared because the development made the behaviour compliant to a broken specification.


> However I doubt the samples above can be considered a bug.
The line between bug or no bug seems to blur: It is confirmed behaviour that simple date and time calculations will show pseudo random results that are correct in 50% of the cases and one off in the other 50%(see bug 127334). I would consider this as a bug but this behaviour arguably follows from the specification. So this is not a bug?
Comment 4 Commit Notification 2019-12-29 10:23:12 UTC
Olivier Hallot committed a patch related to this issue.
It has been pushed to "master":

https://git.libreoffice.org/help/commit/efc2eba6081f95dbf83ea1776a563bbddaa13f68

tdf#127477 Incomplete description of date & time
Comment 5 Olivier Hallot 2019-12-29 14:59:15 UTC
I'm closing this bug as it has fixed the documentation part of the issue.

The SECOND() rounding number issue raised here is to be fixed upstream of documentation and is addressed in bug#127746. Once fixed, a companion documentation bug can be opened.
Comment 6 Olivier Hallot 2019-12-29 15:00:56 UTC
(In reply to Olivier Hallot from comment #5)
> I'm closing this bug as it has fixed the documentation part of the issue.
> 
> The SECOND() rounding number issue raised here is to be fixed upstream of
> documentation and is addressed in bug#127746. Once fixed, a companion
> documentation bug can be opened.

read bug127476
Comment 7 Albrecht Müller 2020-04-22 14:10:54 UTC
A request for a sample file for Bug 127476 brought this bug to my attention again. I think the documentation has improved considerably but it still lacks some important information:

1) The relation between formatting and and a couple of date and time functions should be clarified. 

2) Date and time function show some weird features which I think will surprise the unsuspecting user. Therefore this behaviour should be documented. The date and time functions are specified in a way that
2.1) they have to return the inconsistent values
and
2.2) they are expected to return pseudo random results in pretty common use cases.

Details to these points:
ad 1) The relation between formatting and date and time functions is not so trivial as it seems: The the semantics of the formatting of date and time values has changed recently. To my knowledge this change is not documented yet but there is a discussion about this subject in bug 127170. Due to this change the same value can be formatted in several ways. The resulting representations may contain different values for corresponding time units. Therefore the help information should clarify which of these possible values the corresponding date and time function should return. It also should specify some workarounds for the case that the other values are needed and how to deal with legacy spreadsheets that may have used date and time arithmetic with different semantics.

Ad 2) Maybe someone can analyse the specification (some links see comment #3) and confirm or refute my conclusions. If they are correct the help information should inform the users about a behaviour that they may not expect.

Ad 2.1) This problem is the subject of bug 127476. The help information should also specify what to do if some user wants the carry to be handled in the Excel way.

Ad 2.2) The internal representation of time values cannot represent date and time values exactly in general. It is not possible to avoid round of errors in floating point calculations. Most of the date and time functions are specified to use a round down method. All together this has the effect that doing elementary math with an integer number of time units will result in some pseudo random behaviour. After adding an integer number of minutes to another integer number of minutes the MINUTE function will probably show the correct result in 50% of the cases and a wrong number in the other 50%. I did not construct examples as I think this is a direct consequence of the specification. The formatting of date and time values seems to show a corresponding effect. As this is a more common use case the behaviour is already the subject of several bug reports for Calc (bug 127334 and bug 132083) and Writer (Bug 131151)

Therefore I reopen this bug report. I make it dependent on bug 127170 as I think the solution of this bug determines if the points 1) and 2) can be fixed by a few remarks or it is necessary to provide lengthy explanations of the idiosyncrasies of LibreOffice’s date and time functions.

Off topic: Is it correct that this bug depends on bug 80430? I suspect that this dependency should be the other way round but did not change it.
Comment 8 Albrecht Müller 2020-05-25 21:06:30 UTC
I just discovered that LibreOffice seems to contain two different implementations of date and time arithmetic: One for Calc and maybe Writer, another for Basic. These implementations sometime return different values for the same calculations, see bug 133389.

There are also independent descriptions of corresponding functions. Some examples:

MINUTE function in Calc: https://help.libreoffice.org/6.4/en-US/text/scalc/01/func_minute.html
Minute function in Basic: https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/03030202.html?DbPAR=BASIC

TIME function in Calc: https://help.libreoffice.org/6.4/en-US/text/scalc/01/func_time.html
TimeSerial function in Basic: https://help.libreoffice.org/6.4/en-US/text/sbasic/shared/03030205.html?DbPAR=BASIC

The description of the TimeSerial function explains how this function handles parameters that are outside their usual range. There is no such information in the description of the corresponding TIME function.

Corresponding functions should behave in essentially the same way same. Therefore their descriptions should contain "See also" links referring to each other: Information that is missing in one the description of some function may be available in the description of the corresponding function.