132083 – FORMATTING - Wrong date substraction in cell's operation

Bug 132083 - FORMATTING - Wrong date substraction in cell's operation

Summary: FORMATTING - Wrong date substraction in cell's operation

Status:	CLOSED DUPLICATE of bug 127170

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	6.4.2.2 release
Hardware:	x86-64 (AMD64) Windows (All)

Importance:	medium normal
Assignee:	Not Assigned

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2020-04-13 11:00 UTC by Jaromir / jarko@sortsk.pl
Modified:	2020-05-25 19:49 UTC (History)
CC List:	4 users (show)

See Also:	127334 131151 127477 133389
Crash report or crash signature:

Attachments
Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Jaromir / jarko@sortsk.pl 2020-04-13 11:00:26 UTC

Description:
E.g. "2020-04-13 12:18:00" - "2020-04-13 12:08:00" results in 9 (nine) minutes, instead of 10. To get 10 we have to add one second:
"2020-04-13 12:18:01" - "2020-04-13 12:08:00" = 10

Other values are computed correctly, but not sure, if in ANY case.

Steps to Reproduce:
1. Set 'date' format for two cells
2. Fill in both: 2020-04-13 12:18:00 and 2020-04-13 12:08:00
3. Substract values in third cell, formatted as 'hour'

Actual Results:
9 (minutes)

Expected Results:
10 (minutes)


Reproducible: Always


User Profile Reset: No



Additional Info:
Wersja: 6.4.2.2 (x64)
Identyfikator kompilacji: 4e471d8c02c9c90f512f7f9ead8875b57fcb1ec3
Wątki CPU: 4; System operacyjny: Windows 6.1 Service Pack 1 Build 7601; Render UI: domyślny; VCL: win; 
Ustawienia regionalne: pl-PL (pl_PL); Język UI: pl-PL
Calc: threaded

Comment 1 Mike Kaganski 2020-04-13 11:39:10 UTC

Do not reproduce with Version: 6.4.3.1 (x64)
Build ID: 4d2b2b47cca498fed6abf712a36d0788901091eb
CPU threads: 12; OS: Windows 10.0 Build 18363; UI render: default; VCL: win; 
Locale: en-US (ru_RU); UI-Language: en-US
Calc: threaded

Cell A1 (date+time): 2020-04-13 12:18:00
Cell A2 (date+time): 2020-04-13 12:08:00
Cell A3 (formatted [HH]:MM:SS, i.e. as time span, not wall clock): =A1-A2

The result in A3 is 00:10:00.

If I format it as wall clock, using HH:MM:SS, then I see 00:09:59, because of floating-point error ( https://wiki.documentfoundation.org/Faq/Calc/Accuracy ).

Closing NOTABUG.

Comment 2 Aron Budea 2020-04-13 13:13:36 UTC

Interestingly it displayed 0:10 in both cases before the following commit (just noting here):

https://cgit.freedesktop.org/libreoffice/core/commit/?id=e2e47898180e547cad7ccde1e5890385d573e551
author		Eike Rathke <erack@redhat.com>	2018-08-31 11:21:03 +0200
committer	Eike Rathke <erack@redhat.com>	2018-08-31 12:50:52 +0200

Use tools::Time::GetClock() in number formatter for wall clock time

Comment 3 Mike Kaganski 2020-04-13 13:25:31 UTC

(In reply to Aron Budea from comment #2)
> Interestingly it displayed 0:10 in both cases before the following commit
> (just noting here):
> 
> Use tools::Time::GetClock() in number formatter for wall clock time

Yes, that was the intended change. The idea is this: when people talk about datetimes, and want to express that in hours and minutes, they say "2020-04-13 23:59" even when it's "2020-04-13 23:59:58". They will not say that it's tomorrow's day when it's still 2 seconds to it left. Outputting that as "00:00" is wrong. Thus, when formatting wall clock times, it doesn't get rounded up, but truncated. This makes tiny floating-point inaccuracies (when the nearest floating-point value is slightly less than exact value) to result in this. See also bug 127334 comment 8, where I suggest to limit the precision of time formatting, to avoid this.

Comment 4 Albrecht Müller 2020-04-13 17:50:06 UTC

I think this is a bug. Look at the problem from a users point of view: The essential part is that you subtract two numbers containing only integral multiples of time units, you get an integral result and this result is obviously wrong. As only integral numbers are involved there is no room for round off errors. If this is not a bug what else can count as bug?

And I think it is a pretty serious one: How can I trust the results of more  complicated calculations if I can easily demonstrate that LibreOffices does not even get simple integer calculations right?

Comment 5 Mike Kaganski 2020-04-13 19:54:22 UTC

(In reply to Albrecht Müller from comment #4)
> I think this is a bug. Look at the problem from a users point of view: The
> essential part is that you subtract two numbers containing only integral
> multiples of time units, you get an integral result and this result is
> obviously wrong. As only integral numbers are involved there is no room for
> round off errors. If this is not a bug what else can count as bug?
> 
> And I think it is a pretty serious one: How can I trust the results of more 
> complicated calculations if I can easily demonstrate that LibreOffices does
> not even get simple integer calculations right?

It's a misconception. You are making arithmetics with times. And time in LibreOffice is just a fraction of a base unit, which is a day. 12:00:00 in LibreOffice is 0.5 (half a day). You are *not* doing simple integer calculations. Just try to format your cells as numbers.

And you may see the bug I linked as See Also that you might want to track. This is either NOTABUG or duplicate of that one, in any case it should be closed.

Comment 6 Jaromir / jarko@sortsk.pl 2020-04-13 20:15:06 UTC

Thank you for quick answer and detailed explanation. 
In my opinion the problem has been arisen due to conception to prefer fast operations over precise results, in any case. Thus, the user is finally  responsible for correct calculation. Even for simple addition 2+2...

Comment 7 Mike Kaganski 2020-04-13 21:07:19 UTC

(In reply to jarko from comment #6)
> In my opinion the problem has been arisen due to conception to prefer fast
> operations over precise results, in any case. Thus, the user is finally 
> responsible for correct calculation. Even for simple addition 2+2...

It is wrong. I already tried to explain the reason for the change. And it has nothing to do with "fast over precise". The older variant was *less* precise.

But you seem to not realize that the problem here is *wrong format* used. "Time" is a complex concept. And there are at least two distinct and very different things. One is *wall clock time*: it is the point in time. Another is *time span* - that is some length of time, that you get by subtracting two wall clock times - it is time difference. When you use the latter, you may want to see something like "128 h 32 m". Working with that, you likely want to round towards the nearest: if you have 128 h 32 m 42 s, and show up to minutes, you likely want to see 128 h 33 m. But when you are working with clock wall time, you see something like "2019-12-31 23:59:59". And you don't want your events that happened at that moment to be registered in 2020. So *correct* and *precise* result there is *never* round up, but truncate the display down.

In LibreOffice, you use "HH:MM" for wall clock times, and "[HH]:MM" for time spans. And the change made them behave according to this model: when you use wall clock time format, you get truncation (so 2019-12-31 23:59:59.9999999999 that is shown up to minute will be still 2019-12-31 23:59); when you use time span formats, you get rounding to nearest.

What was proposed in the comment that I mentioned above was that even in wall clock format we still need to limit the precision, and round to nearest millisecond. That is just my proposal, and I don't know what problems that might have, but the idea is to limit this specific kind of problem.

But using "HH:MM" for a cell with "A1-A2" is *conceptually* wrong, and "[HH]:MM" should be used in this case.

Please stop reopening this, unless you have a strong reason to believe that this is different from what I described.

Comment 8 Jaromir / jarko@sortsk.pl 2020-04-14 10:00:19 UTC

Appreciate your deep and comprehensive explanation. Thank you!
At the end let me say little words of my approach. In huge prevailing cases internal arithmetic counting time difference gives us the same result for both formats, no matter which is used, "wrong", or "right". Rarely we meet  value variations. 
But in example, we have 2 well defined points, strict up to second. Time lapse is 10 minutes. No more, no less. No need to truncate or round fraction of time. Exactly, physically 10 minutes. So one may expect the "conversion" between time span to time spot will run - I would say - seamlessly. In the matter of obtaining same results.
Once again, thanks a lot for your time.

Comment 9 Mike Kaganski 2020-04-14 11:40:27 UTC

(In reply to jarko from comment #8)
> But in example, we have 2 well defined points, strict up to second. Time
> lapse is 10 minutes. No more, no less. No need to truncate or round fraction
> of time. Exactly, physically 10 minutes. So one may expect the "conversion"
> between time span to time spot will run - I would say - seamlessly. In the
> matter of obtaining same results.

We have two points in time:

2020-04-13 12:18:00 in LibreOffice is "43934 days since day zero + 12/24 (12 h) + 18/1440 (18 min)" = 43934.5125 (exactly; this is also exactly representable in binary double).

2020-04-13 12:08:00 in LibreOffice is "43934 days since day zero + 12/24 (12 h) + 8/1440 (8 min)" = 43934.505555555555555555... - periodic decimal, not representable exactly in binary (closest binary representation is 43934.5055555555591126903891563, which is slightly *larger* than exact value you intended).

Now LibreOffice subtracts these *binary* values (which are approximations of original values; the first approximation happened by accident to be exact, while the second is more usual case, with a small error).

The result of subtraction a slightly larger value from exact value is slightly lesser value, right? That is the case here. The result is not exact 00:10:00 (which is 0.00694444444444444... periodic decimal, not representable exactly in binary double), but 00:09:59.999999... (0.00694444443287037... in decimal). Note that this was *always* the case, even before the commit mentioned in comment 2! What has changed after that commit was only the display of that value when using *wall clock* formatting. 09 min 59.9999... s is no more rounded up toward 10 min, but truncated to 09 min *on display*.

I had already mentioned that this is the result of inherent floating-point arithmetics operations, which is covered in FAQ (see comment 1). You just need to realize that times are *always* fractional numbers (represented in binary floating points as per IEEE 754), no matter if you think that you enter some "whole numbers" of hours or minutes; and any operations with times are subject to universal floating-point handling rules and gotchas.

Comment 10 Jaromir / jarko@sortsk.pl 2020-04-14 13:29:55 UTC

The whole discussion is about what kind of method is preferred when presenting data: impossible exact binary representation of given number or exact number known to user. If program/procedure/script assumes that "user has always right", than it proceed with any operation needed to finally get that exact number. If not - simply do the best with arithmetic, but with rounding error.

Comment 11 Mike Kaganski 2020-04-14 13:58:06 UTC

(In reply to jarko from comment #10)
> The whole discussion is about what kind of method is preferred when
> presenting data: impossible exact binary representation of given number or
> exact number known to user. If program/procedure/script assumes that "user
> has always right", than it proceed with any operation needed to finally get
> that exact number. If not - simply do the best with arithmetic, but with
> rounding error.

Sigh. You seem to not follow the "discussion".

First, you confuse "presenting data" and storing the data. Then you mix the specific issue raised here in this tdf#132083 (which is about changed *display* - presentation - of the very same data that was already in previous versions) with an irrelevant "discussion" about "any operation needed to finally get that exact number" (this is not about representation, but about calculations), which is explained in the FAQ. In the end, you don't realize that there's *no* way to do what you think is doable ("any operation needed to finally get that exact number"). Very often, simple (for a basic school pupil) mathematics would need infinite time and memory to be calculated exactly in existing computers. So this is not some meaningful "discussion" about what "kind of method" to use, but rather about "what this world is" vs "what I want it to be", which is absolutely useless in this issue - it's simply offtopic. You *can not* make calculations with data with infinite precision. You can't even have a true Pi in a modern computer, no matter how long will you calculate that to "assume that user has always right".

*Sometimes* some *small subset* - not all - of operations could be calculated exactly if done differently (it means, not using the computer's arithmetic processor, but with in software algorithms requiring much extra memory, and much extra time). But even that is *not* "assuming that user has always right", because the primary goal of spreadsheet software is *massive calculations* on big amounts of data represented in tabular form. The user for whom the software is created needs it to process those big amounts in timely fashion, and it's *that* user who the software assumes is right in that expectation. If Calc would try to give exact results even in theoretically possible cases, its users (who use it for real job, not as a calculator for two numbers) would say "It must give us answers, not f**k up our precious time for own enjoyment calculating with full precision, which taking years to complete".

Comment 12 Jaromir / jarko@sortsk.pl 2020-04-14 14:35:24 UTC

All these are consequences of the use of floating point numbers, while best fits integers.

Comment 13 Mike Kaganski 2020-04-14 15:02:58 UTC

(In reply to jarko from comment #12)

which is again offtopic here, since representation of times as fractions of day is industry standard, that is directly used in billions of spreadsheets (e.g. in formulas that take fractional parts to strip days), etc. This is not something to be changed into some integer number of some basic time unit.

Or did you mean that the whole user input should be kept as "integer for years, integer for months, integer for days, integer for hours, integers for minutes, integers for seconds, integer for ..."? Then you seem to have a secret recipe how to operate with such a monster (e.g., subtract two such datetimes) efficiently.

Comment 14 Jaromir / jarko@sortsk.pl 2020-04-14 18:45:13 UTC

Maybe simply store amount of seconds in '__int64'. +/- 10 thousand years require only ~330 billions of seconds. There is plenty of room even for fractions (milliseconds, microseconds or nanoseconds). Never mind. 
As for speed of operation we double computing power every couple years. This is something what evolves continuously. But if you once choose not precise arithmetic - there won't be any improvements in the future.

So - enough OT from my side. Any way, you have prepared GREAT product. Thank YOU, guys.

Comment 15 Albrecht Müller 2020-04-14 21:35:37 UTC

@Mike Kaganski:
If you look at the description of the bug the problem boils down that you subtract 8 minutes from 18 minutes. A fourth grader should be able to tell you that this gives 10 minutes. But Calcs actual result is 9. This is clearly wrong. That is why I think this is a bug.

Especially as it is that simple I would classified this as a major bug. Therefore I went to great lengths to analyse what went wrong, detected a can of worms and tried to describe some of them in form of bug reports. The following is just a summary of my findings:

LibreOffices provides a very simple mechanism for date and time calculations. You can consider it as just a fancy representation of floating point numbers which is good enough for many common date or time calculations. It fails as soon as you have to deal with concepts such as daylight saving times.

Programs that are around for decades show that it is possible to use the internal representation of date and time values in a way that the average user will never notice the fact that the representation in general cannot represent time values exactly. LibreOffice was one of those programs. Bug 127334 comment 8 seems to indicate that you understand how this can be achieved. The basic idea: Use some basic unit of time such as a second or something that is an integral fraction of a second, e.g. a millisecond. Round any values to the nearest integral multiples of this basic unit. The rest of the calculations uses integer multiples of the basic unit only. As these calculation are based on integer values they are exact and consistent – no additional round off errors will be introduced. With the right choice of this basic unit the round off errors of the floating point calculations will almost never accumulate to a degree that rounding will select the wrong time unit. Therefore a user will almost never see effects caused by round off errors. I think this is how this established industry standard works but I never saw a formal specification of it. No fancy stuff like infinite precision or the like is required, just careful rounding and integer arithmetic.

I think I could trace the source of the problems to a BrokenDateTimeSpec. That’s why I gave bug 127334 this alias which recently has been removed. Why?

The problem is that this spec does not specify the essential features of the LibreOffice time calculation mechanism. I did not find any specification that states what a concept of "wall clock time" or "duration" means. Given the crude approximation of date and time calculations does it make sense to use the full power of double floating point precision? If you really want to do this you have to be specific about the meaning of the number 1: At this level of precision it matters if you define it as one revolution of the earth or as 24 * 60 * 60 seconds of an atomic clock. What about daylight saving times, time zones, calendars defined by astronomical observations or depending on future events you don’t know yet, leap seconds etc.? That is where things start to become really complicated.

Instead the specification defines some algorithms for the MINUTE, SECOND etc. functions that are contradictory (e.g. the SECOND function is supposed to return integer values between 0 and 59, but also to round to the nearest second. What shall this function return for 59.6 seconds?, see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018202_715980110 ), use inconsistent rounding (the SECONDS function rounds to the next second, the MINUTE function does not, see http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part2.html#__RefHeading__1018194_715980110 ) and thus deviate from this established industry standard. I fear the authors of this specification were not aware of the implications of their work. Unfortunately I don’t know how to write a bug report against the specification and I was not able to find someone to fix these problems.

The help information leaves out a lot of essential information. So I simply don’t know what to expect if I use features like the [HH]:MM or HH:MM format constructs (see https://help.libreoffice.org/6.4/en-US/text/shared/01/05020301.html?&DbPAR=CALC&System=WIN#hd_id3155870, where is a reference to the special treatment of wall clock times or durations?) or the various date and time functions.

These deficiencies leave a lot of room for different interpretations. In the case at hand LibreOffice contains a pseudo random generator that essentially uses the round off errors of floating point calculations to decide if it will return a correct or a wrong value. This typically gives 50% correct values where the user expects 100%. I think I saw a bug report which reported essentially the same problem for text tables in Writer. There are other problems related to these deficiencies, e.g. inconsistent results of date and time functions.

I reopened this bug. Before you revert this to NOT A BUG again, please answer the following questions:
- I think there should be pretty good reasons to justify a change that breaks the billions of spreadsheets you mention in comment #13. The change causes these spreadsheets to show pseudo random numbers where they used to show correct results before. What are these reasons? I think an argument as in comment #3 is not sufficient. The intended semantics of some time information may be something like "about three o'clock". In this case you need duration time rounding for wall clock times. And we are talking about time frames that are well below the milliseconds range.
- Do you really want do demonstrate that the quality standards of LibreOffice are so low that the expected behaviour (definition of NOT A BUG!) of this program is to produce wrong results where a fourth grader is expected to come up with the right solutions?

Comment 16 Mike Kaganski 2020-04-15 05:38:00 UTC

Eike: I give up. While I suppose I made my point in Bug 127334 comment 8, some people seem to believe that if they say the same in multiple times, they will not create irritation in those who might actually fix it.

Comment 17 Eike Rathke 2020-04-15 11:49:48 UTC

Once and for all:

HH:MM is a *CLOCK* time format, your clock does not display 13:00 if the time is 12:59 and 31 seconds, or 00:00 for a time of 23:59 and 31 seconds. Certainly you do not expect to see 2020-04-16 00:00 or even worse 2020-04-15 00:00 instead of 2020-04-15 23:59

Furthermore, HH:MM is restricted to values between 00:00 and 23:59; if you subtracted date+time values that span 24 hours or more you'd get wrong display values with that.

[HH]:MM is a *DURATION* time format that handles all that, does proper rounding and can display negative times as well.

However, I agree that the documentation / help could be more verbose on that and explain better. For this it's a duplicate of bug 127170.

*** This bug has been marked as a duplicate of bug 127170 ***