99996 – Inaccurate representation of fraction for more than 3 digits with denominator

Bug 99996 - Inaccurate representation of fraction for more than 3 digits with denominator

Summary: Inaccurate representation of fraction for more than 3 digits with denominator

Status:	RESOLVED FIXED

Alias:	None

Product:	LibreOffice
Classification:	Unclassified
Component:	Calc (show other bugs)
Version: (earliest affected)	Inherited From OOo
Hardware:	All All

Importance:	medium normal
Assignee:	Laurent Balland

URL:
Whiteboard:	target:5.3.0
Keywords:

Depends on:
Blocks:

Reported:	2016-05-22 21:39 UTC by Wolfgang Jäger
Modified:	2017-01-30 23:51 UTC (History)
CC List:	4 users (show)

See Also:
Crash report or crash signature:

Attachments
Demo of the bug and mathematical considerations (36.64 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-05-22 21:39 UTC, Wolfgang Jäger	Details
Reworked demo avoiding the impact of another bug (35.26 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-06-16 09:28 UTC, Wolfgang Jäger	Details
The proposal and demonstration advertised in Comment # 8 (30.07 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-06-20 11:26 UTC, Wolfgang Jäger	Details
The proposal rectified, better demonstrated and explained. (40.13 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-06-22 10:18 UTC, Wolfgang Jäger	Details
Comparison of both algorithms (accuracy and speed) (23.08 KB, application/vnd.oasis.opendocument.spreadsheet) 2016-06-23 20:50 UTC, Laurent Balland	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Wolfgang Jäger 2016-05-22 21:39:06 UTC

Created attachment 125236 [details]
Demo of the bug and mathematical considerations

Under specific circumstances format codes like "# ?/????" result in wrongly displayed values even where the exact and correct result should be inside the range of applicability.

I experienced this by accident with LibO V 5.0.2.2 (x64) release under Win 10, UI=English(UK), locale=German(Germany). Changing the locale to English(UK) did not change the behaviour. 

Steps to reproduce: 
1. Create a new Calc document.
2. Set the 'Numbers' format code for column A to "# ?/????" or "#?/????"
3. Enter 2/2016 into A1, 1/2016 into A2 and 1/1008 into A3.
4. Verify the values being displayed as 10/9999, 5/9999, and 10/9999 respectively.
5. Enter 22/2016 into A4 and verify it being displayeed as 11/1009.

The display is wrong in every case. The correct formatting should show:
In case of 3. 1/1008, 1/2016, and 1/1008 respectively.
In case of 5. 11/1008.

These examples, a series of additional examples, and some exemplified mathematical considerations are contained in the attached file.

Comment 1 Aron Budea 2016-06-16 02:40:19 UTC

This seems to be something for Calc developers to look at.

Comment 2 Wolfgang Jäger 2016-06-16 09:28:12 UTC

Created attachment 125691 [details]
Reworked demo avoiding the impact of another bug

Comment 3 Wolfgang Jäger 2016-06-16 09:29:54 UTC

Concerning the above attached example:
Please note the impact of bug#97837 here. The example document "...001" was created using the mentioned fractional format "# ?/????". After a save/reload cycle this format was forgotten and "# ?/?" used instead. To get the demo as intended you first have to reinstate the number format "# ?/????" for column A at least. 
I prepared a reworked demo "...002" using the TEXT function to apply the format. This is not afflicted by the other bug.

Comment 4 Laurent Balland 2016-06-16 16:48:35 UTC

(In reply to Wolfgang Jäger from comment #3)
> Concerning the above attached example:
> Please note the impact of bug#97837 here. 
The bug you are speaking about is bug 88657 which should be fixed in LibO 5.2.0b2

I can confirm this bug in OOo 3.3.0
Set as New. Change version.

Comment 5 Wolfgang Jäger 2016-06-17 16:34:03 UTC

(In reply to Laurent BP from comment #4)
...
> I can confirm this bug in OOo 3.3.0
> Set as New. Change version.

This bug#99996 is still present in AOO4.1.2, the forgetting of user formats, too. 

To anybody who may address the issue of wrong results when formats of the fraction type are used: 
I can provide a well tested algorithm for the task (coded in BASIC e.g.). 

(This is not about parsing and interpreting the format code, but about calculating the best approximation that can be expressed without exceeding the accepted range for the denominator. Of course, this is not the secret of the ages, but well known mathematics. And counting some question marks is surely not a problem.)

Comment 6 Laurent Balland 2016-06-17 20:34:03 UTC

(In reply to Wolfgang Jäger from comment #5)
> (In reply to Laurent BP from comment #4)
> ...
> > I can confirm this bug in OOo 3.3.0
> > Set as New. Change version.
> 
> This bug#99996 is still present in AOO4.1.2, the forgetting of user formats,
> too. 
> 
> To anybody who may address the issue of wrong results when formats of the
> fraction type are used: 
> I can provide a well tested algorithm for the task (coded in BASIC e.g.). 

Fractions are determined in SvNumberformat::ImpGetFractionOutput here:
http://opengrok.libreoffice.org/xref/core/svl/source/numbers/zformat.cxx#ImpGetFractionOutput

You may propose improvements of Farey sequence algorithm. But the code is limiting to 3 digits the maximum number of digits in fraction. See:
http://opengrok.libreoffice.org/xref/core/svl/source/numbers/zformat.cxx#MAX_FRACTION_PREC

For higher number of digits, a simpler (and less accurate as you noticed) algorithm is used. See:
http://opengrok.libreoffice.org/xref/core/svl/source/numbers/zformat.cxx#2612

What should be the maximum number of digits of denominator for Farey sequence algorithm? You must take into account computing time and readability: 254136/527748 is not quite understandable.

Comment 7 Wolfgang Jäger 2016-06-17 23:31:11 UTC

(In reply to Laurent BP from comment #6)
Many thanks for giving me this insight, but there may be a misunderstanding. 
I would neither start to work on code in C++ (I always judged very ugly) at the age of 72 nor would I shift to the kind of designing and coding algorithms based on a next to complete encyclopedical knowledge of related mathematics. My style always was to rely on means as simple as possible, and to try to apply them in a slightly creative way. 
I did not yet analyze the code you linked to, and probably I wouldn't at all. But I know that for many centuries now best approximations by fractions were found based on developing the target value into a regular continued fraction. As there is an indisputable worst case, we can give a fix maximum of steps needed to convert any number to any given accuracy. For IEEE 754 accuracy e.g. this number is 36. The experimentally determined average for denominators up to 9999 is less than 10. The number of needed steps will increase very slowly with greater denominators permitted. The expense for any single step on this way is mainly a Double division. The loop may be exited based on a maximum denominator or on a condition directly aiming at relative accuracy (higher time-cost in this case: mainly 3 multiplications per step). 
The code for a reliable implementation of this algorithm in LibO BASIC, e.g. is less than 90 lines in a very sparse notation, and is already implementing both ways of controlling the exit from the central loop. Restricting the condition to a maximum denominator only will cut length and complexity. 

If there is evidence that this algorithm is too inefficient or too complicated, I cannot contribute anything more concerning this issue. Otherwise I will be glad to supply my proposal in a Pseudocode, in fully operative BASIC or in Pascal at your choice. I cannot do it using C++ (or any flavour of C).
> 
> Fractions are determined in SvNumberformat::ImpGetFractionOutput here: ...

Comment 8 Wolfgang Jäger 2016-06-20 11:24:46 UTC

I think I got the idea of the algorithm coded in C++, and linked in here by Laurent, but for more than one reasons I cannot be sure with respect to every detail. 

I now decided to post a proposal to the effect to replace the usage of Farey sequences (with reduced accuracy above MaxDenominator=999) by the application of continued fractions. As far as I could test it, it should be no less efficient. I only could code it in the (VERY poor) LibreOffice BASIC, and test it this way (without setting any format properties / changing the .String property-value, of course) for afflicted cells. This was done implementing a function also usable for mathematical purposes, and called by a helper function working similar as the standard function TEXT in this specific case. 

I wouldn't inflate this comment by repeating anything already contained in the demonstrating example. See attachment "Proposal.ods" if interested at all! The code is commented to some detail.

Unfortunately I have no access to literature on continued fractions in English or French. My sole source is: 
"Die Lehre von den Kettenbrüchen" by Oskar Perron, 3. Edition (final), 1954, 
still available in German in different reprints (mine of 1977).

Comment 9 Wolfgang Jäger 2016-06-20 11:26:54 UTC

Created attachment 125771 [details]
The proposal and demonstration advertised in Comment # 8

Comment 10 Wolfgang Jäger 2016-06-20 12:10:24 UTC

Sorry! The above provided Proposal.ods still contained a line of code and a formula in the wrong versions. 

1.
Please replace the first actual statement in the body of the function 'formatToFraction' by

lExtRes = bestApproxFraction(pValue,0+String(pDenomLength,"9"))

2. Please change the formula in the cell DemoAndTesting.E2 to:

=IF(C$2="R";FORMATTOFRACTION(A2;0;1;3);"")
and fill the corrected formula down till cell E61.

Comment 11 Wolfgang Jäger 2016-06-22 10:18:05 UTC

Created attachment 125824 [details]
The proposal rectified, better demonstrated and explained.

Comment 12 Laurent Balland 2016-06-22 20:14:12 UTC

Thanks for your proposition. I will have a look at it.

Comment 13 Laurent Balland 2016-06-23 20:50:07 UTC

Created attachment 125870 [details]
Comparison of both algorithms (accuracy and speed)

I converted your BASIC code in C++ code in LibO and I checked that fractions were the same with this new algorithm (see Accuracy tab).
I made some tests to compare speed of both algorithms (see Speed tab). I created a huge file with 1000 rows, 100 columns, 10 sheets containing 1 million decimal values (all different). I changed Default style to use fraction number format, and measured time (average of 5 tests) to recalculate 1 million fractions with different number of digits for denominator

* Actual algorithm based on Farey's sequence:
I changed MAX_FRACTION_PREC value to allow up to 5 digits in denominator with this exact algorithm.
 - 1 digit: 20.23"
 - 2 digits: 21.20"
 - 3 digits: 26.36"
 - 4 digits: 1'10.67"
 - 5 digits: 8'35"
This time includes many things in addition to calculation of fraction (update, display...). This algorithm is limited to 3 digits. For 4 and more digits, LibO is using a faster and much less accurate algorithm.

* New algorithm based on continued fraction representation proposed by  Wolfgang Jäger (many thanks to him)
 - 1 digit: 19.91" (-1.5%)
 - 2 digits: 20.15" (-5%)
 - 3 digits: 20.39" (-22%)
 - 4 digits: 20.49" (-70%)
 - 5 digits: 20.74" (-96%!)
 - 6 digits: 20.98"
Increasing number of digits does not inflate calculation time. This new algorithm is much faster. Commit is on the way...

Comment 14 Wolfgang Jäger 2016-06-24 20:34:34 UTC

(In reply to Laurent BP from comment #13)
> ...
> 
> * Actual algorithm based on Farey's sequence:
> I changed MAX_FRACTION_PREC value to allow up to 5 digits in denominator
> with this exact algorithm.
> ...
> Increasing number of digits does not inflate calculation time. This new
> algorithm is much faster. Commit is on the way...

Thank you for the interest and the engagement. You did a big chunk of coding and testing. Concerning the efficiency I am not too surprised. 
However ...

As you advertised a commit I feel responsible for the correctness and the efficiency of the proposed algorithm. There was at least one flaw in my code concerning the tests for "better collateral AF". 
Would you mind to post your code? I would try to understand another poem in C++.

By the way: "this exact algorithm" (based on Farey sequences) has its flaws, too. Just one example: The randomly generated number 
-575.774749601315 formatted "0 ?/???" comes out as "-575 86/111" while it should display as -575 540/697. There are cases of errors with "0 ?/??", too.

I would like to propose additional test cases anyway.

Comment 15 Laurent Balland 2016-06-24 21:33:29 UTC

(In reply to Wolfgang Jäger from comment #14)
> As you advertised a commit I feel responsible for the correctness and the
> efficiency of the proposed algorithm. There was at least one flaw in my code
> concerning the tests for "better collateral AF". 
> Would you mind to post your code? I would try to understand another poem in
> C++.
Commits are publicly available. I just wanted to wait that Jenkins (automatic tests) was happy with it before making publicity:
https://gerrit.libreoffice.org/26621/

Comment 16 Laurent Balland 2016-07-03 18:40:02 UTC

Finally, my commit passed Jenkin test. It is now under review.
Main code starts on line 2440. rInfo.nCntExp is the number of digits expected for the denominator, fNumber is >0 decimal part of the number to represent as a fraction = nFrac/nDiv

else // Calculated Denominator
{
    sal_uInt64 nBasis = ((sal_uInt64)floor( pow(10.0,rInfo.nCntExp))) - 1; // 9, 99, 999 ,...
    sal_uInt64 nFracPrev = 1L, nDivPrev = 0, nFracNext, nDivNext, nPartialDenom;
    double fRemainder = fNumber, fTemp;

    // Use continued fraction representation of fNumber
    // See https://en.wikipedia.org/wiki/Continued_fraction#Best_rational_approximations
    while ( fRemainder > 0.0 )
    {
        fTemp = 1.0 / fRemainder;                    // 64bits precision required when fRemainder is very weak
        nPartialDenom = (sal_uInt64) floor(fTemp);   // due to floating point notation with double precision
        fRemainder = fTemp - (double)nPartialDenom;
        nDivNext = nPartialDenom * nDiv + nDivPrev;
        if ( nDivNext <= nBasis )  // continue loop
        {
            nFracNext = nPartialDenom * nFrac + nFracPrev;
            nFracPrev = nFrac;
            nFrac = nFracNext;
            nDivPrev = nDiv;
            nDiv = nDivNext;
        }
        else // calculate collateral fraction and exit
        {
            sal_uInt64 nCollat = (nBasis - nDivPrev) / nDiv;
            if ( 2 * nCollat >= nPartialDenom )
            {
                sal_uInt64 nFracTest = nCollat * nFrac + nFracPrev;
                sal_uInt64 nDivTest  = nCollat * nDiv  + nDivPrev;
                double fSign = ((double)nFrac > fNumber * (double)nDiv)?1.0:-1.0;
                if ( fSign * ( double(nFrac * nDivTest + nDiv * nFracTest) - 2.0 * double(nDiv * nDivTest) * fNumber ) > 0.0 )
                {
                    nFrac = nFracTest;
                    nDiv  = nDivTest;
                }
            }
            fRemainder = 0.0; // exit while loop
        }
    }
}

Comment 17 Laurent Balland 2016-07-03 18:46:17 UTC

Update summary

Comment 18 Commit Notification 2016-07-04 22:55:32 UTC

Laurent Balland-Poirier committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=051329101dc249535dd09eeb34caf1c21719064f

tdf#99996 New algorithm for fraction

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 19 Laurent Balland 2016-07-05 21:46:43 UTC

@Wolfang: please test with master and tell if this implementation of your algorithm fit your needs

Comment 20 Commit Notification 2016-07-06 09:55:30 UTC

Laurent Balland-Poirier committed a patch related to this issue.
It has been pushed to "master":

http://cgit.freedesktop.org/libreoffice/core/commit/?id=37d97ff389d0941b32405f3433d2e595e10c76ab

tdf#99996 tdf#100594 tdf#100754 Add qa unit test

It will be available in 5.3.0.

The patch should be included in the daily builds available at
http://dev-builds.libreoffice.org/daily/ in the next 24-48 hours. More
information about daily builds can be found at:
http://wiki.documentfoundation.org/Testing_Daily_Builds

Affected users are encouraged to test the fix and report feedback.

Comment 21 Wolfgang Jäger 2017-01-30 22:03:32 UTC

(In reply to Laurent BP from comment #19)
> @Wolfang: please test with master and tell if this implementation of your
> algorithm fit your needs. 

(It's not really about my pesronal needs. My participation was induced by requests in forums. And this happened to be a case where I knew a bit about the related mathematics.) 

My apologies. Only with a much too long delay I tested. My experiences were discouraging. I did neither get a fully reproducible behaviour with V5.3.0.2RC on my system, nor did I understand the programmatical background sufficiently. The code you pointed me to I studied. It seems to be correct with a faint doubt remaining due to my poor knowledge of C++. But there seem to be influences from other parts of code with respect to this topic and probably depending on UI-language, locale, ... 

Now about my tests with:
"Version: 5.3.0.2 (x64)
Build ID: 5ad7b2889021c491af62f7930a4b1cb631392f16
CPU Threads: 4; OS Version: Windows 6.19; UI Render: default; Layout Engine: new; 
Locale: de-DE (de_DE); Calc: group"

A)
Create a new spredsheet document. The 'Numbers' format of all the cells is "General" (or its equivalent under the chosen locale. 
In Sheet1.A1 enter the formula =1/3 and in Sheet1.A2 = 17/35
Change the 'Numbers' format of A1 to "# ?/?"  and that of A2 to "# ?/??"
Expected: "1/3" shown in A1 and "17/35" in A2
Experienced: " 2/7" in A1 and " 1/2 " in A2 
The LEN function returns 3 for both cells while insight into the cell's .String property proves 4 and 5 respectively as correct.
the TEXT function applied with the mentioned respective format codes returns the same wrong results (with lengths 4, 5).
It seems as if line 2734 does not always assign an exact 0 to fRemainder if it should and thus the loop may run another time where it shouldn't. 

B)
Save, close, and reload the document: Unchanged view. The format codes are changed by inserting a pair of quotation marks in each, surrounding the space.
Manipulate the formats and everything. Sometimes you may then get the correct display, mostly you will not. 

C) It's a mess. How to report about such observations? 

D) Maybe I have to retire from this kind of contributing to free software.

From the draft for an unposted comment: 
Sorry, extremely sorry! You tested to some extent and so did I. Both of us did not test enough with the very simple cases where exact fractions with small numerators and denominators are used to define the value to approximate. This is just another example for the fact that testing is not a sound replacement for thorough design and basic considerations about correctness. To avoid misunderstandings: It's completely MY responsibility!

I changed my attitude and completely rewrote my user function demonstrating the mathematical approach I suggested for the purpose - now avoiding spaghetti style and emphasising correctness from the beginning. However, it's still in BASIC. If interested, you can get the commented code.

Comment 22 Laurent Balland 2017-01-30 22:45:28 UTC

(In reply to Wolfgang Jäger from comment #21)
> In Sheet1.A1 enter the formula =1/3 and in Sheet1.A2 = 17/35
> Change the 'Numbers' format of A1 to "# ?/?"  and that of A2 to "# ?/??"
> Expected: "1/3" shown in A1 and "17/35" in A2
> Experienced: " 2/7" in A1 and " 1/2 " in A2 

Hi Wolfgang,

It's great to ear from you.
However, I can't reproduce with my master build. I get expected display. Could you attach a test file?

Comment 23 Wolfgang Jäger 2017-01-30 23:51:06 UTC

(In reply to Laurent BP from comment #22)
> 
> ...
> However, I can't reproduce with my master build. I get expected display.
> Could you attach a test file?

(I will not be able to return to this within the next day, and now I need some sleep.) 

Seems I wasted a lot of time with fighting phantoms. Desperately looking for possible reasons of my strange observations I found that - supposedly due to a few crashes I had (5.3 not exactly stable yet?) - Calc had changed to 'Precision as shown' without my consent. Probably this also wasn't even stable. (I never use this silly setting.)

However, if 1/3 is passed as 0.3 the fraction 2/7 is actually approximating the value better than 1/3. On the other hand I looked thoroughly at everything, and this cannot be the whole truth concerning my problems to get it clear ... I am still not sure about the actual state of the cells I got the values from during my testing. 

You may relax, however. Till...

Thanks for your interest!