Issue 22811 - OO.o Calc - function precision bugs
Summary: OO.o Calc - function precision bugs
Status: CLOSED FIXED
Alias: None
Product: Calc
Classification: Application
Component: code (show other issues)
Version: OOo 1.1
Hardware: PC Linux, all
: P3 Trivial (vote)
Target Milestone: ---
Assignee: oc
QA Contact: issues@sc
URL: http://bugzilla.gnome.org/show_bug.cg...
Keywords: needmoreinfo
: 40833 (view as issue list)
Depends on:
Blocks:
 
Reported: 2003-11-24 17:48 UTC by mukund
Modified: 2013-08-07 15:15 UTC (History)
6 users (show)

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
Test case Excel spreadsheet for future regression testing (OpenOffice.org and Gnumeric compatible) (7.50 KB, application/octet-stream)
2003-11-24 19:37 UTC, mukund
no flags Details
A one-file simple program which shows the different implementations of OpenOffice.org, Gnumeric, and fixed OpenOffice.org code (2.89 KB, text/plain)
2003-11-24 19:38 UTC, mukund
no flags Details
Patch for OpenOffice.org 1.1.0 attached (645 bytes, patch)
2003-11-24 19:39 UTC, mukund
no flags Details | Diff
GCC bug? (3.01 KB, text/plain)
2003-11-24 21:57 UTC, mukund
no flags Details
New OpenOffice.org 1.1.0 patch which produces same results as Gnumeric (2.93 KB, patch)
2003-11-24 22:00 UTC, mukund
no flags Details | Diff
Patch for OpenOffice.org 1.1.0 for SKEW() and KURT() (8.04 KB, patch)
2003-11-25 22:23 UTC, mukund
no flags Details | Diff
New patch for SKEW(), KURT() and GEOMEAN() (10.89 KB, patch)
2003-11-26 11:36 UTC, mukund
no flags Details | Diff
Test spreadsheet found on the Internet with SKEW(), KURT() and GEOMEAN() functions added (44.00 KB, application/octet-stream)
2003-11-27 22:35 UTC, mukund
no flags Details
Screenshot of test spreadsheet in Gnumeric 1.2.1 (65.90 KB, image/png)
2003-11-27 22:37 UTC, mukund
no flags Details
Screenshot of test spreadsheet in friend's MS Excel (96.44 KB, image/png)
2003-11-27 22:39 UTC, mukund
no flags Details
Screenshot of test spreadsheet in unpatched OpenOffice.org 1.1.0 (51.83 KB, image/png)
2003-11-27 22:40 UTC, mukund
no flags Details
Screenshot of test spreadsheet in patched OpenOffice.org 1.1.0 (52.95 KB, image/png)
2003-11-27 22:40 UTC, mukund
no flags Details
4 - 3 = ... 0?? (6.17 KB, application/vnd.sun.xml.calc)
2004-04-22 03:15 UTC, erikanderson3
no flags Details

Note You need to log in before you can comment on or make changes to this issue.
Description mukund 2003-11-24 17:48:10 UTC
This bug was initially started at:
http://bugzilla.gnome.org/show_bug.cgi?id=127771

There is information of interest there.

oocalc's numeric precision is pretty awful for a load of common functions.
This should be substantially improved, in-line with gnumeric. Functions
VAR, LINEST, SKEW, PRODUCT, GEOMEAN, POWER at a minumum should be improved.

This bug is part of the Integrated Collaborative Desktop Bounty Hunt.  For
more information on prizes, contest rules, and other bounty tasks, visit:
 
http://www.gnome.org/bounties/
Comment 1 mukund 2003-11-24 19:36:24 UTC
For reference, I'm posting this info from the other bugzilla page:

I am including a comment block from a program file I am about to attach:

/* The following main() function has three different sections, where
the N - 1 variance is
   calculated three times using different methods. First duplicates the
   OpenOffice.org default behaviour. The second duplicates the
Gnumeric behavior.
   
   The third is a fixed OpenOffice.org behaviour. The OpenOffice.org
loss of precision
   happens, probably intentionally due to the approxAdd(),
approxEqual(), approxSub() and similar
   functions which drop precision.
   
   The fixed OpenOffice.org behaviour generates results which are
slightly different
   from Gnumeric results. This is due to the formula used. There is
loss of precision
   in both the Gnumeric way and the OpenOffice way, due to the
inherent nature of
   floating point architecture. They both produce slightly different
results.

   Gnumeric uses the following formula to find the N - 1 variance:
   
   s^2 = (1 / (N - 1)) * Sum1..N((Xi - M)^2)
   
   where N is the count of the set, Xi is an element of the set, M is
the mean of the set.
   
   Due to the stack based parsing nature of the OpenOffice.org code,
it uses a different
   formula derived from the above formula, as it does not know the
mean M in advance:
   
   s^2 = (1 / (N - 1)) * [Sum1..N(Xi^2) - {(Sum1..N(Xi))^2 / N}]
	

*/

---------------------------------

The loss of precision is due to these approx*() functions throwing it
away. Not knowing OpenOffice.org code so well, I don't know if it
would be wise to edit these approx*() functions and "correct" them as
they may cause side effects in other places. The approx*() functions
were obviously written with a purpose.

We could probably patch GetStVarParams() which would fix many
statistics functions which depend on it such as VAR(), VARP(), etc.

---------------------------------

It appears that for some of these functions, the bad results are due
to loss of precision in the approx*() functions in
sal/inc/rtl/math.hxx. The presence of these approx*() functions are
intentional. But they probably should not be used in the statistics
and math functions.

Maybe someone on the OO team can shed some more light on this one.

Comment 2 mukund 2003-11-24 19:37:29 UTC
Created attachment 11531 [details]
Test case Excel spreadsheet for future regression testing (OpenOffice.org and Gnumeric compatible)
Comment 3 mukund 2003-11-24 19:38:53 UTC
Created attachment 11532 [details]
A one-file simple program which shows the different implementations of OpenOffice.org, Gnumeric, and fixed OpenOffice.org code
Comment 4 mukund 2003-11-24 19:39:56 UTC
Created attachment 11533 [details]
Patch for OpenOffice.org 1.1.0 attached
Comment 5 mukund 2003-11-24 19:41:39 UTC
The above patch has effect on atleast the following functions:

VAR()
VARP()
STDEV()
STDEVP()
DEVSQ()

------

Results of the program with 30000, 30000.001 (hardcoded.. bah I could
have used arguments for this instead :-)

[mukund@naina mukund]$ gcc -O2 -o foo foo.c
[mukund@naina mukund]$ ./foo
Default OpenOffice.org behaviour
                           0
 
Gnumeric behaviour
             5.000000002e-07
 
Fixed OpenOffice.org behaviour
             5.000038072e-07
 
[mukund@naina mukund]$


The results are a bit different due to the different (but equivalent)
formulae being used (please see previous posts).

Comment 6 mukund 2003-11-24 21:56:45 UTC
EEK!

The patch doesn't work as intended, and it looks like there's
something very wrong with GCC's RSE optimization. As an example, look
at this file which I'm going to attach next:

This produces different results with no optimization, -O and -O2.

Comment 7 mukund 2003-11-24 21:57:13 UTC
Created attachment 11537 [details]
GCC bug?
Comment 8 mukund 2003-11-24 21:58:40 UTC
Anyways, I decided to follow what Michael Meeks said and follow the
Gnumeric method and formula.

The following patch is tested and works. It produces exactly the same
output as which Gnumeric produces.

Comment 9 mukund 2003-11-24 22:00:25 UTC
Created attachment 11538 [details]
New OpenOffice.org 1.1.0 patch which produces same results as Gnumeric
Comment 10 mukund 2003-11-24 22:06:28 UTC
Please note that the above patch only fixes the following functions:

VAR()
VARP()
STDEV()
STDEVP()
DEVSQ()

Comment 11 frank 2003-11-25 09:11:16 UTC
Hi Niklas,

one for you ?

Frank
Comment 12 mmeeks 2003-11-25 14:42:16 UTC
Hi Mukund, this is _exactly_ what we were hoping for :-)

I'm assuming that any similar problems that exist with:

LINEST, SKEW, PRODUCT, GEOMEAN, POWER

can be clobbered in the same way, and - AFAICS you may be the 1st to
collect a bounty - which would be cool :-)
Comment 13 niklas.nebel 2003-11-25 18:19:35 UTC
The second patch looks good.

ApproxSub instead of a simple difference was put there for issue 3552,
where the problem was rounding errors leading to a small negative
variance.

As long as we ensure it doesn't get negative again (which, as far as I
can see, can't happen with the patch), I think we can do away with the
ApproxSub here and accept the non-zero result for the values from
3552. Alternatively, we could use ApproxSub in the final loop, but I
don't really think that's necessary.

Note: Before we can commit any patch, we need the JCA, see
http://www.openoffice.org/contributing.html
Comment 14 mukund 2003-11-25 21:52:38 UTC
The example in issue 3552 =STDEV(-25.916128;-25.916128;-25.916128)
returns 0 with the patch applied which is the correct answer.

Comment 15 mukund 2003-11-25 22:23:55 UTC
Created attachment 11554 [details]
Patch for OpenOffice.org 1.1.0 for SKEW() and KURT()
Comment 16 mukund 2003-11-25 22:36:42 UTC
Above is a patch for SKEW() and KURT(). Both functions were patched,
as they had the same VAR()-like "incorrect" OpenOffice.org code. Both
behave like Gnumeric now.

Niklas:

I quickly ran through that PDF.. I can fax it tomorrow. The definition
of "Contribution" is vague. I want contribution to be only the code
which I _voluntarily_ submit for inclusion into OpenOffice.org.

From that PDF's content, "Contribution" can mean any code written by
me which was added to OpenOffice.org by anyone.

Please check this up for me, and if possible change the wording or
allow me to change the wording in it when I fax it back :-)

Comment 17 mukund 2003-11-25 22:46:01 UTC
I can't give a test case for the above, as I only know the formulae
used are different from the original to the patched one, and the
original formula can cause errors such as in 3552 among others.
Comment 18 mukund 2003-11-26 11:36:51 UTC
Created attachment 11561 [details]
New patch for SKEW(), KURT() and GEOMEAN()
Comment 19 mukund 2003-11-26 11:46:14 UTC
The above patch is a new patch which returns the proper error (invalid
arguments - Err502) when the standard deviation of the input arguments
is zero, which'd cause a division by zero error. This new patch just
changes the error type returned. This only applies to the SKEW() and
KURT() functions.

The existing GEOMEAN() implementation formula in OpenOffice.org is
fine. It does this basically:

GeoMean = [Prod_1..N(Xi)]^(1/N)
where Xi is an element of the set of 1..N elements.

To avoid overflow, it uses logarithms and addition instead of taking
the product of the whole set.

----

There was one glaring error in the GEOMEAN() implementation though. It
used overloading of variables of the same name and due to the scope of
the inner variable, an outer variable (count of items) was not getting
incremented when the stack type was svMatrix. I have fixed this by
using different variable names.

Comment 20 niklas.nebel 2003-11-26 13:06:11 UTC
I don't really like the idea of integrating a change without knowing
which problem it solves. You should look for cases where the new code
gives better results than the old one.

The duplicate "i" variable in ScGeoMean is unfortunate, but doesn't do
any harm, each of the loops does use the right variable. Maybe Michael
can comment on what the supposed problems with that function are?

As to the text of JCA, it has gone through legal review, so we can't
just modify it, but doesn't the "intended to be integrated" part
already cover your concerns?
Comment 21 mukund 2003-11-26 13:24:40 UTC
There is a problem in ScGeoMean() where in the switch block for the
'svMatrix' case, 'nCount' in the most immediate scope is incremented
with the statement 'nCount++;' whereas the nCount in the outer
function block has to be incremented for the correct behaviour.

Comment 22 niklas.nebel 2003-11-26 13:38:47 UTC
Oh, that one, sorry, yes, but that is already fixed on the "cac"
branch (see revision 1.8.40.2), so you can ignore it.
Comment 23 mukund 2003-11-26 13:42:18 UTC
Code "to be integrated" can mean any code written by me out of
context. For that matter, even "... Contributor has ever delivered" is
grossly implicit with the rest of the sentence. I don't know if you
follow the implication. I have run through the wording with our
solicitor here and she feels the same way.

The following is a better way of putting it (without the **** of course):

1. Contributor owns, and has sufficient rights to contribute, all
source code and related material ****which the Contributor voluntarily
intended to be compiled or integrated with the source code**** for the
OpenOffice.org open source product (the "Contribution") which
Contributor has ever delivered, and Sun has accepted, for
incorporation into the technology made available under the
OpenOffice.org open source project.

Obviously this only transfers rights in point #2, but the definition
of "Contribution" is made in #1.

I'd to get this corrected before I can sign it. I don't like the
wording, and don't want to sign anything which'll cover code outside
which I voluntarily contribute to OpenOffice.org. I take IP issues
seriously.

If this is not possible, I'm sorry but I'll have to pull out.
Comment 24 mukund 2003-11-26 13:43:10 UTC
Ah I didn't know this change happened. So as far as I know, there's no
issue with GEOMEAN() :-)


Comment 25 Martin Hollmichel 2003-11-26 15:18:25 UTC
mh->mukund: I'll take your comments upon the JCA for internal review.
Thank you for your comments, I hope I'll be avle to come back to you soon.
Comment 26 mukund 2003-11-27 22:35:58 UTC
Created attachment 11602 [details]
Test spreadsheet found on the Internet with SKEW(), KURT() and GEOMEAN() functions added
Comment 27 mukund 2003-11-27 22:37:45 UTC
Created attachment 11603 [details]
Screenshot of test spreadsheet in Gnumeric 1.2.1
Comment 28 mukund 2003-11-27 22:39:10 UTC
Created attachment 11604 [details]
Screenshot of test spreadsheet in friend's MS Excel
Comment 29 mukund 2003-11-27 22:40:03 UTC
Created attachment 11605 [details]
Screenshot of test spreadsheet in unpatched OpenOffice.org 1.1.0
Comment 30 mukund 2003-11-27 22:40:48 UTC
Created attachment 11606 [details]
Screenshot of test spreadsheet in patched OpenOffice.org 1.1.0
Comment 31 mukund 2003-11-27 22:47:53 UTC
Niklaus: If you compare those screenshots, you'll see the precision
improvements in the patched version. I cannot produce a case for
behaviour similar to issue 3552, but the original SKEW() and KURT()
functions calculate variance similar to the original VAR() function
[prior to issue 3552] and it is possible to get that kind of issue.

Michael: This basically completes the following functions:
VAR()
VARP()
STDEV()
STDEVP()
DEVSQ()
SKEW()
KURT()
GEOMEAN() // was only audited; it is fine :-)

I am going to concentrate on the rest.

Comment 32 mukund 2003-11-27 23:31:05 UTC
That VAR() patch modified DSTDEV(), DSTDEVP(), DVAR() and DVARP() too
as a side effect.
Comment 33 mukund 2003-11-28 11:35:31 UTC
I have gone through ScPow() and MatPow() functions and there do not
appear to be any bugs with them. They both use the pow() math library
function straight away. So POWER() is okay.

Gnumeric may have a _very slight_ advantage over OpenOffice.org on
some newer platforms (recent distributions of Linux; recent releases
of Solaris; no idea about Windows) as it uses the "long double" data
type where it is available. Accordingly it also uses the powl() math
function. These functions are only likely to be available on C99
platforms. I am not sure if it is a requirement of any C++
specifications and which version of the spec they were introduced in.
In any case, switching to "long double" would probably mean a complete
overhaul of most of OpenOffice.org's functions.

Note that the above is for informational purposes. The precision loss
is not significant.

Comment 34 niklas.nebel 2003-11-28 19:29:41 UTC
Those differences in the screenshots look very small. Sure, we might
use the better calculation of the variance just for correctness' sake,
but still I would like to know what the original problems with KURT
and SKEW were.
Comment 35 mmeeks 2003-12-01 18:08:36 UTC
Ok; if we can't find any problems with KURT/SKEW that's great :-)
these were more vague suspicions.

As for signing the JCA - that's a prerequisite for the bounty - which
I'm happy you've completed in spirit at least; Since Novell signed the
JCA we'd be most interested in any legal issues you've noticed - it
looked fine to us; there is a list of other signees who might have
found any obvious problems (that might damage their business); notably
Novell/Ximian, SuSE, SGI, Red Hat, Mdk, etc. have all signed it
without any probloms cf.
http://www.openoffice.org/copyright/copyrightapproved.html) - my
feeling is it's fair and reasonable.
Comment 36 mukund 2003-12-10 18:44:53 UTC
Sorry about the lack of activity here. I have been busy with my day job. Please
expect a patch for the rest of the functions by the end of this weekend.
Comment 37 Martin Hollmichel 2004-02-02 17:27:47 UTC
mh->mkund: this is the answer I got:

"The issue does not take into account the latter part of the sentence, which is
intended to qualify the Contributor's intention for the contributed code:
"...Contributor has ever delivered,...for incorporation into the technology made
available under the OpenOffice.org open source project."  The sentence must be
read as a whole.  The last part of the sentence is intended to qualify the
Contributor's intention when delivering the code for inclusion with Openoffice.org."
Comment 38 erikanderson3 2004-04-22 03:09:42 UTC
I've run into what is probably related strangeness, after reading a posting on
the OOoForum (http://www.oooforum.org/forum/viewtopic.php?t=7030).  

Open a spreadsheet, and in one column type 81 and 891.  In the next three cells,
have A1*A2, A2*A3, and A3*A4.  You should get 72171, 64304361, and
4640910037731.  In the next cell, multiply the three previous (A3*A4*A5), and in
the last cell, enter A6/1.0E11.  Make sure the final cell does not appear in
scientific notation (requiring at least one  decimal place is one way to do this).  

In the next column, we have a similar progression starting from 8.10E+021and
8.91E+022.  Keep all the formulae the same, except in the very last cell use
B6/1.0E211.  

Now comes the wierdness.  You'll note that the two values are slightly different
-- one ends in a 3, and the other in a 4.  There are no rounding issues, as
showing decimal places indicates there's not enough to round to make a
difference (nothing but zeroes five places out).  But if you then subtract one
from the other, you get 0.  ???

I'll attach a spreadsheet showing the progressions.  
Comment 39 erikanderson3 2004-04-22 03:15:18 UTC
Created attachment 14683 [details]
4 - 3 = ... 0??
Comment 40 erikanderson3 2004-04-22 03:16:32 UTC
Additional detail -- OOo 1.1.1
Comment 41 niklas.nebel 2004-04-23 09:24:33 UTC
erikanderson3: This is completely different. The difference between the numbers
is close to the limit of the numerical precision (the number of decimals you can
display is unrelated to calculation precision). In this case, 0 as a result of
subtraction is deliberate, so for example 0.1+0.2-0.3 equals zero.

mukund: Is there any progress on the subject of JCA?
Comment 42 mukund 2004-04-28 11:17:36 UTC
nn: I had faxed a signed modified copy of the JCA over to Ximian, assigning
joint-copyright for this work to Ximian, Inc. They can submit any of the patches
linked to this issue, to the OpenOffice.org project.

Comment 43 mmeeks 2004-04-28 15:46:29 UTC
mukund - I think the assign everything to Novell, and re-assign to Sun isn't
going to fly. Can you re-educate your legal department, and submit a JCA in the
normal way ? this really shouldn't be controversial
Comment 44 niklas.nebel 2004-07-23 18:16:57 UTC
Any news on this? If we can't get a JCA for the patches, I'll have to close this
issue, and we'll have to make an own implementation (not that hard, now that
it's clear what has to be done, but a pity anyway).
Comment 45 mukund 2004-07-24 00:25:58 UTC
Hi Niklaus

I am sorry I had forgotten about this open issue completely. I'll get back on
this on Monday. In any case I'll do what's necessary to have these patches
available to you for committing next week.

Mukund



Comment 46 stx123 2004-07-26 17:59:26 UTC
Hi Mukund,
we are still in need for a signed JCA from you.
The form is available from
http://www.openoffice.org/contributing.html

Thanks, Stefan
Comment 47 mukund 2004-07-27 10:10:33 UTC
Hi Niklaus, Stefan

A signed JCA has been faxed and the paper copy is on its way to you now.

Mukund

Comment 48 stx123 2004-07-27 11:54:51 UTC
While it may take some days until your name is added to the list I can confirm
that we received the JCA. Thanks, Mukund.
Greetings, Stefan
Comment 49 jpmeyers 2004-08-25 07:07:49 UTC
When I opened the test file distribution-data.xls (posted 2003-11-27) in OOo 
1.1.2 on Win X Home I get the following results:

Skew A	-0.6484780257732190
Skew B	-0.0590659837522011
Skew C	0.6621794284723160
	
Kurt A	-0.3352291010461660
Kurt B	-0.9647191301261150
Kurt C	-0.3126682758511650
	
Geomean D	0.3385367714135190

Which are different from all of the posted screenshots.  These results are 
closest to the OOo 1.1.0 unpatched image that was posted by mukund.


My interest is more along the lines of helping to develop a test file that can 
be used to test new versions of the calc program.  I envision developing a 
spreadsheet file that can self calculate the 'degree' of agreement with a 
bench mark standard.  (i.e. agrees to the benchmark within X digits)  I 
envision this testing several functions and not just the ones listed in this 
issue.  However, I believe the functions in this issue might be a good place 
to start since there is a demonstrated difference for them.  Are there others 
with known differences?

What is the status of making an official change to how these functions perform 
their calculations?
Comment 50 mmeeks 2004-08-25 15:39:51 UTC
I guess this is because the fixes never made it into cvs - they just hit
ooo-build though, so in due course your Linux distro should have them for you.
Comment 51 niklas.nebel 2004-11-04 14:55:31 UTC
Better late than never: This is now in the child workspace "calc25" and will be
part of OOo 2.0. Sorry about the delay.
The files are interpr1.cxx 1.31.2.2, interpr3.cxx 1.11.204.1.
Comment 52 niklas.nebel 2004-11-05 14:28:22 UTC
reopening to reassign
Comment 53 niklas.nebel 2004-11-05 14:28:37 UTC
reassigning to QA for verification
Comment 54 frank 2004-11-08 13:15:49 UTC
resett fixed
Comment 55 frank 2004-11-09 12:14:33 UTC
found fixed using cws calc25 on Linux, Solaris and Windows
Comment 56 oc 2004-12-06 13:45:08 UTC
closed because fix available in OOo1.9m65
Comment 57 frank 2005-01-19 09:30:20 UTC
*** Issue 40833 has been marked as a duplicate of this issue. ***