Apache OpenOffice (AOO) Bugzilla – Issue 104430
Modularize Calc's expression evaluator to support alternate arithmetics
Last modified: 2009-12-16 03:23:52 UTC
Please consider modularizing Calc's expression evaluator and documenting the resulting interface. It would be desirable to move away from binary floating point arithmetic for spreadsheet calculations for the reasons mentioned in http://speleotrove.com/decimal/. (It would also be nice to be able to adjust the precision with which calculations are done, and implement alternate semantics such as interval arithmetic, but I digress.) This is, of course, practicably impossible, given the conflicting goals of maintaining approximate compatibility with older spreadsheet versions, various macro & programming APIs, plug-ins, and an infamous spreadsheet from a Redmond, Washington company. However, if we modularize things, we can replace the expression evaluator to enable new semantics later on. Right now, all of the spreadsheet values are passed back and forth internally as IEEE 754 double precision binary floats. Cosmetic rounding to suppress binary floating point artifacts is performed after addition and subtraction (maybe others) by flushing answers close to zero to zero. This is not documented anywhere that I have found. (It's probably a better policy than the one cooked up in Redmond. Refer to some of William Kahan's papers on the topic.) However, it still creates a chasm between zero and the nearest nonzero numbers. (For reasons why this is bad, refer to the discussions surrounding the denormalized values in IEEE 754, especially with the new revision. Flush-to-zero is wonderful if you're just doing graphics for a game, but deadly if the numbers actually mean something. This isn't quite the same issue, but it has the same effect.) There is a check-box in "Options" to have the string value of the spreadsheet cell reparsed for use as an input to calculations that refer to it instead of using a cached binary floating point value, which sort of addresses this issue, but (1) it will always suffer from double rounding, which makes it break some corner cases when doing modular arithmetic and (2) it can't do any better than double precision binary floating point, and (3) it gets in the way of formatting a spreadsheet to carry all the decimal places when you're trying to simulate that kind of thing. If the expression evaluator could be moved to a .DLL/.so, and the API generalized to pass a string or general-purpose data structure (instead of just a binary double-precision float) (and documented thus), then it could be replaced in a separate project with a tool that would provide expression evaluation using decimal semantics, rational semantics, or otherwise modified semantics. In order to actually make this doable in polynomial time, I suggest replacing the internal representation of the double-precision cell contents value with a string that will initially just hex-encode (or maybe octal would be faster) the IEEE 754 double bitfields (and treat them as opaque outside the expression evaluator). Make all the conversions to and from and all calculations the responsibility of the expression evaluator .DLL/.so, so that everything else outside the expression evaluator .DLL/.so treats the encoded float as opaque. All of the standard spreadsheet formulas would be the responsibility of the expression evaluator (as they depend on the format of the floating point numbers used). With the expression evaluation mechanism parameterized out of the rest of the spreadsheet module, it would be possible to create alternate expression evaluators that did not depend of having everything reduced to a common binary floating point representation, which would in turn permit representations that carry more precision or follow different rounding semantics. Ultimately, we're going to have to move spreadsheets away from binary floating point, and this will be true of all spreadsheets, not just OpenOffice. People expect their calculators to follow the rules that they learned in school, and that's why calculator universally use decimal representation internally. They also expect spreadsheets to follow the rules, but for performance reasons about 20 years ago, we needed to use binary floating point, and we've been stuck with it ever since. We're not facing performace issues of the type we faced 20 years ago. There's no reason to continue to inflict this on ourselves.
I will set this to "wontfix". That does not mean, that it cannot be done. But there is nobody to do it. See also issue 54078. If a group of developers exists, who will work on it, the issue can be reopened. But bear in mind, that Calc is a spreadsheet application in an office suite. In my opinion it is the wrong tool for those who need such precision with ensured accuracy. BTW, the current internal structure does not hinder a new number format. It is possible to define function which takes the cell content as text and perform all necessary transformations themselves and also integrating such solutions as addin is already possible.
wontfix -< closed
Another approach is in issue 107681.