Issue 104430 - Modularize Calc's expression evaluator to support alternate arithmetics
Summary: Modularize Calc's expression evaluator to support alternate arithmetics
Status: CLOSED WONT_FIX
Alias: None
Product: Calc
Classification: Application
Component: programming (show other issues)
Version: OOO310m15
Hardware: All All
: P3 Trivial (vote)
Target Milestone: ---
Assignee: spreadsheet
QA Contact: issues@sc
URL: http://speleotrove.com/decimal/
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2009-08-22 17:35 UTC by jeffrey_s
Modified: 2009-12-16 03:23 UTC (History)
3 users (show)

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments

Note You need to log in before you can comment on or make changes to this issue.
Description jeffrey_s 2009-08-22 17:35:13 UTC
Please consider modularizing Calc's expression evaluator and documenting the
resulting interface.

It would be desirable to move away from binary floating point arithmetic for
spreadsheet calculations for the reasons mentioned in
http://speleotrove.com/decimal/.  (It would also be nice to be able to adjust
the precision with which calculations are done, and implement alternate
semantics such as interval arithmetic, but I digress.)

This is, of course, practicably impossible, given the conflicting goals of
maintaining approximate compatibility with older spreadsheet versions, various
macro & programming APIs, plug-ins, and an infamous spreadsheet from a Redmond,
Washington company.  However, if we modularize things, we can replace the
expression evaluator to enable new semantics later on.

Right now, all of the spreadsheet values are passed back and forth internally as
IEEE 754 double precision binary floats.  Cosmetic rounding to suppress binary
floating point artifacts is performed after addition and subtraction (maybe
others) by flushing answers close to zero to zero.  This is not documented
anywhere that I have found.  (It's probably a better policy than the one cooked
up in Redmond.  Refer to some of William Kahan's papers on the topic.)  However,
it still creates a chasm between zero and the nearest nonzero numbers.  (For
reasons why this is bad, refer to the discussions surrounding the denormalized
values in IEEE 754, especially with the new revision.  Flush-to-zero is
wonderful if you're just doing graphics for a game, but deadly if the numbers
actually mean something.  This isn't quite the same issue, but it has the same
effect.)

There is a check-box in "Options" to have the string value of the spreadsheet
cell reparsed for use as an input to calculations that refer to it instead of
using a cached binary floating point value, which sort of addresses this issue,
but (1) it will always suffer from double rounding, which makes it break some
corner cases when doing modular arithmetic and (2) it can't do any better than
double precision binary floating point, and (3) it gets in the way of formatting
a spreadsheet to carry all the decimal places when you're trying to simulate
that kind of thing.

If the expression evaluator could be moved to a .DLL/.so, and the API
generalized to pass a string or general-purpose data structure (instead of just
a binary double-precision float) (and documented thus), then it could be
replaced in a separate project with a tool that would provide expression
evaluation using decimal semantics, rational semantics, or otherwise modified
semantics.  

In order to actually make this doable in polynomial time, I suggest replacing
the internal representation of the double-precision cell contents value with a
string that will initially just hex-encode (or maybe octal would be faster) the
IEEE 754 double bitfields (and treat them as opaque outside the expression
evaluator).  Make all the conversions to and from and all calculations the
responsibility of the expression evaluator .DLL/.so, so that everything else
outside the expression evaluator .DLL/.so treats the encoded float as opaque. 
All of the standard spreadsheet formulas would be the responsibility of the
expression evaluator (as they depend on the format of the floating point numbers
used).

With the expression evaluation mechanism parameterized out of the rest of the
spreadsheet module, it would be possible to create alternate expression
evaluators that did not depend of having everything reduced to a common binary
floating point representation, which would in turn permit representations that
carry more precision or follow different rounding semantics.

Ultimately, we're going to have to move spreadsheets away from binary floating
point, and this will be true of all spreadsheets, not just OpenOffice.  People
expect their calculators to follow the rules that they learned in school, and
that's why calculator universally use decimal representation internally.  They
also expect spreadsheets to follow the rules, but for performance reasons about
20 years ago, we needed to use binary floating point, and we've been stuck with
it ever since.  We're not facing performace issues of the type we faced 20 years
ago.  There's no reason to continue to inflict this on ourselves.
Comment 1 Regina Henschel 2009-08-22 19:11:33 UTC
I will set this to "wontfix". That does not mean, that it cannot be done. But
there is nobody to do it. See also issue 54078. If a group of developers exists,
who will work on it, the issue can be reopened. But bear in mind, that Calc is a
spreadsheet application in an office suite. In my opinion it is the wrong tool
for those who need such precision with ensured accuracy.

BTW, the current internal structure does not hinder a new number format. It is
possible to define function which takes the cell content as text and perform all
 necessary transformations themselves and also integrating such solutions as
addin is already possible.





Comment 2 Mechtilde 2009-09-04 11:52:38 UTC
wontfix -< closed
Comment 3 nicklevinson 2009-12-16 03:23:52 UTC
Another approach is in issue 107681.