Floating Point Reference

“Fractions are your friends.” This was my high school algebra teacher’s standard response when students complained about the sometimes tedious math that fractions could require. Many years later, I’m finding that floating point math, while not tedious, is still rather more involved than it appears at first; here’s my attempt to summarize the issues that may come up.

Floating point data types

In C on an x86 system:
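The size and precision of each type can also be checked directly on a given compiler through <float.h>. The following is a minimal sketch; the struct and function names are made up for illustration. On x86, float is typically 4 bytes with a 24-bit significand and double is 8 bytes with a 53-bit significand, while long double varies by compiler (an 80-bit extended format padded to 12 or 16 bytes with GCC, the same as double with Microsoft's compiler).

```c
#include <float.h>
#include <stddef.h>
#include <stdio.h>

/* Illustrative record describing one floating point type. */
struct fp_type_info {
    const char *name;
    size_t bytes;      /* storage size, including any padding bytes */
    int mant_digits;   /* significand digits in base FLT_RADIX (bits on x86) */
    int dec_digits;    /* decimal digits that survive a round trip through the type */
};

/* Fill out[0..2] with the properties of float, double, and long double. */
void get_fp_type_info(struct fp_type_info out[3])
{
    out[0] = (struct fp_type_info){ "float", sizeof(float), FLT_MANT_DIG, FLT_DIG };
    out[1] = (struct fp_type_info){ "double", sizeof(double), DBL_MANT_DIG, DBL_DIG };
    out[2] = (struct fp_type_info){ "long double", sizeof(long double), LDBL_MANT_DIG, LDBL_DIG };
}

/* Print one line per type; the values depend on the compiler and target. */
void print_fp_type_info(void)
{
    struct fp_type_info t[3];
    get_fp_type_info(t);
    for (int i = 0; i < 3; i++) {
        printf("%-12s %2zu bytes, %2d significand digits, %2d decimal digits\n",
               t[i].name, t[i].bytes, t[i].mant_digits, t[i].dec_digits);
    }
}
```

Calling print_fp_type_info() from main() prints one line per type, which is a quick way to see what long double actually means on a particular toolchain.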

Selecting a floating point data type

See this portion of another post.

Floating point storage

This is covered on many sites; the most concise and thorough description I’ve found is in chapter 2 of Sun’s Numerical Computation Guide.

NaNs and Infinity

See my previous post.

Comparing floating point numbers

A simple equality comparison (such as a == b) will often fail, even for values that you would expect to be equal, due to rounding errors (especially rounding introduced by converting between binary and decimal). (Do any compilers or static code analysis tools emit warnings if you try to do a naive equality comparison? GCC, at least, has an optional -Wfloat-equal warning for this.)
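One common alternative is to compare with a tolerance. The sketch below is only one of several possible approaches, not a universal answer: the function name nearly_equal and both tolerance parameters are assumptions made for illustration, and suitable tolerance values depend on the calculation that produced the numbers.

```c
#include <math.h>
#include <stdbool.h>

/*
 * Returns true if a and b are equal within a relative tolerance (scaled by
 * the larger magnitude) or within an absolute tolerance (useful near zero).
 * NaN never compares equal to anything, matching the behavior of ==.
 */
bool nearly_equal(double a, double b, double rel_tol, double abs_tol)
{
    if (a == b) {
        return true;                /* exact match, including equal infinities */
    }
    if (isnan(a) || isnan(b) || isinf(a) || isinf(b)) {
        return false;               /* NaN, or an infinity against anything else */
    }

    double diff = fabs(a - b);
    double mag_a = fabs(a);
    double mag_b = fabs(b);
    double scale = (mag_a > mag_b) ? mag_a : mag_b;
    double allowed = rel_tol * scale;
    if (allowed < abs_tol) {
        allowed = abs_tol;          /* fall back to the absolute tolerance */
    }
    return diff <= allowed;
}
```

For example, nearly_equal(0.1 + 0.2, 0.3, 1e-9, 0.0) returns true, whereas 0.1 + 0.2 == 0.3 is typically false with IEEE 754 doubles.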

Greater than / less than comparisons generally require no special handling. At the assembly level, however, a comparison may return a result of “unordered” if NaNs are involved; one of the compilers I tested (CodeGear C++Builder) fails to account for this and so may return incorrect results when comparing NaNs.
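On an implementation that follows IEEE 754, any of <, <=, >, >= or == with a NaN operand is false, and != is true. C99's <math.h> also provides the isunordered(), isless(), and isgreater() macros, which make the “unordered” case explicit. The following is a minimal sketch of a comparison that reports it separately; the enum and function names are invented for illustration.

```c
#include <math.h>

/* Possible outcomes of comparing two doubles. */
enum order { ORDER_LESS, ORDER_EQUAL, ORDER_GREATER, ORDER_UNORDERED };

/*
 * Compare a and b, reporting ORDER_UNORDERED when either is NaN instead of
 * silently falling through to one of the ordered results.
 */
enum order compare_doubles(double a, double b)
{
    if (isunordered(a, b)) {
        return ORDER_UNORDERED;     /* at least one operand is NaN */
    }
    if (isless(a, b)) {
        return ORDER_LESS;
    }
    if (isgreater(a, b)) {
        return ORDER_GREATER;
    }
    return ORDER_EQUAL;             /* includes -0.0 compared with +0.0 */
}
```

Checking for the unordered case first means the result does not depend on how a particular compiler translates < or > when a NaN is present.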

Handling floating point exceptions

See this post.

Low-level floating point calculations

If you need to do floating point work at the assembly level, use the Intel Software Developer’s Manuals as a reference. Volume 1 has some background on the FPU; volume 2A contains most floating point instructions (the instruction reference is alphabetical, and their mnemonics start with F).

Example code

Rather than simple math, the following code covers manipulating floating point numbers’ bit representations, handling NaNs and infinities, and so on.
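As a small illustration of the bit-level side of this, here is a minimal sketch that splits a double into its sign, exponent, and fraction fields and puts it back together. It assumes that double is the 64-bit IEEE 754 binary64 format (true on x86), and the names double_fields, split_double, join_double, and classify_fields are hypothetical, chosen just for this example.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: double is IEEE 754 binary64 (1 sign, 11 exponent, 52 fraction bits). */
_Static_assert(sizeof(double) == sizeof(uint64_t), "double must be 64 bits");

#define FRACTION_MASK UINT64_C(0x000FFFFFFFFFFFFF)

struct double_fields {
    unsigned sign;       /* 0 or 1 */
    unsigned exponent;   /* 11 bits, biased by 1023 */
    uint64_t fraction;   /* low 52 bits of the significand */
};

/* Copy the bits out with memcpy, which avoids pointer-aliasing problems. */
struct double_fields split_double(double x)
{
    uint64_t bits;
    struct double_fields f;
    memcpy(&bits, &x, sizeof bits);
    f.sign = (unsigned)(bits >> 63);
    f.exponent = (unsigned)((bits >> 52) & 0x7FFu);
    f.fraction = bits & FRACTION_MASK;
    return f;
}

/* Rebuild a double from its three fields. */
double join_double(struct double_fields f)
{
    uint64_t bits = ((uint64_t)(f.sign & 1u) << 63)
                  | ((uint64_t)(f.exponent & 0x7FFu) << 52)
                  | (f.fraction & FRACTION_MASK);
    double x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Name the category of value that the fields encode. */
const char *classify_fields(struct double_fields f)
{
    if (f.exponent == 0x7FFu) {
        return f.fraction ? "NaN" : "infinity";
    }
    if (f.exponent == 0) {
        return f.fraction ? "subnormal" : "zero";
    }
    return "normal";
}
```

For instance, split_double(1.0) gives sign 0, exponent 1023, and fraction 0, and passing an all-ones exponent with a zero fraction to join_double() produces an infinity.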

For further reading

In no particular order…

EDIT: (3/15/2009) Added sections on “Selecting a floating point data type” and “Handling floating point exceptions.”