:date: 2024-03-14 16:05 :tags:

If you saw my recent post about rounding, you know I came to the conclusion that the whole topic was very complicated, and that we should trust that the nerds who create this foundational stuff know what they're doing, and try not to worry about it.

Well, I did my best to not worry about it but after some feedback from my very observant friend Dr. S, I realized that my efforts to not worry about it were not going to hold up. You see, the problem that was brought to my attention was something I had kind of noticed. But I was trying to trust the creators of Python (and C, etc.) and assume they knew how to not let everybody down. But when it was pointed out by someone else, well, yes, my story made no sense.

Basically I was trying to demonstrate that Python does not use a
**round up** strategy to handle values halfway between candidate
rounded values. Rounding to the higher value was what I was taught in
school as a young kid. The Python documentation hints that it likes to
**round to even** values. Perhaps that's true when rounding to whole
integers. But the examples I chose did not reflect that. I have edited
one of the values in that post so that the story makes some sense, but
let's revisit the original flawed values and understand what is really
going on and why it could be more important than I first thought.

Here's the Python version of my previous example.

```
>>> round(5.775,2); round(5.475,2)
5.78
5.47
```

Ok, the 5.775 would round up to $5.78 since 8 is the even
possibility. But what about 5.475? I have come to appreciate that the
**round to even** rule is perfectly sensible, and if it were in effect
wouldn't we expect to see 5.475 round to $5.48 since, again, the 8 is
even? But it rounds down to the 7. I did kind of notice this but was
gaslighted into hazily thinking it must be the preceding digit
somehow. Well, that is of course nonsense. This example is simply not
following any rule that I've discussed. I actually think it is not
really following any *rounding* (tie breaking) rule at all!
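
One way to peek behind the curtain (a trick I'll lean on more below) is to hand a float to Python's `decimal.Decimal`, which displays the float's exact stored value. The comments here just record what the outputs imply, given the `round()` results above:

```
from decimal import Decimal

# Decimal(float) shows the float's exact stored value, all digits.
print(Decimal(5.775))  # slightly *above* 5.775, hence round(5.775, 2) -> 5.78
print(Decimal(5.475))  # slightly *below* 5.475, hence round(5.475, 2) -> 5.47
```

The trailing digits you'll see are the binary encoding showing through, and they, not the digits you typed, are what `round()` actually acts on.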

After experimenting with rounding well beyond what I was hoping to
spend my time on, I never could discern any kind of heuristic that
would allow one to predict the outcome. And that is weird. What I'm
saying is that **a simple sales receipt tax calculation will not be
something you can properly predict** and that was deeply surprising to
me.

But it's a computer; surely there is some determinism! Surely there is a reason it does what it does. After reading part one, some people did have the vague idea this had to do with encoding floating point numbers as binary sequences. Those people were 100% right. I have to say that I was surprised because, while I'm quite familiar with the massive can of worms involved in this topic (e.g. IEEE 754), I didn't think it would apply to the (e.g. dollars and cents) accuracies I was interested in. But that thinking turns out to be strangely wrong and naive. Even if the ability to represent (the least accurate type of) floating point numbers is inherently cursed to have some error of, say, 3 parts per billion, what does that have to do with pennies in a dollar? Amazingly it matters, and pretty much always.

I wanted a very close look at how `float` numbers are actually used in
C (Python and all the rest seem related). Let's consider the two
numbers **0.513245** and **0.513255**. If I want those six-digit
numbers to start as exactly those values, and I would then like them
rounded to five digits, what would be done?

If you're using the grade school algorithm of **always round up** when
there is a tie, you get this.

```
0.513245 --> 0.51325
0.513255 --> 0.51326
```

That's the easy way to think about it. What if we want to use the
**round to even** heuristic? We should get this.

```
0.513245 --> 0.51324 (round down because final 4 is even)
0.513255 --> 0.51326 (round up because final 6 is even)
```
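
As a sanity check, both tie-breaking rules are easy to demonstrate with Python's `decimal` module, where the inputs really are the exact six-digit values with no binary noise involved. This is a small sketch of my own, not part of the original experiment:

```
from decimal import Decimal, ROUND_HALF_UP, ROUND_HALF_EVEN

for s in ("0.513245", "0.513255"):
    x = Decimal(s)
    print(s,
          x.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP),
          x.quantize(Decimal("0.00001"), rounding=ROUND_HALF_EVEN))
# 0.513245 0.51325 0.51324
# 0.513255 0.51326 0.51326
```

Fed exact values, both rules behave exactly as the tables above predict.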

That is not what I observed. It turns out that this explanation does
not help us understand anything! What *really* is happening?

With some help from my robot friends I wrote a C program that really
dissects the process by iterating over (a limited range of) *all* the
floating point values that are possible. I'm basically incrementing my
target number by the machine epsilon, which is (something like) the
smallest difference in value that a computer can discern when
representing floating point numbers.
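
If you'd rather poke at the same idea in Python, here's a minimal sketch of my own (not the program that generated the table; that C program is at the end of this post). It walks consecutive 32-bit float encodings by incrementing the raw bit pattern with `struct`; the helper name `f32` and the choice to back up 8 steps are mine, and IEEE 754 single precision is assumed:

```
import struct

def f32(bits):
    """Reinterpret a 32-bit unsigned integer as the float it encodes."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]

# Bit pattern of the float32 nearest to 0.513245, backed up 8 steps:
start = struct.unpack(">I", struct.pack(">f", 0.513245))[0] - 8
for bits in range(start, start + 17):
    x = f32(bits)
    print(f"{x:f} - {x:.5f} - {x:.30f} - {bits:032b}")
```

On an IEEE 754 machine this should reproduce the first block of the table below.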

Take a look at this table which I explain in detail below.

```
0.513245 - 0.51324 - 0.513244509696960449218750000000 - 00111111000000110110001111111110
0.513245 - 0.51324 - 0.513244569301605224609375000000 - 00111111000000110110001111111111
0.513245 - 0.51324 - 0.513244628906250000000000000000 - 00111111000000110110010000000000
0.513245 - 0.51324 - 0.513244688510894775390625000000 - 00111111000000110110010000000001
0.513245 - 0.51324 - 0.513244748115539550781250000000 - 00111111000000110110010000000010
0.513245 - 0.51324 - 0.513244807720184326171875000000 - 00111111000000110110010000000011
0.513245 - 0.51324 - 0.513244867324829101562500000000 - 00111111000000110110010000000100
0.513245 - 0.51324 - 0.513244926929473876953125000000 - 00111111000000110110010000000101
0.513245 - 0.51324 - 0.513244986534118652343750000000 - 00111111000000110110010000000110
0.513245 - 0.51325 - 0.513245046138763427734375000000 - 00111111000000110110010000000111
0.513245 - 0.51325 - 0.513245105743408203125000000000 - 00111111000000110110010000001000
0.513245 - 0.51325 - 0.513245165348052978515625000000 - 00111111000000110110010000001001
0.513245 - 0.51325 - 0.513245224952697753906250000000 - 00111111000000110110010000001010
0.513245 - 0.51325 - 0.513245284557342529296875000000 - 00111111000000110110010000001011
0.513245 - 0.51325 - 0.513245344161987304687500000000 - 00111111000000110110010000001100
0.513245 - 0.51325 - 0.513245403766632080078125000000 - 00111111000000110110010000001101
0.513245 - 0.51325 - 0.513245463371276855468750000000 - 00111111000000110110010000001110
0.513255 - 0.51325 - 0.513254523277282714843750000000 - 00111111000000110110010010100110
0.513255 - 0.51325 - 0.513254582881927490234375000000 - 00111111000000110110010010100111
0.513255 - 0.51325 - 0.513254642486572265625000000000 - 00111111000000110110010010101000
0.513255 - 0.51325 - 0.513254702091217041015625000000 - 00111111000000110110010010101001
0.513255 - 0.51325 - 0.513254761695861816406250000000 - 00111111000000110110010010101010
0.513255 - 0.51325 - 0.513254821300506591796875000000 - 00111111000000110110010010101011
0.513255 - 0.51325 - 0.513254880905151367187500000000 - 00111111000000110110010010101100
0.513255 - 0.51325 - 0.513254940509796142578125000000 - 00111111000000110110010010101101
0.513255 - 0.51326 - 0.513255000114440917968750000000 - 00111111000000110110010010101110
0.513255 - 0.51326 - 0.513255059719085693359375000000 - 00111111000000110110010010101111
0.513255 - 0.51326 - 0.513255119323730468750000000000 - 00111111000000110110010010110000
0.513255 - 0.51326 - 0.513255178928375244140625000000 - 00111111000000110110010010110001
0.513255 - 0.51326 - 0.513255238533020019531250000000 - 00111111000000110110010010110010
0.513255 - 0.51326 - 0.513255298137664794921875000000 - 00111111000000110110010010110011
0.513255 - 0.51326 - 0.513255357742309570312500000000 - 00111111000000110110010010110100
0.513255 - 0.51326 - 0.513255417346954345703125000000 - 00111111000000110110010010110101
0.513255 - 0.51326 - 0.513255476951599121093750000000 - 00111111000000110110010010110110
```

Column one is the target number I'm interested in, the perfect value I
want the computer to keep track of. Column two is that exact same
number as filtered through a standard C style `%.5f` rounding filter
used by `printf` — in other words, C's normal attempt to round the
input value to one fewer digit. Now it gets interesting. The third
column is what the value looks like to C as a human number when I ask
it to show me all of its cards. I want to know all that it knows about
what it thinks this number is. This is done by asking for 30 decimal
places with `%.30f`. And you can see that there is all kinds of
clutter there! Those are artifacts — necessary compromises — related
to encoding human numbers in a computer. The trailing zeros tell us I
have shown all of what it knows. Ok, if the actual digits out to 25
places are kind of gibberish, how exactly did they get that way? The
answer to that is how they were encoded using 1s and 0s. Column four
shows the exact binary value that encodes this number. If you kind of
know how counting in binary numbers works, you can verify that I must
have picked a machine epsilon that was pretty close, allowing us to
review a slice of the sequence of every possible binary number.

Remember I am interested in 0.513245 and 0.513255, but those were not
the input. I hunted those out of the results of all of the possible
(six digit) numbers with grep. These values are completely typical.
What this shows us is that for each of those numbers I have discovered
**17** different ways that they could be encoded. It is important to
note that none of the binary encodings *perfectly* match our target
numbers. The closest that binary numbers can achieve is around the
middle of each list, and it is exactly there that you can see the
second column make the switch from rounding *down* to rounding *up*.

This brings up a very good question — which of these binary encodings will the system choose for our target number of 0.513255? The closest two candidates are repeated below.

```
0.513255 - 0.51325 - 0.513254940509796142578125000000 - 00111111000000110110010010101101
0.513255 - 0.51326 - 0.513255000114440917968750000000 - 00111111000000110110010010101110
```

It turns out that the answer to this question determines whether the system rounds up or rounds down.

```
0.513255 - 0.51325 - 0.51325-->4940509796142578125000000<-- Less than 5, rounds down
0.513255 - 0.51326 - 0.51325-->5000114440917968750000000<-- Greater than 5, rounds up
```
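
If you don't feel like reading bits, there's a shortcut (my own check, again assuming IEEE 754 single precision): round-tripping 0.513255 through a 32-bit float hands you back the chosen encoding's exact value.

```
import struct
from decimal import Decimal

# Round-trip through a 32-bit float to see which encoding was chosen:
chosen = struct.unpack(">f", struct.pack(">f", 0.513255))[0]
print(Decimal(chosen))   # the exact stored value; compare the table above
print(f"{chosen:.5f}")   # 0.51326 -- it picked the round-up candidate
```

That makes sense here: 0.513255 sits much closer to the upper candidate than the lower one, so the nearest encoding happens to be the one that rounds up.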

Ok, maybe you're following along, maybe not, but the question remains,
how can we *predict* the behavior of all these rounding systems on
floating point numbers?

As far as I can tell, the answer is **it is not possible to predict
floating point behavior**. Not for programmers. Obviously it's going to
be complicated, but it's probably worse than you think. For example,
if you ask your robot friends to explain "how (gcc) compiler
optimizations can affect floating point accuracy" — oh boy! It's
ugly! (Reordering operations, vectorization for SIMD, FMA — Fused
Multiply Add, etc...)

The Python documentation does have a "note" that warns, "**The behavior
of round() for floats can be surprising...**" I feel that's somewhat of
an error of omission. I feel like it would be more correct to say
"**the behavior of round() for floats is essentially a pseudo random
number generator**".

All my programming career I just figured that this was in the nether
realms of parts-per-billion accuracies I don't care about. It's not
every day I have to, for example, work in meter accuracies all the way
to Mars. But what this investigation has taught me is that floating
point numbers really can be poison. My extremely simple example of
calculating sales tax is something most humans can understand, but I
sure underestimated how hard it would be to get a computer to
*properly* understand it.

If you do ever need to write software that gets simple things like
sales tax right, check out Python's standard `decimal` module, which
provides a numeric type superior to `float` when struggling with these
issues.
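
For instance, here's a minimal sketch of the sales tax idea (the price and the 9.5% rate are made-up numbers for illustration):

```
from decimal import Decimal, ROUND_HALF_EVEN

price = Decimal("5.00")
rate = Decimal("0.095")          # hypothetical 9.5% sales tax
total = (price * (1 + rate)).quantize(Decimal("0.01"),
                                      rounding=ROUND_HALF_EVEN)
print(total)  # 5.48 -- the tie at 5.475 breaks to even, every time
```

Unlike the `float` version at the top of this post, this is deterministic, and you pick the tie-breaking rule yourself.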

Now that I've learned more than I wanted to know about computer rounding, I'm going to go back to not thinking about it too much. The whole reason this came up is that I have just added a round function to my programming language (by simply borrowing Python's) and now at least I know what to expect out of it.

I still am tempted to write another rounding function which I have
traditionally used manually. I call it **athlete rounding** and it
basically rounds up if it's speed (kph) and down if the value is a
time/distance pace (min/mile). But no, no, I will learn to make peace
with Python and C's random rounding behavior.

For reference, here's the C program I used to generate those bit representations.

```
/* bits.c - Demonstration of C's float encoding system. */
#include <stdio.h>

/* Print the raw bits of any object, most significant byte first.
   (Assumes a little-endian machine such as x86.) */
void printBits(size_t const size, void const * const ptr) {
    unsigned char *b = (unsigned char *) ptr;
    unsigned char byte;
    int i, j;
    for (i = size - 1; i >= 0; i--) {
        for (j = 7; j >= 0; j--) {
            byte = (b[i] >> j) & 1;
            printf("%u", byte);
        }
    }
}

int main() {
    /* Epsilon: just over half an ulp for floats in [0.5, 1), so each
       x += E advances x to the very next representable float. */
    float E = 3e-8;
    for (float x = 0.5; x < 1; x += E) {
        printf("%f - %.5f - %.30f - ", x, x, x);
        printBits(sizeof(x), &x);
        printf("\n");
    }
    return 0;
}
```