20191014

You can do basic math in Python, right?


Let's do a small quiz. I've picked an example of basic math to demonstrate how limited our knowledge of the types we use is. If you are a floating-point expert, you can feel good about scoring 100%. The other 99.99% of developers: enjoy the ride.
Along the way I point out heuristics I find important to consider while developing code. I think they make success more likely.
def div(a, b):
    return a / b
 
assert div(1,1) == 1
This is the code I would like to test. Looks pretty simple: we divide a by b. We also have a test.
Test branch coverage says we're done.
All looks good, or did we miss something?
Yes: there is more than one float value.

How big is the problem space?

1<<(64 + 64) # I think, haven't read the IEEE standard to give the correct number
Heuristic: Reading documentation is better than not reading documentation.
If we want to tackle a problem, we have to know how big it is. If the surface area is small, it's easier to know all valid states. For example, testing a function that takes a boolean is easier than testing a function that takes an integer.
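To put numbers on that surface area, here is a small sketch that counts IEEE 754 binary64 bit patterns by class. It assumes CPython floats are IEEE 754 doubles (the case on mainstream platforms); bits_to_float is a hypothetical helper name for this example.

```python
import struct

FRAC_BITS = 52                          # binary64: 1 sign bit, 11 exponent bits, 52 fraction bits

total = 1 << 64                         # every possible 64-bit pattern
all_ones_exponent = 2 << FRAC_BITS      # exponent field all ones, either sign: 2 * 2**52 patterns
infinities = 2                          # +inf and -inf (all-ones exponent, zero fraction)
nans = all_ones_exponent - infinities   # every other all-ones-exponent pattern is a NaN
finite = total - all_ones_exponent      # normals, subnormals, +0.0 and -0.0

def bits_to_float(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as a double (assumes IEEE 754)."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

pairs = total * total                   # div(a, b) takes two floats

print(f"finite values: {finite}")
print(f"NaN patterns:  {nans}")
print(f"input pairs == 2**128: {pairs == 1 << 128}")
print(bits_to_float(0x7FF0000000000000), bits_to_float(0x3FF0000000000000))  # inf 1.0
```

So the 1<<(64 + 64) estimate counts pairs of bit patterns. Roughly 1 in 2048 of each input's patterns is a NaN, which barely dents the total.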

How bad is that?

from datetime import timedelta
pflop = 1<<50  # comparison: IBM Summit supercomputer with 200 PF, 9.7 MW (Nov 2018)
problem = 1<<(64+64)
 
billion_years_with_1_pflop = problem / (timedelta(days=365).total_seconds() * pflop) / (10**9)
billion_years_with_1_pflop
#Pretty bad
9583696.5659455
Heuristic: Use brute-force if possible.
In this instance we cannot test all values with brute-force. Testing exhaustively should always be the first approach. If this doesn't work we have to use our knowledge of floating point arithmetic to produce software that is good enough for the context it's used in.
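We can't brute-force the whole space, but we can brute-force a small slice of it. A minimal sketch, assuming we only care about positive integer inputs up to an arbitrary limit of 100; failing_pairs is a hypothetical helper name.

```python
def div(a, b):
    return a / b

def failing_pairs(limit):
    """Check div(a, b) * b == a for every integer pair 1 <= a, b <= limit."""
    failures = []
    for a in range(1, limit + 1):
        for b in range(1, limit + 1):
            if div(a, b) * b != a:
                failures.append((a, b))
    return failures

failures = failing_pairs(100)
print(len(failures), "of", 100 * 100, "pairs fail")
print((1, 49) in failures)  # True: 1 / 49 * 49 evaluates to 0.9999999999999999
```

Ten thousand checks finish quickly, and even this tiny, harmless-looking domain already contains counterexamples.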

Domain Knowledge

If you are doing anything other than scientific computation, float is the incorrect type.
For other applications a decimal type, or other types (date, complex, etc.), are appropriate.
Heuristic: Know your domain.
Depending on your domain, you're already doomed.
If you are doing anything other than scientific computation, float is the incorrect type. It won't work for your currency, interest-rate and bookkeeping calculations. For those you would use a decimal type.
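A minimal sketch of the difference, using the standard library's decimal module. The price and rate are made-up example values, and ROUND_HALF_UP is just one possible rounding rule; your accounting rules may require another.

```python
from decimal import Decimal, ROUND_HALF_UP

print(0.1 + 0.2 == 0.3)                                   # False: 0.1 has no exact binary form
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True: decimal digits are kept exactly

# Build Decimals from strings; Decimal(0.1) copies the float's binary error.
print(Decimal(0.1) == Decimal("0.1"))                     # False

price = Decimal("19.99")                                  # example values
rate = Decimal("0.03")
interest = (price * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(interest)                                           # 0.60
```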
Now let's start the quiz to see how good our float knowledge is.

This should be true, right?

In the quiz we test the div function from before with the assert_div function. The test succeeds if dividing a by b and then multiplying the result by b gives back a.
def assert_div(a, b):
    assert div(a,b) * b == a
assert_div(1,1)
assert_div(2,3)
This is true for a lot of float values but not all.
Can you come up with 1-2 examples where this is false?

The Quiz

It's not such a hard quiz, really. I'll just give you examples, and we'll try to explain why each one fails.
assert_div(1,0) #Let's start simple
---------------------------------------------------------------------------
 
ZeroDivisionError                         Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(1,0) #Let's start simple
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
 in div(a, b)
      1 def div(a, b):
----> 2     return a / b
      3 
      4 assert div(1,1) == 1
 
 
ZeroDivisionError: division by zero
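This example is simple: division by zero. It is undefined in normal math too, so it is not specific to floating point. (IEEE 754 itself would return infinity for 1.0 / 0.0; Python chooses to raise ZeroDivisionError instead.)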

Not a number is fun

Floating point introduces NaN (not a number) to represent results that no other value can represent.
NaN is pretty special. You cannot treat it like any other value: almost every arithmetic operation involving NaN returns NaN.
assert_div(float(1), float("nan"))
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(float(1), float("nan"))
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
float(1) / float("nan")
nan
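A short sketch of how NaN spreads through arithmetic, using only built-in floats and math.isnan. Note the exceptions: CPython's float power rules return 1.0 for nan ** 0 and 1 ** nan, so "any operation returns NaN" is almost, but not entirely, true.

```python
import math

nan = float("nan")
inf = float("inf")

# Ordinary arithmetic with NaN yields NaN, so one NaN input poisons a whole calculation.
results = [nan + 1, nan * 0, 1 / nan, nan - nan]
print(all(math.isnan(r) for r in results))   # True

# NaN also appears without writing float("nan"): inf - inf has no meaningful value.
print(math.isnan(inf - inf))                 # True

# Exceptions from the power rules: these return 1.0, not NaN.
print(nan ** 0, 1 ** nan)                    # 1.0 1.0
```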

Not a number is even funnier

assert_div(float("nan"), 1.0)
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(float("nan"), 1.0)
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
Given what we saw in the last test, this test looks like it could work: both sides of the comparison are NaN. It doesn't, because NaN is not equal to itself.
n = float("nan")
(n == n, n is n)
(False, True)
This is actually pretty odd. The Python language reference (https://docs.python.org/3.7/reference/expressions.html) says:
Equality comparison should be reflexive. In other words, identical objects should compare equal: x is y implies x == y
NaN breaks this rule on purpose: the same page notes that not-a-number values are not equal to themselves, as IEEE 754 requires. By flipping the arguments to div we get NaN as the result, and even though n is n holds (it is the very same object in memory), n == n is still False.
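Since == can't detect NaN, code that may see NaN needs math.isnan. Below is a small sketch; floats_equal is a hypothetical helper (not a standard-library function) that treats any two NaNs as equal, which may or may not be the semantics your tests want. It also shows the documented container behavior: membership and list comparison check identity before ==, so the same NaN object is found even though n == n is False.

```python
import math

def floats_equal(x: float, y: float) -> bool:
    """Hypothetical test helper: like ==, but treats any two NaNs as equal."""
    return x == y or (math.isnan(x) and math.isnan(y))

n = float("nan")

print(n == n, n != n, math.isnan(n))         # False True True
print(floats_equal(n, float("nan")))         # True

# Containers compare by identity first, then by ==.
print(n in [n], [n] == [n])                  # True True
print(float("nan") in [float("nan")])        # False: two distinct NaN objects
```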

Infinity is also fun

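Floats also include infinity and negative infinity, to represent values larger or smaller than any regular float. Dividing infinity by infinity has no meaningful result, so it produces NaN, and the NaN rules from above take over.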
assert_div(float("inf"), float("inf"))
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(float("inf"), float("inf"))
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
float("inf")/float("inf")
nan
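A quick sketch of how infinity behaves in Python float arithmetic; math.inf, math.isinf and math.isfinite are standard-library names.

```python
import math

inf = math.inf   # same value as float("inf")

# Undefined forms produce NaN ...
print(inf / inf, inf - inf, inf * 0)      # nan nan nan
# ... while infinity absorbs finite values.
print(inf + 1, -inf * 2, 1 / inf)         # inf -inf 0.0

print(math.isinf(inf), math.isfinite(inf), math.isfinite(1e308))   # True False True
```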

Losing it

assert_div(3e-5, 7)
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(3e-5, 7)
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
Here we lose precision: 3e-5 / 7 is rounded to the nearest float, multiplying by 7 rounds again, and the result no longer matches the input value.
3e-5 / 7 * 7 
2.9999999999999997e-05
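Where an exact round trip isn't required, compare with a tolerance; where it is, switch to an exact type. A sketch using the standard library's math.isclose and fractions.Fraction; the default tolerance used here is an assumption and may not suit your domain.

```python
import math
from fractions import Fraction

a, b = 3e-5, 7
result = a / b * b

print(result == a)                 # False: each operation rounds to the nearest float
print(math.isclose(result, a))     # True: equal within the default rel_tol of 1e-09

# Fraction arithmetic is exact, so the round trip holds (at the cost of speed).
exact = Fraction(a) / b * b
print(exact == Fraction(a))        # True
```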

Big and small

import sys
assert_div(sys.float_info.max, sys.float_info.min)
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
      1 import sys
----> 2 assert_div(sys.float_info.max, sys.float_info.min)
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
In this instance the division overflows to infinity, and there's no way to get back from infinity to a.
sys.float_info.max / sys.float_info.min
inf

Small and big

assert_div(sys.float_info.min, sys.float_info.max)
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div(sys.float_info.min, sys.float_info.max)
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
Something similar happens if we flip the input values. The division leads to 0 and we cannot go back to a.
sys.float_info.min / sys.float_info.max
0.0
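Both failures are silent: Python float division overflows to inf and underflows to 0.0 without raising. A minimal sketch of a guard, assuming you would rather get an exception than a silent inf or 0.0. checked_div is a hypothetical helper, and reporting underflow as a plain ArithmeticError is an arbitrary choice, since Python has no built-in underflow exception.

```python
import math
import sys

def checked_div(a: float, b: float) -> float:
    """Hypothetical helper: divide, but refuse to silently overflow or underflow."""
    result = a / b   # still raises ZeroDivisionError for b == 0
    if math.isinf(result) and math.isfinite(a) and math.isfinite(b):
        raise OverflowError(f"{a!r} / {b!r} overflowed to {result!r}")
    # finite / inf == 0.0 is exact, so only flag a zero result for finite divisors
    if result == 0.0 and a != 0.0 and math.isfinite(b):
        raise ArithmeticError(f"{a!r} / {b!r} underflowed to 0.0")
    return result

for a, b in [(sys.float_info.max, sys.float_info.min),
             (sys.float_info.min, sys.float_info.max),
             (3.0, 2.0)]:
    try:
        print(checked_div(a, b))
    except ArithmeticError as exc:   # OverflowError is a subclass of ArithmeticError
        print(type(exc).__name__, exc)
```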

Precisely wrong

assert_div((1<<64) - 1, 1)
---------------------------------------------------------------------------
 
AssertionError                            Traceback (most recent call last)
 
 in <module>()
----> 1 assert_div((1<<64) - 1, 1)
 
 
 in assert_div(a, b)
      1 def assert_div(a, b):
----> 2     assert div(a,b) * b == a
 
 
AssertionError: 
This is a more interesting problem. A 64-bit float has only 53 bits of significand precision, so it can't represent all of the 1<<64 values a 64-bit integer can hold. In this example, (1<<64) - 1 is the same as 1<<64 once it is converted to float.
( (1<<64) - 1, int(float((1<<64) - 1)) )
(18446744073709551615, 18446744073709551616)
_[1]-_[0]
1
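The limit comes from the 53-bit significand: every integer up to 2**53 is exactly representable, but beyond that the gaps between neighboring floats grow past 1. A small sketch using sys.float_info.mant_dig; int_fits_float is a hypothetical helper name.

```python
import sys

print(sys.float_info.mant_dig)                 # 53 bits of significand precision

limit = 2 ** 53                                # above this, not every integer is a float
print(float(limit) == limit, float(limit + 1) == limit + 1)   # True False
print(float(limit + 1) == float(limit))        # True: limit + 1 rounds to limit

def int_fits_float(n: int) -> bool:
    """Hypothetical guard: True if n survives a round trip through float.

    float(n) raises OverflowError for ints beyond sys.float_info.max.
    """
    return int(float(n)) == n

print(int_fits_float((1 << 64) - 1), int_fits_float(1 << 64))  # False True
```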

Bonus round of weirdness

If this wasn't bad enough, there is additional weirdness that is easy to miss unless you read the documentation very carefully.
Heuristic: Test small values exhaustively. Test with random input.
ma = sys.float_info.max * 1.1
mi = sys.float_info.min / 10
 
(ma, mi)
(inf, 2.225073858507203e-309)
mi < sys.float_info.min
True
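The upper and lower bounds are not treated the same. Going above sys.float_info.max overflows to inf, but below sys.float_info.min there are still subnormal (denormalized) floats: the documentation describes float_info.min as the minimum positive normalized float, not the smallest positive float there is.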
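The "test with random input" heuristic can be sketched like this, assuming we want inputs from the whole float space rather than the [0.0, 1.0) range of rng.random(). Drawing random 64-bit patterns reaches huge values, subnormals below sys.float_info.min, infinities and occasionally NaN. random_float and random_counterexamples are hypothetical helper names, and the fixed seed is an arbitrary choice so that a failing run can be reproduced.

```python
import random
import struct
import sys

def div(a, b):
    return a / b

def random_float(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a double, so every exponent range is reachable."""
    return struct.unpack("<d", struct.pack("<Q", rng.getrandbits(64)))[0]

def random_counterexamples(samples: int, seed: int = 0) -> list:
    """Return the (a, b) pairs for which div(a, b) * b == a does not hold."""
    rng = random.Random(seed)          # fixed seed: a failing run can be replayed
    failures = []
    for _ in range(samples):
        a, b = random_float(rng), random_float(rng)
        try:
            if div(a, b) * b != a:
                failures.append((a, b))
        except ZeroDivisionError:      # b was +0.0 or -0.0
            failures.append((a, b))
    return failures

failures = random_counterexamples(10_000)
print(len(failures), "of 10000 random pairs fail")

# Subnormals: the smallest positive double lies far below sys.float_info.min.
smallest = sys.float_info.min * sys.float_info.epsilon   # 2**-1074
print(smallest, 0.0 < smallest < sys.float_info.min, smallest / 2)   # 5e-324 True 0.0
```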

Wrap up

How did you do in the quiz?
If you can use an int, don't use a float. You will avoid a whole class of problems.
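For comparison, here is the integer analogue of assert_div. Python documents the identity x == (x//y)*y + (x%y) for floor division and modulo, and ints have arbitrary precision, so this round trip holds for every pair with b != 0. assert_divmod is a hypothetical name for this sketch.

```python
def assert_divmod(a: int, b: int) -> None:
    """Integer analogue of assert_div: quotient and remainder round-trip exactly."""
    q, r = divmod(a, b)
    assert q * b + r == a

# Includes the pairs that broke the float version, plus a negative and a huge value.
for a, b in [(1, 49), ((1 << 64) - 1, 1), (-7, 3), (10**30, 7)]:
    assert_divmod(a, b)
print("all integer round trips hold")
```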
Heuristic: Simpler software is better than complex software.
How did you do? I'm pretty sure most developers struggle with this quiz; I'm one of them. We work casually with complexity. We use concepts without thoroughly understanding them. This is true for nearly everything in software, as we build layer upon layer of software.
In general this is a good thing: it's the only way we know to make use of the hardware we have. On the other hand, we should be conservative in estimating our software quality. It's probably way worse than we think. The only way out is to make the implementation as simple as possible.