Let's do a small quiz. I've picked an example of basic math to demonstrate how limited our knowledge of the types we use is. If you are a floating point expert, you can feel good about scoring 100%. The other 99.99% of developers: enjoy the ride.

Along the way I point out heuristics I find important to
consider while developing code. I think they make success more likely.

`def div(a, b):`

` return a / b`

` `

`assert div(1,1) == 1`

This is the code I would like to test. It looks pretty
simple: we divide a by b. We also have a test.

The test's branch coverage says we're done.

All looks good, or did we miss something?

Yes: there is more than one float value.

## How big is the problem space?

`1<<(64 + 64) # I think, haven't read the IEEE standard to give the correct number`

**Heuristic: Reading documentation is better than not reading documentation.**

If we want to tackle a problem we have to know how big
it is. If the surface area is small it's easier to know all valid states. For
example testing a function that takes a boolean value is easier than a function
that takes an integer value.
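To make the size difference concrete, here is a minimal sketch. The `xor` function is a made-up example (not from the code above). It only shows that a function of two booleans has four input combinations and can be tested completely, while two 64-bit floats give `1 << 128` raw bit patterns.

```python
import itertools
import struct


def xor(a: bool, b: bool) -> bool:
    """Hypothetical function under test, used only for illustration."""
    return a != b


# Two boolean parameters: the whole input space fits in one small loop.
bool_cases = list(itertools.product([False, True], repeat=2))
for a, b in bool_cases:
    assert xor(a, b) == (a ^ b)

# Two 64-bit float parameters: count the raw bit patterns instead.
bits_per_float = struct.calcsize("<d") * 8
float_cases = 1 << (2 * bits_per_float)

print(len(bool_cases))          # 4
print(float_cases == 1 << 128)  # True
```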

#### How bad is that?

`from datetime import timedelta`

`pflop = 1<<50 # comparison: IBM Summit supercomputer, Nov 2018, with 200 PF, 9.7 MW`

`problem = 1<<(64+64)`

` `

`billion_years_with_1_pflop = problem / (timedelta(days=365).total_seconds() * pflop) / (10**9)`

`billion_years_with_1_pflop`

`#Pretty bad`

`9583696.5659455`

**Heuristic: Use brute-force if possible.**

In this instance we cannot test all values with
brute-force. Testing exhaustively should always be the first approach. If this
doesn't work we have to use our knowledge of floating point arithmetic to
produce software that is good enough for the context it's used in.
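When the input space is small enough, brute force does work. As a sketch, assume we only care about the 65,536 values of the 16-bit half-precision format and fix `b` at 3.0. The helper names are my own, and the arithmetic itself still runs in 64-bit floats. The half-precision patterns only serve as a small, complete set of inputs.

```python
import struct


def half_from_bits(bits: int) -> float:
    """Decode a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", bits.to_bytes(2, "little"))[0]


def count_round_trip_failures(b: float) -> tuple:
    """Try every half-precision value as `a` and count failed round trips."""
    total = 0
    failures = 0
    for bits in range(1 << 16):
        a = half_from_bits(bits)
        total += 1
        if (a / b) * b != a:
            failures += 1
    return total, failures


total, failures = count_round_trip_failures(3.0)
print(total, failures)
```

Every NaN bit pattern counts as a failure here, so the failure count is never zero.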

# Domain Knowledge

If you are doing anything other than **scientific computation**, float is the incorrect type. For other applications, the **decimal** type or other types (date, complex, etc.) are appropriate.

**Heuristic: Know your domain.**

Depending on your domain, you're already doomed. Float won't work for your **currency, interest rate and bookkeeping calculations**. For those you would use a decimal type.

Now let's start the quiz to see how good our float
knowledge is.
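Before the quiz, a minimal sketch of why a decimal type fits money better than float. The amounts are made up for illustration.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)               # False

# Decimal keeps the base-10 digits as written.
decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True
```

Note that the `Decimal` values are built from strings. Building them from float literals would carry the float error along.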

# This should be true, right?

In the quiz we test the `div` function from before with the `assert_div` function. This test succeeds if the first parameter `a` is equal to the result of dividing `a` by `b` and then multiplying by `b`.

`def assert_div(a, b):`

` assert div(a,b) * b == a`

`assert_div(1,1)`

`assert_div(2,3)`

This is true for a lot of float values but not all.

Can you come up with 1-2 examples where this is false?

# The Quiz

It's not such a hard quiz really. I'll just give you
examples and we try to explain why it's not working.

`assert_div(1,0) #Let's start simple`

`---------------------------------------------------------------------------`

` `

`ZeroDivisionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(1,0) #Let's start simple`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

in div(a, b)

` 1 def div(a, b):`

`----> 2 return a / b`

` 3 `

` 4 assert div(1,1) == 1`

` `

` `

`ZeroDivisionError: division by zero`

This example is simple: division by zero. This is the
same as in normal math and not specific to floating point.
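Python raises the same error for float operands, even though IEEE 754 arithmetic on its own would produce infinity here. If the caller should decide what happens, one option is a small wrapper. `safe_div` is a hypothetical helper, not part of the quiz code.

```python
def safe_div(a, b):
    """Hypothetical helper: return None instead of raising on a zero divisor."""
    try:
        return a / b
    except ZeroDivisionError:
        return None


print(safe_div(6, 3))      # 2.0
print(safe_div(1, 0))      # None
print(safe_div(1.0, 0.0))  # None
```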

# Not a number is fun

Floating point introduces NaN (not a number) to represent results that cannot be represented by any other value.

NaN is pretty special. You cannot treat it like any
other value. Almost every arithmetic operation with NaN returns NaN.

`assert_div(float(1), float("nan"))`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(float(1), float("nan"))`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

`float(1) / float("nan")`

`nan`
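A short sketch of how NaN spreads through arithmetic. Since NaN cannot be found with `==`, the check uses `math.isnan`.

```python
import math

nan = float("nan")

# Basic arithmetic with a NaN operand yields NaN again.
results = [nan + 1.0, nan - nan, nan * 0.0, 1.0 / nan]
print(all(math.isnan(r) for r in results))  # True

# Ordering comparisons with NaN are always False.
print(nan < 1.0, nan > 1.0)                 # False False
```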

# Not a number is even funnier

`assert_div(float("nan"), 1.0)`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(float("nan"), 1.0)`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

Given what we saw in the last test, this test looks
like it could work. It doesn't, because NaN is not equal to itself.

`n = float("nan")`

`(n == n, n is n)`

`(False, True)`

This is actually pretty odd. The Python documentation (https://docs.python.org/3.7/reference/expressions.html) states:

Equality comparison should be reflexive. In other
words, identical objects should compare equal:

`x is y implies x == y`
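The broken reflexivity leaks into containers, which check identity before equality. A small sketch of the resulting surprises:

```python
import math

n = float("nan")

print(n == n)                         # False
print(n in [n])                       # True, found by identity
print([n] == [n])                     # True, same reason
print(float("nan") in [float("nan")])  # False, two different objects
print(math.isnan(n))                  # True, the reliable check
```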

By flipping the arguments to `div` we get the result `NaN`. NaN is a weird value: even though both sides of the comparison are the same object at the same memory address, they still do not compare equal.

# Infinity is also fun

Floats support infinity and negative infinity to represent
values larger or smaller than any regular float value.

`assert_div(float("inf"), float("inf"))`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(float("inf"), float("inf"))`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

`float("inf")/float("inf")`

`nan`
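Infinity behaves sensibly in comparisons, but several operations on it have no meaningful answer and give NaN. A quick sketch:

```python
import math
import sys

inf = float("inf")

# Infinity compares beyond the largest finite float.
print(inf > sys.float_info.max)    # True
print(-inf < -sys.float_info.max)  # True
print(inf + 1 == inf)              # True
print(1.0 / inf)                   # 0.0

# These have no meaningful result, so they produce NaN.
undefined = [inf - inf, inf * 0.0, inf / inf]
print(all(math.isnan(x) for x in undefined))  # True
```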

# Losing it

`assert_div(3e-5, 7)`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(3e-5, 7)`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

Here we lose precision, which again leads to the
result not matching the input value.

`3e-5 / 7 * 7 `

`2.9999999999999997e-05`
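Two common ways to deal with this, sketched under the assumption that either a tolerance or exact rational arithmetic is acceptable in your context: compare with `math.isclose`, or compute with `fractions.Fraction`.

```python
import math
from fractions import Fraction

a, b = 3e-5, 7
round_trip = a / b * b

# Exact comparison fails, a tolerant comparison succeeds.
print(round_trip == a)              # False
print(math.isclose(round_trip, a))  # True

# Rational arithmetic does not round, so the round trip is exact.
exact_a = Fraction(3, 100000)
print(exact_a / b * b == exact_a)   # True
```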

# Big and small

`import sys`

`assert_div(sys.float_info.max, sys.float_info.min)`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

` 1 import sys`

`----> 2 assert_div(sys.float_info.max, sys.float_info.min)`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

In this instance the division overflows to infinity, and
there's no way to get back from infinity to a.

`sys.float_info.max / sys.float_info.min`

`inf`

# Small and big

`assert_div(sys.float_info.min, sys.float_info.max)`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div(sys.float_info.min, sys.float_info.max)`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

Something similar happens if we flip the input values.
The division leads to 0 and we cannot go back to a.

`sys.float_info.min / sys.float_info.max`

`0.0`
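Both cases can be detected after the fact. The following is a sketch only: `describe_quotient` is a name I made up, it expects finite inputs, and `b` must not be zero.

```python
import math
import sys


def describe_quotient(a: float, b: float) -> str:
    """Classify a / b for finite a and finite, non-zero b."""
    if not (math.isfinite(a) and math.isfinite(b)):
        return "non-finite input"
    q = a / b
    if math.isinf(q):
        return "overflow"   # the result is too large for a float
    if q == 0.0 and a != 0.0:
        return "underflow"  # the result is too small for a float
    return "ok"


print(describe_quotient(sys.float_info.max, sys.float_info.min))  # overflow
print(describe_quotient(sys.float_info.min, sys.float_info.max))  # underflow
print(describe_quotient(1.0, 2.0))                                # ok
```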

# Precisely wrong

`assert_div((1<<64) - 1, 1)`

`---------------------------------------------------------------------------`

` `

`AssertionError Traceback (most recent call last)`

` `

in ()

`----> 1 assert_div((1<<64) - 1, 1)`

` `

` `

in assert_div(a, b)

` 1 def assert_div(a, b):`

`----> 2 assert div(a,b) * b == a`

` `

` `

`AssertionError: `

This is a more interesting problem.
Float's internal representation cannot represent all `1<<64` integer values in a 64-bit float value, because a 64-bit float has only 53 bits of precision. In this
example `(1<<64) - 1` is converted to the same float as `1<<64`.

`( (1<<64) - 1, int(float((1<<64) - 1)) )`

`(18446744073709551615, 18446744073709551616)`

`_[1]-_[0]`

`1`
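The trouble actually starts well below 64 bits. A sketch, assuming the usual IEEE 754 double where `sys.float_info.mant_dig` is 53:

```python
import sys

# Every integer up to 2**53 has an exact float representation.
limit = 1 << sys.float_info.mant_dig

print(float(limit - 1) == limit - 1)  # True
print(float(limit + 1) == limit + 1)  # False, rounded to a neighbour
print(int(float(limit + 1)))          # 9007199254740992
```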

# Bonus round of weirdness

As if this wasn't bad enough, there is additional
weirdness that is easy to miss, even if you read the
documentation.

**Heuristic: Test small values exhaustively. Test with random input.**
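The random half of this heuristic could look like the sketch below. The helper names are my own. Reinterpreting random bits as a float (instead of calling `random.random()`) is a deliberate choice so that NaN, infinity and extreme exponents show up as inputs too.

```python
import random
import struct


def random_float(rng: random.Random) -> float:
    """Reinterpret 64 random bits as a 64-bit float."""
    return struct.unpack("<d", rng.getrandbits(64).to_bytes(8, "little"))[0]


def find_failures(samples: int, seed: int = 0) -> list:
    """Collect (a, b) pairs for which (a / b) * b == a does not hold."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    failures = []
    for _ in range(samples):
        a, b = random_float(rng), random_float(rng)
        try:
            ok = (a / b) * b == a
        except ZeroDivisionError:
            ok = False
        if not ok:
            failures.append((a, b))
    return failures


failures = find_failures(10_000)
print(len(failures), failures[:3])
```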

`ma = sys.float_info.max * 1.1`

`mi = sys.float_info.min / 10`

` `

`(ma, mi)`

`(inf, 2.225073858507203e-309)`

`mi < sys.float_info.min`

`True`

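The upper and lower bounds are not treated the same.
You can have a positive value that is smaller than `sys.float_info.min`, because that constant is the smallest positive normalized float and smaller subnormal (denormalized) values exist below it.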
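For reference, `sys.float_info.min` is the smallest positive normalized float, and the values below it are called subnormal. A sketch of where the real lower bound sits, assuming a platform with standard IEEE 754 doubles and subnormal support:

```python
import sys

smallest_normal = sys.float_info.min
# One unit in the last place of the smallest normal number, i.e. 2**-1074.
smallest_subnormal = smallest_normal * sys.float_info.epsilon

print(smallest_subnormal)                          # 5e-324
print(0.0 < smallest_subnormal < smallest_normal)  # True
print(smallest_subnormal / 2)                      # 0.0
```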

# Wrap up

How did you do in the quiz?

If you can use an int, don't use a float. You will
avoid a whole class of problems.

**Heuristic: Simpler software is better than complex software.**

I'm pretty sure most developers
struggle with this quiz. I'm one of them. We work casually with complexity. We
use concepts without thoroughly understanding them. This is true for nearly
everything in software, as we're building layer upon layer of software.

In general this is a good thing: it's the only way we
know to use the hardware we have. On the other hand, we should be
conservative in estimating our software quality. It's probably way worse than
we think. The only way out is to make the implementation as simple as possible.
