What does "edge case"
mean? Let's start with the "case" part: it's some state
your software can be in. Okay, that was easy. Now to the "edge"
part: this is fuzzier, as software generally doesn't come
with edges. I think it means two things: something at the boundary
of the input type's domain, and something of low probability.
If you have a
one-dimensional type that is finite over the values [x, z], then you can
expect more problems on "the edges", that is,
close to x and z. An example could be [0, max integer].
This all makes
sense. The problem is that there aren't many such types in your software;
even one-dimensional types have more boundaries. The lowly type
int already has plenty of edges to look at: [min int, -1, 0, 1, max int]. Once
you convert from int32 to int64 you get the union of the edges of
both types, and this is only lowly int.
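To make the "union of edges" idea concrete, here is a small sketch in Python (my choice of language; Python's own ints are unbounded, so the fixed-width bounds are computed by hand):

```python
# Edge values of fixed-width signed integer types, computed by hand.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

int32_edges = [INT32_MIN, -1, 0, 1, INT32_MAX]
int64_edges = [INT64_MIN, -1, 0, 1, INT64_MAX]

# Code that converts int32 -> int64 should be tested against the
# union of both edge sets, not just one of them.
conversion_edges = sorted(set(int32_edges) | set(int64_edges))
print(conversion_edges)
```

Five edges per type already become seven distinct values for one conversion, and this is only lowly int.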
If you have a
decimal type, edges are harder to define: there are no min/max
values (in most implementations), but you get CPU-caching-related
"edges" instead. Your number could be spread over multiple pages.
Functionally the behavior is the same, but now you can run into
performance problems.
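Python's `decimal` module is one concrete example of such a type: where a float overflows, a Decimal of the same magnitude is still an ordinary number (a sketch, assuming Python as the example language):

```python
from decimal import Decimal

# float64 overflows just past ~1.8e308...
overflowed = float("1e308") * 10
print(overflowed)

# ...while the equivalent Decimal stays a finite, exact number.
# There is no practical min/max edge to test against; instead the
# cost shifts to memory and time as values grow.
still_fine = Decimal("1e308") * 10
print(still_fine)
```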
And this again was only
the lowly integer types. The next level of complexity is floating-point
types: resolution of numbers, min, max, NaN, Inf. A whole new
set of problems.
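A sketch of what that new set of problems looks like for float64 in Python (the list of "interesting" values is my suggestion, not an exhaustive catalogue):

```python
import math
import sys

# Candidate float64 edges, well beyond the integer-style ones.
float_edges = [
    -math.inf, -sys.float_info.max, -1.0, -sys.float_info.min,
    -0.0, 0.0, sys.float_info.min, sys.float_info.epsilon,
    1.0, sys.float_info.max, math.inf, math.nan,
]

# Each value brings its own trap. NaN is not equal to itself:
print(math.nan == math.nan)   # False

# And resolution: adding 1.0 to a big enough float is a no-op,
# because the gap between adjacent floats at 1e16 is already 2.
print(1e16 + 1.0 == 1e16)     # True
```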
Okay, what's the meaning
of all this? Typically, I don't know the types you're
programming with very well. And you cannot know all the interesting cases either.
The probability
meaning is even trickier. In the general case you don't know the
input value distribution, so you have to assume that all values are
used. This is the safe and easy assumption. Pragmatically, you can
assume that some values aren't actually used in your context, since you
also control the caller. In that case you can redefine the domain of
input values. This lowers your software development cost.
What doesn't hold is
the idea that "edge cases" are somehow different and allow any answer
other than "this is either correct or incorrect". That would
mean we also allow undefined results, which are either undefined-correct
or undefined-incorrect.
What is your caller
supposed to do if the result is undefined, and the range of input values
that produces those results is also undefined? Accept occasional
disaster. If you want to avoid that, you have to avoid undefined behavior.
Okay, what can I do?
Step 0: Is your
software important enough? Correctness won't come for free. Define
how much you want to invest in correctness. If the cost of an
invalid result is lower than the cost of a correct implementation, stop.
Step 1: Define the
allowed input values. If you do not give an answer for a given input,
you cannot make an error.
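A minimal sketch of Step 1 in Python (the function and its contract are illustrative, not from the original text): narrow the domain up front and refuse everything else, so the function never has to produce an answer for inputs it was not designed for.

```python
import math

def average_monthly(values: list[float]) -> float:
    """Average of 1..12 finite, non-negative monthly values.

    Inputs outside the defined domain are rejected rather than
    answered incorrectly -- no answer means no wrong answer.
    """
    if not 1 <= len(values) <= 12:
        raise ValueError("expected between 1 and 12 values")
    if any(not math.isfinite(v) or v < 0 for v in values):
        raise ValueError("values must be finite and non-negative")
    return sum(values) / len(values)
```

The narrower the declared domain, the fewer edges are left to get wrong.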
Step 2a: As a
testing strategy, start with what you know and write explicit tests.
If you're lucky, you
know your types really well. This is halfway realistic for integers;
I wouldn't bet that I can write correct code for floats, strings, or
timestamps.
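For integers, "start with what you know" can look like this sketch (the saturating-add function is a hypothetical example, chosen because its interesting cases sit exactly at the type's documented edges):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def sat_add32(a: int, b: int) -> int:
    """Add two int32 values, clamping the result into int32 range."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

# Explicit tests derived from the known edges of the type.
assert sat_add32(INT32_MAX, 1) == INT32_MAX   # would overflow in C
assert sat_add32(INT32_MIN, -1) == INT32_MIN  # would underflow in C
assert sat_add32(0, 0) == 0
assert sat_add32(-1, 1) == 0
```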
Step 2b: For all
other types, try to test exhaustively. Drop the concept of an edge
case: if I don't know the type, I'm not able to define the interesting
cases. Look into something like SmallCheck for exhaustively testing
"small" values, and QuickCheck or fuzzing for randomized
testing.
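To show the shape of the idea without pulling in a library, here is a hand-rolled randomized check in the spirit of QuickCheck, using only Python's standard library (real tools additionally shrink failing inputs and bias their generators toward boundaries):

```python
import random

def my_abs(x: int) -> int:
    """The function under test (illustrative)."""
    return x if x >= 0 else -x

def check_property(prop, cases: int = 1000) -> None:
    rng = random.Random(42)  # fixed seed for reproducibility
    # Mix explicit boundary values with random draws: pure random
    # sampling almost never hits the boundaries of a huge domain.
    edges = [-2**63, -2**31, -1, 0, 1, 2**31 - 1, 2**63 - 1]
    for x in edges + [rng.randint(-2**63, 2**63) for _ in range(cases)]:
        assert prop(x), f"property failed for {x!r}"

# Property: the absolute value is never negative.
check_property(lambda x: my_abs(x) >= 0)
```

The property states what must hold for *all* inputs; the generator, not the author, decides which values get tried.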
Step 3: Never use
the word edge case again.
The positive result
of this humble world view is that you'll learn. As an exercise,
write a simple function that works with two float values,
two dates in different time zones, or a string. Look at all the
problems once you start testing values you haven't thought about. The
floating-point, Unicode, and calendar implementations have enough
juice to make your life interesting. If you dare to look.
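A taste of that exercise in Python, for the float and Unicode cases (the specific examples are my picks):

```python
import unicodedata

# Floating point: 0.1 has no exact binary representation.
print(0.1 + 0.2 == 0.3)                       # False

# Unicode case mapping can change the length of a string.
print("ß".upper())                            # "SS" -- one char becomes two

# The "same" character can be composed or decomposed.
composed = "\u00e9"      # é as a single code point
decomposed = "e\u0301"   # e + combining accent; renders identically
print(composed == decomposed)                 # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```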