Eccles & Toad home ~ blog ~ perl ~ projects ~ employ me
You are here: home > blog > 2006 > 08 > 20

Blogs 2006/08/20

(RSS)

2006/08/20 - Testing binary searches.

In 1986 the excellent book 'Programming Pearls' was first published. Amongst other things it looks at binary searching, shows how tricky it can be to get right and presents some code that is correct. Unfortunately the code has a bug in it.

A binary search is a quick way of finding if something is in an sorted list of values, and if so where it is. Take a list of first names:

[ adam, bob, carol, dean, eve, freddy, george ]

If you wanted to see if the name 'freddy' was in the list you could just go along the list and check each name. For a short list this is fine, but for a long list it takes too long. As the list is sorted (i.e. in alphabetical order) you can search it more quickly by starting in the middle of the list ('dean'), checking if it is higher or lower than what you are looking for. 'dean' is lower than 'freddy' so you ignore the lower part of the list and go to the middle of the remaining list and repeat.

This bug was pointed out by Joshua Bloch on the Official Google Research Blog in this post (which you should read now for the rest of this to make sense).

Once you've seen this bug it is pretty obvious, but how could you test for it? As it only appears for lists with 230 or so elements it is tricky to test. It is not really feasible to create a list that size on most hardware - it would need a fair bit of memory and take a long time.

But good testing practice says that you should create a failing test case for it before fixing the code, so that the fix is confirmed. So how can we create a one billion element array without actually creating it?

With Perl you can do this - using tie. This allows you to create a variable that appears as an array (or hash, or scalar etc) but is in fact an object. Whenever you operate on it you instead call methods on the object. This is all transparent to the code.

For our tests we want to create an array where each element has a value that is the same as the index:

@array = ( 0, 1, 2, 3, 4, ..., $max );

This array is easy to test as we can search for a value and the index it is found at is the same as the element value. We also know how big it is ($max + 1).

What we need to do is create something that can simulate this behavior without actually being an array. We need to be able to report the size of the array and return the correct value for each index requested. The following minimal code achieves this:

package TestArray;

sub TIEARRAY {
    my ( $class, $max ) = @_;
    return bless \$max, $class;
}

sub FETCHSIZE {
    my $self = shift;
    return $$self + 1;
}

sub FETCH {
    my ( $self, $index ) = @_;
    return $index;
}

1;

You can then use this array just as you would a normal one, including creating a reference to it:

tie my @array, 'TestArray', 1000;
print scalar @array;         # will print '1001'
print $array[234];           # will print '234'

my $array_ref = \@array;
print scalar @$array_ref;    # will print '1001'
print $array_ref->[234];     # will print '234'

The TIEARRAY sub returns a blessed reference to a scalar. The value of the scalar is the maximum value in the array. This is used in FETCHSIZE sub which returns the number of elements in the array. The last sub FETCH simply returns the value passed to it, which is the index in the array. There is no error checking, I've left this out for simplicity.

This 'test array' can then be used in the test scripts to test that the old code breaks and that the new code works. This test script does all this. Perl would not normally exhibit the integer roll over, but by adding use integer; at the start it does. Also note that we need to check for the negative number explicitly. This is because a negative value is valid as an array index in Perl, causing the search to get stuck in and endless loop.

It is also interesting to see that checking the extremes would also have caught this bug. This is where you test all the limits that you can think of, one of which would have been lists at the upper end of the legal range.