Testing binary searches

In 1986 the excellent book ‘Programming Pearls’ was first published. Amongst other things it looks at binary searching, shows how tricky it can be to get right and presents some code that is correct. Unfortunately the code has a bug in it.

A binary search is a quick way of finding if something is in an sorted list of values, and if so where it is. Take a list of first names:

[ adam, bob, carol, dean, eve, freddy, george ]

If you wanted to see if the name ‘freddy’ was in the list you could just go along the list and check each name. For a short list this is fine, but for a long list it takes too long. As the list is sorted (i.e. in alphabetical order) you can search it more quickly by starting in the middle of the list (‘dean’), checking if it is higher or lower than what you are looking for. ‘dean’ is lower than ‘freddy’ so you ignore the lower part of the list and go to the middle of the remaining list and repeat.

This bug was pointed out by Joshua Bloch on the Official Google Research Blog in this post (which you should read now for the rest of this to make sense).

Once you’ve seen this bug it is pretty obvious, but how could you test for it? As it only appears for lists with 230 or so elements it is tricky to test. It is not really feasible to create a list that size on most hardware – it would need a fair bit of memory and take a long time.

But good testing practice says that you should create a failing test case for it before fixing the code, so that the fix is confirmed. So how can we create a one billion element array without actually creating it?

With Perl you can do this – using tie. This allows you to create a variable that appears as an array (or hash, or scalar etc) but is in fact an object. Whenever you operate on it you instead call methods on the object. This is all transparent to the code.

For our tests we want to create an array where each element has a value that is the same as the index:

@array = ( 0, 1, 2, 3, 4, ..., $max );

This array is easy to test as we can search for a value and the index it is found at is the same as the element value. We also know how big it is ($max + 1).

What we need to do is create something that can simulate this behavior without actually being an array. We need to be able to report the size of the array and return the correct value for each index requested. The following minimal code achieves this:

package TestArray;

sub TIEARRAY {
    my ( $class, $max ) = @_;
    return bless \$max, $class;
}

sub FETCHSIZE {
    my $self = shift;
    return $$self + 1;
}

sub FETCH {
    my ( $self, $index ) = @_;
    return $index;
}

1;

You can then use this array just as you would a normal one, including creating a reference to it:

tie my @array, 'TestArray', 1000;
print scalar @array;         # will print '1001'
print $array[234];           # will print '234'

my $array_ref = \@array;
print scalar @$array_ref;    # will print '1001'
print $array_ref->[234];     # will print '234'

The TIEARRAY sub returns a blessed reference to a scalar. The value of the scalar is the maximum value in the array. This is used in FETCHSIZE sub which returns the number of elements in the array. The last sub FETCH simply returns the value passed to it, which is the index in the array. There is no error checking, I’ve left this out for simplicity.

This ‘test array’ can then be used in the test scripts to test that the old code breaks and that the new code works. This test script does all this. Perl would not normally exhibit the integer roll over, but by adding use integer; at the start it does. Also note that we need to check for the negative number explicitly. This is because a negative value is valid as an array index in Perl, causing the search to get stuck in and endless loop.

It is also interesting to see that checking the extremes would also have caught this bug. This is where you test all the limits that you can think of, one of which would have been lists at the upper end of the legal range.

Posted in Uncategorized | Leave a comment

Botley Road, Holiday Inn and Lucy Kellaway

Recently I needed to go to Southampton and stay overnight at a hotel on ‘Botley Road’. There are two Botley Roads less than a mile apart and my satnav picked the wrong one – meaning that I wasted over an hour before spotting the error (and got flashed by a speed camera on a dual carriage dual lane road that appears to be 30mph).

Amusingly the wrong Botley Road has been divided by a main road into two halves, so that to get from one half to the other (where the hotel might be) is a drive of several miles.

The hotel room was then too hot, despite the corridor being too cold.

The bed was uncomfortable – I did not sleep well.

I missed breakfast by a couple of minutes and found that the same machine that two minutes ago was serving free coffee now wanted two pounds. The serving staff were unable to help, but suggested that I could make myself a coffee in my room. They couldn’t give me any fresh milk though – the kitchen was being ‘cleaned’.

In despair I decided to ‘collect’ myself in my room and my iTunes randomly selected Lucy Kellaway’s ‘Honking for HarvardFT.com podcast, which was pure balm.

I now feel that I am not alone in believing that the world has gone mad and have some small faith that it might be alright in the end. That the unthinking idiocy and unquestioning acceptance of most people will be replaced with a bit of common sense.

Fat chance – but here’s hoping.

Posted in Uncategorized | Leave a comment

A roundup of sudoku related stuff

A while back I wrote a little sudoku solver that I was quite pleased with. To my surprise it has actually received quite a bit of attention.

First off there is Mark Byers, who started by shaving my solver down a bit to make it even shorter. Then he wrote a solver in python that is even shorter still.

Interestingly almost all the traffic to my solver comes from one page.

And the interest has been fairly international, this Korean blog entry means very little to me, and even less after Google has tried to translate it.

Finally I came across the following posting. To answer the question there, I leave the sub by calling ‘die’ because it is quick and short. Irritatingly I need to pay for an account to see my own code being discussed. Ho Hum.

The irony of this all is that I have yet to actually solve a sudoku by myself, somehow actually sitting down and plodding on isn’t that appealing. Especially when there are much more interesting things to work on, like converting numbers into Roman numerals using regular expressions…

Posted in Uncategorized | Leave a comment

It is amazing what you can accomplish…

..if you don’t care who gets the credit (Harry S Truman). In this spirit I am making a few projects open-source for all to look at and hopefully contribute to. These include scrpbk.com (a book marking site), prlmnks.org (RSS feeds to perlmonks.org), some forums software written in Perl and Tolk – a translation system.

Here are links to the projects (and sites if appropriate):

For me easily the most interesting is Tolk – which is where I will be spending what little time that I have. The idea is simply to make translating things easier, without letting anything get in the way. See the dev site for more details.

Much praise must go to the people behind trac and subversion though. If these two projects had not made it so easy to put projects on the web (source code, bugs, wiki, planning, etc) it would probably not have happened.

Posted in Uncategorized | Leave a comment

Obfuscation as an earning tool

Several months ago I posted a entry which solved sudokus in just a few lines. As a result I got a job and so have been somewhat distracted of late. The interesting thing is that the obfu (I believe) helped in the interview process, which was unexpected – after all it is not ‘production’ code, whatever that is.

Ho hum. I now work for 3B.net on a variety of projects.

Posted in Uncategorized | Leave a comment

Sudoku solver in three lines explained

In previous posts I produced a four line sudoku solver and then went on to explain it. After having watched a C programmer’s approach to it I am down to under three lines. Here it is and how it works.

use integer;@A=split//,<>;sub R{for$i(0..80){next if$A[$i];my%t=map{$_/9
==$i/9||$_%9==$i%9||$_/27==$i/27&&$_%9/3==$i%9/3?$A[$_]:0=>1}0..80;R($A[
$i]=$_)for grep{!$t{$_}}1..9;return$A[$i]=0}die@A}R

Above the obfu version, below the same code having been fed through perltidy. The code works as before – you feed in the sudoku grid on one line using zeros for the blanks. The code then spits out the solution in the same form.

use integer;
@A = split //, <>;

sub R {
    for $i ( 0 .. 80 ) {
        next if $A[$i];
        my %t = map {
                 $_ / 9 == $i / 9
              || $_ % 9 == $i % 9
              || $_ / 27 == $i / 27 && $_ % 9 / 3 == $i % 9 / 3
              ? $A[$_]
              : 0 => 1
        } 0 .. 80;
        R( $A[$i] = $_ ) for grep { !$t{$_} } 1 .. 9;
        return $A[$i] = 0;
    }
    die @A;
}
R

The core difference is that in my four line version I stored the sudoku grid on a hash, whereas here I use an array. For convenience here is a sudoku grid where the numbers represent the index in the array:

 0  1  2    3  4  5    6  7  8
 9 10 11   12 13 14   15 16 17
18 19 20   21 22 23   24 25 26

27 28 29   30 31 32   33 34 35
36 37 38   39 40 41   42 43 44
45 46 47   48 49 50   51 52 53

54 55 56   57 58 59   60 61 62
63 64 65   66 67 68   69 70 71
72 73 74   75 76 77   78 79 80

I’ll now run through the code explaining the various interesting bits.

use integer;
@A = split //, <>;

It is rare that obfus use modules but in this case it saved a few characters. Later on in the code I do a fair amount of integer mathematics, and without the use integer I would have to wrap up six of the expressions like so: int( $_ / 9 ).

The second line puts the grid from STDIN onto an array @A.

sub R {
    for $i ( 0 .. 80 ) {
        next if $A[$i];

The subroutine R is so called because it recurses. The main part of it is a for loop that runs over the whole grid. However if there is a value in the grid it skips on to the next position. $i is now the current position in the grid that we are looking to find a value for.

        my %t = map {
                 $_ / 9 == $i / 9
              || $_ % 9 == $i % 9
              || $_ / 27 == $i / 27 && $_ % 9 / 3 == $i % 9 / 3
              ? $A[$_]
              : 0 => 1
        } 0 .. 80;

This is a bit hairy. I am creating a hash %t that will have as its keys the numbers that can not be used in this position, ie the ones that have been taken. The map contains a bool_expr ? if_true : if_false construct to determine what the key should be.

The first three lines in the map are the boolean expressions to determine if $_ is in the same row, column or grid as $i. $_ / 9 == $i / 9 is for the row (remember that the division is integer so something like 12/9 equals 1). || $_ % 9 == $i % 9 is for the column and || $_ / 27 == $i / 27 && $_ % 9 / 3 == $i % 9 / 3 is for the grid (first part for grid row, second part for grid column). These are quite straight forward once you’ve gotten your head round them.

Really the booleans should have brackets round them to clear things up, however because Perl will stop evaluating them as soon as it finds one that is true the final && is ok.

Remember that the map is assigning a key/value pair to the hash. Well if these tests return true then the key is $A[$_], otherwise it is 0. The value is hard coded to true. I was surprised to find that the code for the key did not need to be in brackets, evidently the ?: is more tightly bound than => which is as you would expect.

        R( $A[$i] = $_ ) for grep { !$t{$_} } 1 .. 9;

Three things happen here. The for loop runs for all the numbers that can be used in this position thanks to the grep { !$t{$_} } which filters out the numbers that have been used – they have a true value in the hash %t.

R( $A[$i] = $_ ) sets the current position to one of the possible values and then recurses.

        return $A[$i] = 0;

If none of the values in the for loop lead to a solution, or if there were no values to try then we reset the current position to zero and return. This allows other values to be tried in the ‘higher’ calls to R.

    }
    die @A;
}

If the outermost for loop ever reaches the end (ie $i == 81) then we have our solution. Print it out and stop processing.

R

The only thing left to do is to actually start running the code.

So there we are, sudokus solved in three lines of Perl. In my opinion this is cleaner than my previous offerings as the logic is tighter. Still this obfu is not as short as it can get. There are two superfluous semicolons and the logic in the map could be shortened too, as well as some other bits.

Posted in Uncategorized | Leave a comment

Obfuscation as a learning tool

A couple of days ago I posted an obfuscation that solves sudoku puzzles in four lines of Perl. To compress the code down to this level required quite a few dirty little tricks which would have no place in production code. Here I am going to postmortem the obfu and point out what there is to learn from it.

First the code as an obfu:

$a=8;$G{int(++$a/9).$a%9+1}=$_ for split//,<>;@A=1..9;sub c{int(($_[0]-1
)/3)*3}sub G{for$y(@A){for$x(@A){$p=$t=$G{my$c=$y.$x}&&next;$t.=$G{$_.$x
}.$G{$y.$_}for@A;for$f(1..3){$t.=$G{c($y)+$f.c($x)+$_}for 1..3}G($G{$c}=
$_)&&return for grep$t!~m/$_/,@A;return$G{$c}=0}}die map{$G{$_}}9..99}G

And then tidied up by perltidy:

$a = 8;
$G{ int( ++$a / 9 ) . $a % 9 + 1 } = $_ for split //, <>;
@A = 1 .. 9;

sub c {
    int( ( $_[0] - 1 ) / 3 ) * 3;
}

sub G {
    for $y (@A) {
        for $x (@A) {
            $p = $t = $G{ my $c = $y . $x } && next;
            $t .= $G{ $_ . $x } . $G{ $y . $_ } for @A;
            for $f ( 1 .. 3 ) { $t .= $G{ c($y) + $f . c($x) + $_ } for 1 .. 3 }
            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A;
            return $G{$c} = 0;
        }
    }
    die map { $G{$_} } 9 .. 99;
}
G

Now I will run through the code explaining what each bit does and musing about it.

$a = 8;
$G{ int( ++$a / 9 ) . $a % 9 + 1 } = $_ for split //, <>;

This code takes the input and builds a hash. The hash %G is keyed on two numbers, such as ’23′ or ’39′. This number is the (y,x) coordinate on the sudoku grid, with the origin in the top left. I used (y,x), rather than (x,y), as it allowed me to print the grid out later on more easily.

The creation of the key is quite interesting as we are numbering the values coming in on the list in a sort of base 9. Think about the change in numbers as you go off the first row and onto the second: 17, 18, 19, 21, 22, 23. As you can see it needs to jump from 19 to 21. The y coordinate is done by throwing away the fractional part of a division, the x coordinate is obtained using a modulus. The increasing value to work with is provided by $a which starts out as ’8′ and then is pre-incremented during the creation of the key.

@A = 1 .. 9;

Simplicity itself. @A is just an array of 1 to 9. If you look in the rest of the code there are many times when I needed to run from 1 to 9 and so as a space saver it was worth creating this array.

sub c {
    int( ( $_[0] - 1 ) / 3 ) * 3;
}

This is a helper function that is used later on when finding which numbers are currently in the block that we are looking at. It is passed a single digit from 1 to 9. It returns 0, 3 or 6 for 1..3, 4..6 or 7..9. See later for how the return value is used.

sub G {

Create a sub routine G. This is the meat of the obfu and most action occurs in here. The basic idea is to find the first square on the grid that does not have a value. Then find all the possible numbers that could go in. For each number put it in and recurse. If we reach the end of the grid exit, otherwise try the next value. If there are no more values to try backup to previous squares.

    for $y (@A) {
        for $x (@A) {

The heart of the routine is two loops, both from 1..9. This gives us the coordinates of the square in the grid that we are working on.

            $p = $t = $G{ my $c = $y . $x } && next;

Several things happen in this line. The most important one is to see if there is a number in the square. As all empty squares are set to 0 (false) this can be done with a simple boolean test. If there is a number then $G{'yx'} is true which causes the next to get called. This is taking advantage of Perl lazy evaluation of the terms, ie. if the first test fails (there is no number) then there is no need to evaluate the second test, the next.

In the interests of cramming code in there are a few other things going on. Firstly a variable $c is set up containing the current (y,x) location. This assignment happens inside the curly braces of the hash. This works because the return value of an assignment is the value assigned. This is also used to give initial false values to $p and $t which are used later. The values in $p and $t are always false as if it was true then the loop would have jumped on.

            $t .= $G{ $_ . $x } . $G{ $y . $_ } for @A;

Finally we start to actually determine what numbers could go into this square. The $t string is appended with the numbers that are already taken (t is for taken). This is done for both the current row and the current column in one statement.

            for $f ( 1 .. 3 ) { $t .= $G{ c($y) + $f . c($x) + $_ } for 1 .. 3 }

This double loop appends the numbers that are taken in the current block onto $t. There are two variables floating around, $f and $_ which both run from 1..3. These are added to the return value from the c subroutine discussed earlier which results in the squares in the current grid being selected.

The funny use of two for loops in this way is to save space.

            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A;

There are several things going on here. There is a loop over the possible number that could go into the square: for grep $t !~ m/$_/, @A. As $t contains the numbers that are taken the grep only returns the numbers from 1 to 9 that are not taken, ie possible values.

For each possible number G( $G{$c} = $_ ) && return is executed. This sets the current square to the possible value. It then recurses into G. If the return value is true then it returns. In fact it returns true as unless specifically given a value return will return whatever is in $_, which as it happens was just set to the return value from G. This will always be true for the return to get called and so true is always returned.

You might wonder why the current square is assigned its value as an argument to the subroutine G? The answer is that it saves one character doing it this way. Compare G( $G{$c} = $_ ) to $G{$c} = $_, G(). As G does nothing with arguments it is passed this is fine.

            return $G{$c} = 0;
        }
    }

If the code did not return above then it should now. If we reach this stage then every possible number for this square was tried and none of them led to a solution of the sudoku. We want to set the current square back to false and the allow the code to try changing some previous numbers to see if they lead to a solution. Again the return value from an assignment is used as a space saver.

    die map { $G{$_} } 9 .. 99;
}

This is dirty. I needed to shave off a couple of characters to get into four lines and so I did this. If we reach this point then the two for loops for $x and $y have reached the end. What has has happened is that the final square has been assigned a number. The subroutine has recursed and the two for loops have run and next has been called for every square as they all have numbers in them. Hence we end up here. To save those last few characters I used die instead of print. Horrible.

There is no need in the the map to worry about the values in G for keys such as 10 or 20 that have no values. They simply result in undefs being printed which produces no errors as use warnings has not been specified.

G

So here we are at the end of the code. All we need to do now is run G for the first time.

Rather interestingly I have spotted a space saver writing this post. There is a variable $p that was set to false but never actually gets used. It used to be used for possible values that could go into the square but became unneeded due to changes. However I forgot it in there. Bug fixing by code reading in action.

Another space saver is that as the code dies when a solution is found there is no need to return as the code would already have died. This means that another line can be shortened:

            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A; # before
            G( $G{$c} = $_ ) for grep $t !~ m/$_/, @A;           # after

This means that the die can be tidied up a bit and we can even format the output:

    die map { $G{$_} } 9 .. 99;             # before
    die map { $G{$_} || "\n" } 9 .. 100;    # after

The new, better, shorter obfu:

$a=8;$G{int(++$a/9).$a%9+1}=$_ for split//,<>;@A=1..9;sub c{int(($_[0]-1
)/3)*3}sub G{for$y(@A){for$x(@A){$t=$G{my$c=$y.$x}&&next;$t.=$G{$_.$x}.
$G{$y.$_}for@A;for$f(1..3){$t.=$G{c($y)+$f.c($x)+$_}for 1..3}G($G{$c}=$_
)for grep$t!~m/$_/,@A;return$G{$c}=0}}die map{$G{$_}||"\n"}9..100}G

I hope that you will agree that even though there are only four lines of code here there is plenty to think about. Not all of it is ‘good’ code, in fact most of it is very bad code. However the artificial constraint of four lines leads to some interesting techniques that take advantages of some of the more esoteric parts of the language.

Posted in Uncategorized | Leave a comment

Sudoku solver in four lines

Slightly silly perhaps but I thought it would be fun to write a piece of code to solve these sudoku puzzles that keep popping up. As that appeared to be trivial I decided to see if it could be done in four lines or less. Here it is:

$a=8;$G{int(++$a/9).$a%9+1}=$_ for split//,<>;@A=1..9;sub c{int(($_[0]-1
)/3)*3}sub G{for$y(@A){for$x(@A){$p=$t=$G{my$c=$y.$x}&&next;$t.=$G{$_.$x
}.$G{$y.$_}for@A;for$f(1..3){$t.=$G{c($y)+$f.c($x)+$_}for 1..3}G($G{$c}=
$_)&&return for grep$t!~m/$_/,@A;return$G{$c}=0}}die map{$G{$_}}9..99}G

Update – there is also an explanation of this code and a three line version.

It is slightly arcane in it’s usage but what do you expect from four lines of Perl? The best way to run it is to save it to a file (eg sudoku.pl). You need to feed it the start grid as well. This is one long line with the top row first, the second row next and so on. Use zeros for the blanks. Do not separate the rows with a space, just push them all together.

For example the following puzzle was given in the Economist (21 May 2005) on page 75:

. . . . 1 . . . .
3 . 1 4 . . 8 6 .
9 . . 5 . . 2 . .
7 . . 1 6 . . . .
. 2 . 8 . 5 . 1 .   (dots are used for the blanks)
. . . . 9 7 . . 4
. . 3 . . 4 . . 6
. 4 8 . . 6 9 . 7
. . . . 8 . . . .

This would give you the following line for input:

000010000301400860900500200700160000020805010000097004003004006048006907000080000

If you saved the input in a file called input.txt you could then run the solver like this:

perl sudoku.pl < input.txt

The output is the same format as the input – that is to say all the rows on one line.

It will only work for 9×9 grids containing numbers 1 to 9. Hope you enjoy it.

2005/05/11 – scrpbk.com for all your scrapping needs

Just a quick note to let all know that I have just launched a new website scrpbk.com that lets you organize small bits of information and let the world see them.

You take bits of information (scraps) and organize them into groups (holders). You can move all of these around and tweak them as you like. All of this information is then visible to anyone on the web.

There is much more to come soon though, such as private holders, arranging holders over several pages and ‘social things’. If you have any idea please let me know.

People are already starting to sign up so hurry. Once your username is taken it is taken.

Posted in Uncategorized | 1 Comment

Seats are not votes

I have always suspected that elections that rely on the first past the post system are not really fair and that it is the little parties that lose out. However it would appear that not only do the little ones lose out but the big ones win.

Numbers etc from http://news.bbc.co.uk/1/shared/vote2005/html/scoreboard.stm.

As you can see from the graph the three big parties represent the three possible scenarios. Labour have more seats than they ought, the Lib Dems have less and the Conservatives are properly represented.

Down in the low numbers UKIP, the Greens and the BNP should all have a few seats by votes but achieved none.

It is (I think) quite interesting that if we had proportional representation then the political landscape (awful phrase) would be quite different. Labour would not have an overall majority but would be pretty much the same as the Tories. The Lib Dems would still be the third party but would have the control if the other two disagree.

Cynically I would suggest that as politicians rarely get anything right and seem to enjoy a good shout either side of a good meal this would suit them well. With three parties all roughly equal nothing really contentious would get through the house. Not too bad really.

PS apologies for the ugly graph. Excel does not appear to let you export a graph to an image without going via a screenshot.

Posted in Uncategorized | Leave a comment

Zap goes my modem

Although everyone gets warned about it no one really appears to take lightning strike seriously. However it does happen. Saturday night brought us a huge thunder storm that rolled around for about an hour. One strike in particular was very close and it must have been the one that fried my modem – see picture.

What you want to notice is the two resistors in the inset. These are the first things a power surge would meet on the way in and as you can see they have taken the brunt of it. No doubt the electronics are also fried although only partially. The modem still works fine, except that it cannot hear the phone line.

Luckily though lightning never strikes twice…

Posted in Uncategorized | Leave a comment