Eccles & Toad home ~ blog ~ perl ~ projects ~ employ me
You are here: home > blog > 2006 > 08

Blogs 2006/08

(RSS)

2006/08/26 - Botley Road, Holiday Inn and Lucy Kellaway

Recently I needed to go to Southampton and stay overnight at a hotel on 'Botley Road'. There are two Botley Roads less than a mile apart and my satnav picked the wrong one - meaning that I wasted over an hour before spotting the error (and got flashed by a speed camera on a dual carriage dual lane road that appears to be 30mph).

Amusingly the wrong Botley Road has been divided by a main road into two halves, so that to get from one half to the other (where the hotel might be) is a drive of several miles.

The hotel room was then too hot, despite the corridor being too cold.

The bed was uncomfortable - I did not sleep well.

I missed breakfast by a couple of minutes and found that the same machine that two minutes ago was serving free coffee now wanted two pounds. The serving staff were unable to help, but suggested that I could make myself a coffee in my room. They couldn't give me any fresh milk though - the kitchen was being 'cleaned'.

In despair I decided to 'collect' myself in my room and my iTunes randomly selected Lucy Kellaway's 'Honking for Harvard' FT.com podcast, which was pure balm.

I now feel that I am not alone in believing that the world has gone mad and have some small faith that it might be alright in the end. That the unthinking idiocy and unquestioning acceptance of most people will be replaced with a bit of common sense.

Fat chance - but here's hoping.

2006/08/25 - My language can do that!

Joel Spolsky had some fun with stuff you can do in JavaScript - I've done something similar with Perl.

Can your language do this?

This is one of the nicer introductions to anonymous subroutines that I've seen, as well as good reasons to use them. This can be done trivially in Perl to great effect.

He doesn't go on to talk about closures though which is a shame as it is one of the few ways in Perl to get truly private variables. Because the variable and sub are created in the same scope they can interact. But when you leave the scope the variable is no accessible, but the sub is.

{
     # In lexical scope so '$private' is only visible here.
     my $private = 'hello';
     sub get_private { return $private; }
     sub set_private { return $private = shift; }
}

# try to access private here causes syntax error
$private = 'bad value';

# but can access it here using sub
my $value = get_private();

A variant on this is the BEGIN and END blocks - which are useful in testing to clean up files that got created:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";

... # tests go here

# end of tests - delete the file.
ok unlink( $file ), "deleted file";

This is messy as the two actions ( creating and deleting the file ) are now separated by code even though they should go together. Better to write:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";
END { ok unlink( $file ), "deleted file"; }

... # tests go here

There is a gotcha though in that if the value in $file is changed during the run of the tests then the wrong file might get deleted. Use a closure:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";

{
     my $copy = $file;
     END { ok unlink( $copy ), "deleted file"; }
}
... # tests go here

It is the block of code that you give to END that is executed, and $copy is in it even though it will have gone out of scope by the time that the END runs.

Still this is a bit murky - to get it really clear something like this might be used:

{   # abstract the cleaning up of the files.
     my @files_to_delete = ();
     sub delete_file_at_end { push @files_to_delete, shift; }
     END { ok( unlink($_), "deleted '$_'" ) for @files_to_delete; }
}

# add a file to be deleted at the end
delete_file_at_end( $file );

There is now no way to prevent the end block running and deleting the files - it will even run if the code crashes. This is good as it prevents code from unexpectedly modifying the @file_to_be_deleted (it can't) and means that once you've added a file it will get deleted - you can just forget about it.

Scoping variables is a very powerful tool which I recommend highly.

2006/08/25 - DNS Entry that points to localhost

One of the difficulties that having sub-domains is that you need to have lots of DNS entries for them. This is particularly annoying if you want to run the site on your development machine - ie 127.0.0.1. That is why I just registered 127-0-0-1.org.uk.

If you do a DNS lookup on '127-0-0-1.org.uk' you will find that its IP address is 127.0.0.1. The DNS is wildcarded so anything.127-0-0-1.org.uk will always return 127.0.0.1. Even sub-sub-sub-domains work.

I tried to find a domain that did this - perhaps I was asking Google the wrong question. I am now a few pounds poorer but hopefully it will be useful to others. Feel free to use it if you need to.

If only it were easier to create wild card entries in /etc/hosts - or whatever voodoo Mac OS X is using this release. I still can't work easily with sub-domains with out a network connection. I know I could install a DNS server on my laptop - but should I really need to?

I currently have a fair amount of hate for sub-domains - watch this space.

2006/08/20 - Testing binary searches.

In 1986 the excellent book 'Programming Pearls' was first published. Amongst other things it looks at binary searching, shows how tricky it can be to get right and presents some code that is correct. Unfortunately the code has a bug in it.

A binary search is a quick way of finding if something is in an sorted list of values, and if so where it is. Take a list of first names:

[ adam, bob, carol, dean, eve, freddy, george ]

If you wanted to see if the name 'freddy' was in the list you could just go along the list and check each name. For a short list this is fine, but for a long list it takes too long. As the list is sorted (i.e. in alphabetical order) you can search it more quickly by starting in the middle of the list ('dean'), checking if it is higher or lower than what you are looking for. 'dean' is lower than 'freddy' so you ignore the lower part of the list and go to the middle of the remaining list and repeat.

This bug was pointed out by Joshua Bloch on the Official Google Research Blog in this post (which you should read now for the rest of this to make sense).

Once you've seen this bug it is pretty obvious, but how could you test for it? As it only appears for lists with 230 or so elements it is tricky to test. It is not really feasible to create a list that size on most hardware - it would need a fair bit of memory and take a long time.

But good testing practice says that you should create a failing test case for it before fixing the code, so that the fix is confirmed. So how can we create a one billion element array without actually creating it?

With Perl you can do this - using tie. This allows you to create a variable that appears as an array (or hash, or scalar etc) but is in fact an object. Whenever you operate on it you instead call methods on the object. This is all transparent to the code.

For our tests we want to create an array where each element has a value that is the same as the index:

@array = ( 0, 1, 2, 3, 4, ..., $max );

This array is easy to test as we can search for a value and the index it is found at is the same as the element value. We also know how big it is ($max + 1).

What we need to do is create something that can simulate this behavior without actually being an array. We need to be able to report the size of the array and return the correct value for each index requested. The following minimal code achieves this:

package TestArray;

sub TIEARRAY {
    my ( $class, $max ) = @_;
    return bless \$max, $class;
}

sub FETCHSIZE {
    my $self = shift;
    return $$self + 1;
}

sub FETCH {
    my ( $self, $index ) = @_;
    return $index;
}

1;

You can then use this array just as you would a normal one, including creating a reference to it:

tie my @array, 'TestArray', 1000;
print scalar @array;         # will print '1001'
print $array[234];           # will print '234'

my $array_ref = \@array;
print scalar @$array_ref;    # will print '1001'
print $array_ref->[234];     # will print '234'

The TIEARRAY sub returns a blessed reference to a scalar. The value of the scalar is the maximum value in the array. This is used in FETCHSIZE sub which returns the number of elements in the array. The last sub FETCH simply returns the value passed to it, which is the index in the array. There is no error checking, I've left this out for simplicity.

This 'test array' can then be used in the test scripts to test that the old code breaks and that the new code works. This test script does all this. Perl would not normally exhibit the integer roll over, but by adding use integer; at the start it does. Also note that we need to check for the negative number explicitly. This is because a negative value is valid as an array index in Perl, causing the search to get stuck in and endless loop.

It is also interesting to see that checking the extremes would also have caught this bug. This is where you test all the limits that you can think of, one of which would have been lists at the upper end of the legal range.