PPC Advertising costs

Using Pay Per Click ads is becoming less and less financially viable. This is mainly because they are sold at auction, so if you win you’ll always be paying more than everyone else thought the traffic was worth.

They are also very expensive: $0.05 per click might sound cheap, but unless you have a good conversion rate that quickly becomes a much larger cost per sale.

I’ve created a spreadsheet that lets you see how much per sale it’ll cost for a given CPC and conversion rate:
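
The arithmetic behind it is simply cost per sale = cost per click divided by conversion rate. A minimal Perl sketch of that calculation (illustrative only; the numbers are made up):

use strict;
use warnings;

# Cost per sale is the cost per click divided by the conversion rate
# (the fraction of clicks that actually turn into sales).
sub cost_per_sale {
    my ( $cpc, $conversion_rate ) = @_;
    return $cpc / $conversion_rate;
}

# A $0.05 click with a 1% conversion rate works out at $5.00 per sale.
printf "\$%.2f per sale\n", cost_per_sale( 0.05, 0.01 );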

YourNextMP and India trip discussion on YAPC Radio

For the UK general election I ran a project to list all of the candidates and gather their contact details: YourNextMP. It was a success and as a result I was invited to join the Prime Minister on the recent India Delegation.

I’ve a large number of notes and thoughts that I’ll be publishing here in due course – but for now there is an interview with me about it available to download:

YAPC::EU::2010 Radio – Episode 3

The interview with me starts about 5 minutes in. There are many Perl conferences and workshops held every year; YAPC::EU is the largest, and this year it was in Pisa, just days after the India trip.

Writing a Facebook privacy policy

As part of creating a Facebook app you have to come up with a Terms of Service and a Privacy Policy. Facebook insist on this, although a fair few apps don’t seem to have them.

The privacy policy says what data you collect and how you store and use it. The terms of service say who gets the blame if something goes wrong.

To get inspiration I took a closer look at some policies that are out there on the net. Here are some of the more interesting ones I found:

Zynga (makers of FarmVille) aren’t closing any doors on future data sources:

We may use information about you that we collect from other sources, including but not limited to newspapers and Internet sources such as blogs, instant messaging services, Zynga games and other users of Zynga, to supplement your profile.

Christopher Penn’s blog takes a less formal approach but it is at least crystal clear:

… Don’t ever submit to me anything you wouldn’t put up on a public bulletin board …

And finally JavaRanch has one that is short and sweet:

Be nice.

I suppose I’ll tread the middle ground: avoiding the pointless legalese but trying to be a little more specific than that.

Send-a-Test Launched

I’ve just launched http://www.send-a-test.com/. This is a site that should make sending out pre-interview tests much easier. It takes care of getting the test to the candidate, timing them whilst they do it and then storing their answers when they are done.

As with most things I did this to scratch an itch.

Recently I was being recruited – which means that your CV goes onto lots of websites and you get lots of calls. Often once the recruiter has checked that you are right for the job they’ll want to send you a test to check that you know what you’re doing.

This is where the pain would start. Their problem is that they want to time how long it takes for you to do the test, understandably. To do this they email it to you, you start work straight away, do the test and email back the answers. Compare the times on the two emails and you have how long it took to do the test.

This is great, as long as you can start your test during the recruiter’s office hours, which if you already have a job you almost certainly can’t. It also wastes time with all the arranging of times, and follow up calls to check that you’ve received the test by email etc.

The solution was simple – create a website that the recruiter saves the test on. When they want someone to do the test they tell the site and it sends an email to the candidate with a link in it. The candidate follows the link and gets all the instructions they need for the test, and when they are ready they download the test file, which starts the timer. When they are done they upload their answers and the timer stops.

And that is exactly what http://www.send-a-test.com/ does – enjoy!

Know your market

Right – I’m sitting on the bus in traffic and to my right is a father and his three young children. As a bus in traffic is deadly dull they are playing a game where one person says a list of words and then someone else makes up a story that has all the words in it.

So the father says: “canoe, tea cup, fire extinguisher, astronaut and the biggest thing in the world”.

Before the little boy had even started talking I had come up with: “There is an astronaut orbiting the biggest thing in the world who wants to relax, so he makes some tea with water from a fire extinguisher, pours it into his favorite tea cup and day dreams about his canoe”.

Feeling quite smug that I had completed the task quickly and accurately I was curious to see if the boy would do as well. Needless to say he thrashed me.

His story started off promisingly with an astronaut who was hungry. He decided to eat the biggest thing in the world. Having eaten this he then went to drink some juice, but couldn’t find a glass and so used his tea cup instead. At the end it turns out that all of this happens in his canoe. He forgot about the fire extinguisher, but fixed that by being attacked by a fire breathing dragon.

The whole story was delivered with hand waving, facial expressions, giggles and lots of heckling from the others.

The reason he thrashed me was that they went on playing the game, whereas my contribution would have been so breathtakingly boring that it would have been the last round, and the children would have sat in silence contemplating the misery that is the daily commute.

The aim of the game was not to use all the words in as few words as possible. It was to keep the others entertained. He did that.

Build code like you would pack a garage

At my current job I am frustrated at how the code is being built. In order to explain it I came up with this analogy about packing a garage:

Say you have a big pile of stuff that you need to store. You rent a garage and start to fill it, putting things on the floor and stacking other things on top where they fit best. Eventually you have filled the garage with all the stuff and you close the door. Everything just fits so there is no wasted space. Job done.

The other way to do it would be to rent a slightly bigger garage and install some proper storage shelves. You’d also get some boxes and marker pens and go through all the stuff, sorting it into the right boxes: CDs in one, winter clothes in another. You put similar things near each other – all the suitcases on one shelf, for example. After a while everything is in and you close the door. Job done.

So one method is faster and cheaper, and both achieve the same objective which is to get stuff into the garage. Obviously the ‘stuff it in’ method is better.

Truth be told the ‘stuff it in’ method is better for filling the garage, and if that is all you want to do then fine. However if you ever want to get to anything in the garage you will realise that it was a mistake.

You have several problems to deal with. Firstly you can’t find anything because there is no order and you have forgotten where you put things. Related things are not stored together, so finding the ski boots does not get you closer to the skis. Also, once you find something you can’t get it out without disturbing everything else. Finally, because everything is so packed in you can’t see it all, so you can’t be sure that there isn’t a box at the back getting crushed.

Needless to say this is not the case with a properly stacked garage. Everything is on a shelf so it is easy to get to. Related things are collected in well labeled boxes so they are easy to find. When you find the necessary box you can pull it out without worrying about anything else falling down.

Bringing this back to code you can probably see the parallels. The ‘stuff’ that needs to go in the garage is the features of your code. Poorly written code is not well organized: all the functional code was just packed in as and when it was needed, so it is not possible to easily find anything. Related bits of code end up in different files. There is lots of ‘action at a distance’ and plenty of side effects. Fixing or changing one bit of code can cause another to fail, seemingly for no reason, and this failure may not be obvious for some time.

Another parallel is that a tidy garage stays tidy because anything out of place looks messy and wrong. However once there is a little mess more mess quickly builds up and you start to lose the benefits of a tidy garage.

Code goes downhill in the same way. Regardless of how hard you try to keep it tidy, if there is someone on your team who litters then it starts to bring everything down. Soon you stop being a coder and start being a cleaner. Or you leave it too long and suddenly realise that it is all a big mess, and that the best way to fix it is to take everything out, sort through it and put it back in again. But if a messy garage is not seen as a problem then the cleanup will not last, and you are wasting your time.

If you find yourself in this position I hope this analogy helps you.

Business and Technical Clash

There is an impedance mismatch in most software based companies. The business side (the ‘suits’) want to build up financial capital, getting sales and putting more money in the bank. The technical side (the ‘coders’) want to build up technical capital, creating code that is elegant and easy to maintain.

There are several interesting parallels that can be drawn regarding both sides. For example a cash flow crisis is pretty much equivalent to code becoming unmaintainable. Likewise a code base that is easily extendable and flexible is equivalent to good financial growth and opening up potential new markets.

The mismatch is that both sides do not really pull in the same direction. Rarely do the short term objectives of the two sides mesh.

After any period of development it is necessary to go back through the code and correct the mistakes that were made during development. As far as the suits are concerned this could be seen as ‘technical masturbation’ – after all it is not changing the product and so is not generating new revenue. The coders though realize that if this cleanup is not carried out sooner rather than later the mistakes will get baked in to the code and will become increasingly more difficult to extract, eventually resulting in the code becoming unmaintainable and requiring a rewrite.

In order to make the business work the suits need to be able to offer the customers the features that they want. Often the most interesting technical problems are not the most profitable ones and so there can be reluctance from the coders to work on these features. This is especially the case if the coders are not in contact with the end users and so do not have the needed perspective. The suits realize that unless the product does what the customers want then there is no money to be made.

There are four possible scenarios that can develop, three of which will lead to the company failing. These are:

  • bad business, bad code – this is fairly obviously going to fail.
  • good business, bad code – if the code is not up to scratch it will cause failure, even if the business is good. This is because the code is the core of the business, and others will come along and compete. If the code cannot match the competition then the business will go away.
  • bad business, good code – regardless of how good the code is if no-one is willing to pay for it the company will fail.
  • good business, good code – the recipe for success. There is money coming in from the good business and the code is able to adapt to the competition when it presents itself.

The problem is that good business and good code take time. If either is favored over the other then in the long run the company will fail. There can be real pressure from both sides to let sleeping dogs lie, which just pushes the problems into the future, where they will be harder to deal with.

Essentially both sides of the business need to listen to each other and understand where they are going. Conflicts will arise, it is only natural. The trick is in knowing when to back down and let the other side have its way, and when to stand firm. Bear in mind that if either side loses then the company will fail.

DNS Entry that points to localhost

UPDATE: 127-0-0-1.org.uk has now lapsed. *.vcap.me does the same thing and is likely to be stable, so please use that :)

One of the difficulties of having sub-domains is that you need lots of DNS entries for them. This is particularly annoying if you want to run the site on your development machine, i.e. on 127.0.0.1. That is why I just registered 127-0-0-1.org.uk.

If you do a DNS lookup on ‘127-0-0-1.org.uk’ you will find that its IP address is 127.0.0.1. The DNS is wildcarded so anything.127-0-0-1.org.uk will always return 127.0.0.1. Even sub-sub-sub-domains work.
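
If you want to check this from Perl, something like the following will do it (a quick sketch using only the core Socket module; it assumes a working network connection, and it uses the vcap.me wildcard from the update above since 127-0-0-1.org.uk has lapsed):

use strict;
use warnings;
use Socket qw(inet_ntoa);

# Every name under the wildcard domain should resolve to 127.0.0.1.
# (vcap.me is the domain suggested in the update above - substitute
# whichever wildcard domain you are using.)
for my $host (qw( vcap.me anything.vcap.me a.b.c.vcap.me )) {
    my $packed = gethostbyname($host);
    printf "%-20s => %s\n", $host,
        $packed ? inet_ntoa($packed) : 'no DNS record found';
}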

I tried to find a domain that already did this and couldn’t – perhaps I was asking Google the wrong question. I am now a few pounds poorer, but hopefully it will be useful to others. Feel free to use it if you need to.

If only it were easier to create wildcard entries in /etc/hosts – or whatever voodoo Mac OS X is using this release. I still can’t work easily with sub-domains without a network connection. I know I could install a DNS server on my laptop – but should I really need to?

I currently have a fair amount of hate for sub-domains – watch this space.

My language can do that!

Joel Spolsky had some fun with stuff you can do in JavaScript – I’ve done something similar with Perl.

Can your language do this?

This is one of the nicer introductions to anonymous subroutines that I’ve seen, and it gives good reasons to use them. All of it can be done trivially in Perl, to great effect.

He doesn’t go on to talk about closures though, which is a shame, as they are one of the few ways in Perl to get truly private variables. Because the variable and the subs are created in the same scope they can interact. But once you leave the scope the variable is no longer accessible, while the subs still are.

{
     # In lexical scope so '$private' is only visible here.
     my $private = 'hello';
     sub get_private { return $private; }
     sub set_private { return $private = shift; }
}

# trying to access $private here fails - under 'use strict' it is a
# compile-time error (without strict it just creates an unrelated global)
$private = 'bad value';

# but can access it here using sub
my $value = get_private();

A variant on this is to use BEGIN and END blocks, which are useful in testing for cleaning up files that got created. Without them the test code looks something like this:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";

... # tests go here

# end of tests - delete the file.
ok unlink( $file ), "deleted file";

This is messy as the two actions (creating and deleting the file) are now separated by code even though they should go together. Better to write:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";
END { ok unlink( $file ), "deleted file"; }

... # tests go here

There is a gotcha though: if the value in $file is changed during the run of the tests then the wrong file might get deleted. Use a closure over a private copy:

# start of tests - create a file:
my $file = '/var/test/boing';
ok create_file( $file ), "created file";

{
     my $copy = $file;
     END { ok unlink( $copy ), "deleted file"; }
}
... # tests go here

It is the block of code that you give to END that is executed at the end of the run, and because it closes over $copy the value is still available even though the variable will have gone out of scope by the time the END block runs.

Still this is a bit murky – to get it really clear something like this might be used:

{   # abstract the cleaning up of the files.
     my @files_to_delete = ();
     sub delete_file_at_end { push @files_to_delete, shift; }
     END { ok( unlink($_), "deleted '$_'" ) for @files_to_delete; }
}

# add a file to be deleted at the end
delete_file_at_end( $file );

There is now no way to prevent the END block running and deleting the files – it will even run if the code crashes. This is good as it prevents code from unexpectedly modifying @files_to_delete (it can’t), and it means that once you’ve added a file it will get deleted – you can just forget about it.

Scoping variables is a very powerful tool which I recommend highly.

Testing binary searches

In 1986 the excellent book ‘Programming Pearls’ was first published. Amongst other things it looks at binary searching, shows how tricky it can be to get right and presents some code that is claimed to be correct. Unfortunately the code has a bug in it.

A binary search is a quick way of finding out if something is in a sorted list of values, and if so where it is. Take a list of first names:

[ adam, bob, carol, dean, eve, freddy, george ]

If you wanted to see if the name ‘freddy’ was in the list you could just go along the list and check each name. For a short list this is fine, but for a long list it takes too long. As the list is sorted (i.e. in alphabetical order) you can search it more quickly by starting in the middle of the list (‘dean’), checking if it is higher or lower than what you are looking for. ‘dean’ is lower than ‘freddy’ so you ignore the lower part of the list and go to the middle of the remaining list and repeat.

This bug was pointed out by Joshua Bloch on the Official Google Research Blog in this post (which you should read now for the rest of this to make sense).
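
To make the rest of this concrete, here is a minimal Perl version of the search (a sketch, not the code from the book or from the post), with the buggy midpoint calculation left in as a comment next to the fix. With use integer; the sum $low + $high is done in native integer arithmetic, which on a 32-bit perl wraps to a negative number for very large arrays:

use strict;
use warnings;
use integer;    # native integer arithmetic - the sum can wrap on a 32-bit perl

# A plain binary search over a sorted array (it works on the tied
# array described below too). The commented-out line is the classic
# midpoint calculation: when $low + $high exceeds the integer range
# it wraps and $mid ends up negative.
sub binary_search {
    my ( $array_ref, $target ) = @_;
    my ( $low, $high ) = ( 0, scalar(@$array_ref) - 1 );

    while ( $low <= $high ) {
        # my $mid = ( $low + $high ) / 2;         # buggy - can overflow
        my $mid = $low + ( $high - $low ) / 2;    # safe - never overflows

        if    ( $array_ref->[$mid] < $target ) { $low  = $mid + 1 }
        elsif ( $array_ref->[$mid] > $target ) { $high = $mid - 1 }
        else                                   { return $mid }
    }
    return -1;    # not found
}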

Once you’ve seen this bug it is pretty obvious, but how could you test for it? As it only appears for lists with 2^30 or so elements it is tricky to test. It is not really feasible to create a list that size on most hardware – it would need a fair bit of memory and take a long time.

But good testing practice says that you should create a failing test case for it before fixing the code, so that the fix is confirmed. So how can we create a one billion element array without actually creating it?

With Perl you can do this – using tie. This allows you to create a variable that appears as an array (or hash, or scalar etc) but is in fact an object. Whenever you operate on it you instead call methods on the object. This is all transparent to the code.

For our tests we want to create an array where each element has a value that is the same as the index:

@array = ( 0, 1, 2, 3, 4, ..., $max );

This array is easy to test as we can search for a value and the index it is found at is the same as the element value. We also know how big it is ($max + 1).

What we need to do is create something that can simulate this behavior without actually being an array. We need to be able to report the size of the array and return the correct value for each index requested. The following minimal code achieves this:

package TestArray;

# The scalar we bless holds the maximum value in the fake array.
sub TIEARRAY {
    my ( $class, $max ) = @_;
    return bless \$max, $class;
}

# The array holds the indexes 0 .. $max, so it has $max + 1 elements.
sub FETCHSIZE {
    my $self = shift;
    return $$self + 1;
}

# Every element's value is simply its own index.
sub FETCH {
    my ( $self, $index ) = @_;
    return $index;
}

1;

You can then use this array just as you would a normal one, including creating a reference to it:

tie my @array, 'TestArray', 1000;
print scalar @array;         # will print '1001'
print $array[234];           # will print '234'

my $array_ref = \@array;
print scalar @$array_ref;    # will print '1001'
print $array_ref->[234];     # will print '234'

The TIEARRAY sub returns a blessed reference to a scalar. The value of the scalar is the maximum value in the array. This is used in the FETCHSIZE sub, which returns the number of elements in the array. The last sub, FETCH, simply returns the value passed to it, which is the index into the array. There is no error checking; I’ve left that out for simplicity.

This ‘test array’ can then be used in the test scripts to check that the old code breaks and that the new code works. This test script does all this. Perl would not normally exhibit the integer roll-over, but by adding use integer; at the start it does. Also note that we need to check explicitly for a negative index: a negative value is a valid array index in Perl (it counts from the end), so instead of an error the search just gets stuck in an endless loop.
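
For a flavour of what that test looks like, here is a sketch along the same lines (not a copy of the real test script; it assumes the TestArray package above has been loaded and a binary_search sub like the one sketched earlier):

use strict;
use warnings;
use Test::More tests => 2;

# Tie a 'virtual' array of roughly 2**30 elements - TestArray fakes
# the size and the element values, so almost no memory is used.
my $max = 2**30;
tie my @array, 'TestArray', $max;

# Search for a value near the top of the array. With the buggy
# midpoint (and 32-bit integer arithmetic) the calculated index goes
# negative; with the fixed code we get the expected index back.
my $target = $max - 1;
my $index  = binary_search( \@array, $target );

cmp_ok( $index, '>=', 0, 'index is not negative' );
is( $index, $target, 'value found at the expected index' );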

It is also interesting to see that checking the extremes would have caught this bug. This is where you test all the limits you can think of, one of which would have been a list at the upper end of the legal size range.