Eccles & Toad home ~ blog ~ perl ~ projects ~ employ me
You are here: home > blog > 2005 > 05 > 27

Blogs 2005/05/27

(RSS)

2005/05/27 - Obfuscation as a learning tool.

A couple of days ago I posted an obfuscation that solves sudoku puzzles in four lines of Perl. To compress the code down to this level required quite a few dirty little tricks which would have no place in production code. Here I am going to postmortem the obfu and point out what there is to learn from it.

First the code as an obfu:

$a=8;$G{int(++$a/9).$a%9+1}=$_ for split//,<>;@A=1..9;sub c{int(($_[0]-1
)/3)*3}sub G{for$y(@A){for$x(@A){$p=$t=$G{my$c=$y.$x}&&next;$t.=$G{$_.$x
}.$G{$y.$_}for@A;for$f(1..3){$t.=$G{c($y)+$f.c($x)+$_}for 1..3}G($G{$c}=
$_)&&return for grep$t!~m/$_/,@A;return$G{$c}=0}}die map{$G{$_}}9..99}G

And then tidied up by perltidy:

$a = 8;
$G{ int( ++$a / 9 ) . $a % 9 + 1 } = $_ for split //, <>;
@A = 1 .. 9;

sub c {
    int( ( $_[0] - 1 ) / 3 ) * 3;
}

sub G {
    for $y (@A) {
        for $x (@A) {
            $p = $t = $G{ my $c = $y . $x } && next;
            $t .= $G{ $_ . $x } . $G{ $y . $_ } for @A;
            for $f ( 1 .. 3 ) { $t .= $G{ c($y) + $f . c($x) + $_ } for 1 .. 3 }
            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A;
            return $G{$c} = 0;
        }
    }
    die map { $G{$_} } 9 .. 99;
}
G

Now I will run through the code explaining what each bit does and musing about it.

$a = 8;
$G{ int( ++$a / 9 ) . $a % 9 + 1 } = $_ for split //, <>;

This code takes the input and builds a hash. The hash %G is keyed on two numbers, such as '23' or '39'. This number is the (y,x) coordinate on the sudoku grid, with the origin in the top left. I used (y,x), rather than (x,y), as it allowed me to print the grid out later on more easily.

The creation of the key is quite interesting as we are numbering the values coming in on the list in a sort of base 9. Think about the change in numbers as you go off the first row and onto the second: 17, 18, 19, 21, 22, 23. As you can see it needs to jump from 19 to 21. The y coordinate is done by throwing away the fractional part of a division, the x coordinate is obtained using a modulus. The increasing value to work with is provided by $a which starts out as '8' and then is pre-incremented during the creation of the key.

@A = 1 .. 9;

Simplicity itself. @A is just an array of 1 to 9. If you look in the rest of the code there are many times when I needed to run from 1 to 9 and so as a space saver it was worth creating this array.

sub c {
    int( ( $_[0] - 1 ) / 3 ) * 3;
}

This is a helper function that is used later on when finding which numbers are currently in the block that we are looking at. It is passed a single digit from 1 to 9. It returns 0, 3 or 6 for 1..3, 4..6 or 7..9. See later for how the return value is used.

sub G {

Create a sub routine G. This is the meat of the obfu and most action occurs in here. The basic idea is to find the first square on the grid that does not have a value. Then find all the possible numbers that could go in. For each number put it in and recurse. If we reach the end of the grid exit, otherwise try the next value. If there are no more values to try backup to previous squares.

    for $y (@A) {
        for $x (@A) {

The heart of the routine is two loops, both from 1..9. This gives us the coordinates of the square in the grid that we are working on.

            $p = $t = $G{ my $c = $y . $x } && next;

Several things happen in this line. The most important one is to see if there is a number in the square. As all empty squares are set to 0 (false) this can be done with a simple boolean test. If there is a number then $G{'yx'} is true which causes the next to get called. This is taking advantage of Perl lazy evaluation of the terms, ie. if the first test fails (there is no number) then there is no need to evaluate the second test, the next.

In the interests of cramming code in there are a few other things going on. Firstly a variable $c is set up containing the current (y,x) location. This assignment happens inside the curly braces of the hash. This works because the return value of an assignment is the value assigned. This is also used to give initial false values to $p and $t which are used later. The values in $p and $t are always false as if it was true then the loop would have jumped on.

            $t .= $G{ $_ . $x } . $G{ $y . $_ } for @A;

Finally we start to actually determine what numbers could go into this square. The $t string is appended with the numbers that are already taken (t is for taken). This is done for both the current row and the current column in one statement.

            for $f ( 1 .. 3 ) { $t .= $G{ c($y) + $f . c($x) + $_ } for 1 .. 3 }

This double loop appends the numbers that are taken in the current block onto $t. There are two variables floating around, $f and $_ which both run from 1..3. These are added to the return value from the c subroutine discussed earlier which results in the squares in the current grid being selected.

The funny use of two for loops in this way is to save space.

            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A;

There are several things going on here. There is a loop over the possible number that could go into the square: for grep $t !~ m/$_/, @A. As $t contains the numbers that are taken the grep only returns the numbers from 1 to 9 that are not taken, ie possible values.

For each possible number G( $G{$c} = $_ ) && return is executed. This sets the current square to the possible value. It then recurses into G. If the return value is true then it returns. In fact it returns true as unless specifically given a value return will return whatever is in $_, which as it happens was just set to the return value from G. This will always be true for the return to get called and so true is always returned.

You might wonder why the current square is assigned its value as an argument to the subroutine G? The answer is that it saves one character doing it this way. Compare G( $G{$c} = $_ ) to $G{$c} = $_, G(). As G does nothing with arguments it is passed this is fine.

            return $G{$c} = 0;
        }
    }

If the code did not return above then it should now. If we reach this stage then every possible number for this square was tried and none of them led to a solution of the sudoku. We want to set the current square back to false and the allow the code to try changing some previous numbers to see if they lead to a solution. Again the return value from an assignment is used as a space saver.

    die map { $G{$_} } 9 .. 99;
}

This is dirty. I needed to shave off a couple of characters to get into four lines and so I did this. If we reach this point then the two for loops for $x and $y have reached the end. What has has happened is that the final square has been assigned a number. The subroutine has recursed and the two for loops have run and next has been called for every square as they all have numbers in them. Hence we end up here. To save those last few characters I used die instead of print. Horrible.

There is no need in the the map to worry about the values in G for keys such as 10 or 20 that have no values. They simply result in undefs being printed which produces no errors as use warnings has not been specified.

G

So here we are at the end of the code. All we need to do now is run G for the first time.

Rather interestingly I have spotted a space saver writing this post. There is a variable $p that was set to false but never actually gets used. It used to be used for possible values that could go into the square but became unneeded due to changes. However I forgot it in there. Bug fixing by code reading in action.

Another space saver is that as the code dies when a solution is found there is no need to return as the code would already have died. This means that another line can be shortened:

            G( $G{$c} = $_ ) && return for grep $t !~ m/$_/, @A; # before
            G( $G{$c} = $_ ) for grep $t !~ m/$_/, @A;           # after

This means that the die can be tidied up a bit and we can even format the output:

    die map { $G{$_} } 9 .. 99;             # before
    die map { $G{$_} || "\n" } 9 .. 100;    # after

The new, better, shorter obfu:

$a=8;$G{int(++$a/9).$a%9+1}=$_ for split//,<>;@A=1..9;sub c{int(($_[0]-1
)/3)*3}sub G{for$y(@A){for$x(@A){$t=$G{my$c=$y.$x}&&next;$t.=$G{$_.$x}.
$G{$y.$_}for@A;for$f(1..3){$t.=$G{c($y)+$f.c($x)+$_}for 1..3}G($G{$c}=$_
)for grep$t!~m/$_/,@A;return$G{$c}=0}}die map{$G{$_}||"\n"}9..100}G

I hope that you will agree that even though there are only four lines of code here there is plenty to think about. Not all of it is 'good' code, in fact most of it is very bad code. However the artificial constraint of four lines leads to some interesting techniques that take advantages of some of the more esoteric parts of the language.