Jun 03

Pentominoes, a simple educational toy, can reveal some surprisingly deep computer science. Although the first back-tracking solution to the puzzle was implemented in 1958, Donald Knuth devised an entirely new way of looking at the problem in 2000, and called the algorithm the Dancing Links algorithm (or more usually, DLX).
A pentomino set consists of 12 pieces, representing: all the different ways of joining 5 squares (hence pent-) along their edges to form an individual piece.

Be thankful you’re not assembling a set of 3D blocks. It’s even harder than the 2D version.

Figure 1 below shows the complete set, together with each piece’s usual name, a letter of the alphabet that most closely resembles it (F, I, L, N, P, T, U, V, W, X, Y, and Z). Since there are 12 pieces, each 5 squares in size, for a total of 60 squares, that means you could investigate solutions to fitting them all in a 3×20 rectangle, 4×15 rectangle, a 5×12 rectangle, or a 6×10 rectangle, as well as other, more fanciful, shapes.

Figure 1: The 12 pentominoes.

Back in my early teens, I was given a plastic set of Pentominoes arranged in a 6×10 box, presumably for my birthday or Christmas. I seem to remember then spending some inordinate amount of time over several months trying to find all the ways of arranging the pieces in the box. There are 2339 different ways, not counting rotations or reflections. I even used a notebook with squared paper to record the permutations I did discover (although I still have that pentomino set, the notebook has been lost in the mists of time). Needless to say, I did not find all 2339 ways.

Starting solutions

Figure 2 shows a solution for the 6×10 box (in fact, the solution I carefully kept with the box, since not knowing how to put them back would have been embarrassing), and if you look carefully, you’ll see that you can immediately generate another by swapping over T and F.

Figure 2: One of several possible solutions to the 6×10 pentomino puzzle.

After playing around with the pieces for a while, the natural question to ask is – well, natural if you are a programmer, that is – how can I write a program to generate all the solutions to a particular puzzle? Let’s look at what might be needed for a 3×20 rectangular board (there are 2 solutions to this particular puzzle, not counting rotations and reflections).

A first idea might be the following algorithm: take the pieces in order, F to Z. Place F somewhere on the empty board and mark off the squares it covers. Place the I on the board, making sure that it does not cover any of the already covered squares, and mark off the squares it does cover. Place the L likewise. At some point you will find that you won’t be able to place the next piece, so you should backtrack and move one of the already placed pieces to another position (perhaps also by rotating or flipping it over), and then try again. Continue this process, with the realization that you might have to backtrack several times at any one check, and eventually you will discover all permutations.

The big problem with this particular solution is that it will take some time. There are 144 different places and orientations for the F in a 3×20 board, 48 for the I, 136 for the L, and so on, and many of them will be completely illegal (for example, the F cannot appear right at the ends of the board, since in doing so you will always leave at least one empty square). So, in all you’ll be trying out 144 * 48 * 136 * … different permutations. Not very fast at all.

A better implementation might be to switch it around a little. Consider the square at the upper left of the board. Select a piece that covers it, and then mark off the remaining 4 squares that are also covered by this piece. Find the next uncovered square (for this type of board, because it’s so narrow, it’ll be better to search down the narrow axis rather than the long axis). Try and find a piece that covers that and that can be placed legally. If not, backtrack again and change a piece you’ve already placed. Sometimes you might have to backtrack several steps.

Because you are more likely to identify impossible solutions much quicker (for example, there are only 2 possible ways to place the F to cover the top left square and both are shown to be illegal when considering what to place in the next square downwards), you’ll find that this algorithm is much more efficient at identifying valid solutions because you reject invalid positions much earlier.

If you try and implement either of the above solutions, you’ll possibly find yourself calculating the positions and which squares are covered by each of the individual pieces over and over again. It would be better to calculate them once and for all and put the values in some kind of table that you read and process as you try out various placements.

Before we discuss the best data structure for that table, let’s take an aside and consider what’s known as the Exact Cover Problem. Imagine the following scenario. You are given a matrix of ones and zeros; that is, whose cells are either 1 or 0. The first part of Figure 3 shows such a matrix. The problem you have to solve is to work out which rows of the matrix form a set that contain exactly one 1 in each column. There may be no such set of rows, there may be exactly one, or there may be more than one. A solution to this problem is known as an exact cover: a set of rows that exactly covers the columns. Take a moment to work out the exact cover for Figure 3.

Figure 3: An exact cover problem in part 1; partial steps to a solution in parts 2 and 3.

Despite the ease with which a visual inspection will provide the answer with this small matrix (rows 2, 3, and 5), it’s well known that this problem is extremely hard (it’s NP-complete, in computer science terms) the more rows and columns there are.
The algorithm for solving it is recursive. Let’s follow along with Figure 3. You’re given a matrix. Choose a column (say the first). Choose a row such that the cell at the intersection of the column and row is a 1 (let’s go for row 4). Add that row to our solution. Now since that row will have a 1 in column 1 we must remove all the other rows that have a 1 in column 1 (that would be row 5). Not only that, but we must also remove all the rows that have a 1 in the same column as this chosen row. For row 4 we hit the jackpot: We can get rid of row 5 (again), row 2 and row 6. We remove row 4 (since it’s in our possible solution) and the three columns that row 4 covered (there’s no need to recheck them since we’ve already covered them). Our matrix now looks like the second part of Figure 3.

Reduced matrix

We do the same thing with this reduced matrix. We choose row 3 as our row since it intersects with column 2 in a 1. Unfortunately, we then have to delete row 1 as well since there’s a clash in column 6, meaning that columns 5 and 7 never get covered and we failed to get an exact cover. So we backtrack. Maybe choosing row 3 was the mistake. However, there is no other possibility to cover column 2 in that particular reduced matrix, and so we must backtrack to our choice of row 4, way at the beginning. Since that choice led to no exact cover, we see if we can’t make another choice of row to cover column 1. Yes, there is: row 5. Selecting row 5 means we get rid of rows 4, 4 (again), and 1, as well as columns 1, 3, and 7 (since row 5 covers them). We’re left with the reduced matrix shown in part 3 of Figure 3.

Here we make the choice of row 3, which removes row 6, as well as columns 2 and 6. And this, as I’m sure you can see, leaves us with row 2 covering the remaining two columns. (To be pedantic, we select row 2 which removes the final two columns as well. Since there are no rows or columns left, we found a solution.) Hence our exact cover solution is rows 2, 3, and 5.

All fine and dandy but what does this exact cover problem have to do with pentominoes, and in particular defining the table of where a piece can be placed and which squares it covers? The answer is to set up the puzzle as an exact cover problem. Let’s define the columns first: the initial 12 columns will be the pentomino pieces, one column for each piece. The next 60 columns will be one per square from the board, say counting from top left across the top row (1–20), then the second row (21–40), then the final row (41–60).

Now we can define each row as being a single piece placed on the board in a particular position. Since it’s a single piece, we’ll have a 1 in only one of the first 12 columns, according to which piece it is. Now we put a 1 in each column of the next 60 that corresponds to the squares covered by the piece in that position. For example, if we take F in its “normal” orientation and place it abutting the left side, the row will have a 1 under the F column, and a 1 under columns for squares 2, 3, 21, 22, 42. All the other 66 columns for that row will be 0. (In fact, every row we add to the matrix will contain 6 1′s and 66 0′s.)

How many rows will there be? Well, we just sum all the different ways we can place each piece. For example, for F there are 18 positions for each of its orientations, and 8 orientations, making 144 distinct positions. I did the calculation and came up with the following number of positions for each piece: F 144; I 48; L 136; N 68; P 220; T 72; U 110; V 72; W 72; X 18; Y 136; Z 72. In other words, the puzzle matrix has 1168 rows. This is one huge matrix, but note that it is very sparse: only 1 in 12 cells is a 1. Now all we have to do is to find the exact covers for this 72×1168 matrix. Because we haven’t done anything to avoid solutions that would be rotations or reflections of other solutions, we’ll actually find 4 times as many solutions as expected.

The fun problem is how to actually solve the exact cover problem efficiently. I’ve already noted that it’s a non-polynomial problem (probably exponential), so, for some large matrix, it’ll all get very hard to nigh on impossible, but this particular sparse, not-that-large matrix actually can be solved pretty quickly.

The algorithm to do it is the one I mentioned above (Donald Knuth calls this algorithm, Algorithm X), however, keeping track of all the rows and columns you’re removing or adding back – the housekeeping, if you like ¬– is quite complicated. To aid in this housekeeping issue, Knuth came up with a refinement that he called Algorithm DLX for something like “Dancing Links implementation of Algorithm X” (you can get the paper from this link, it’s paper P159).

Dancing Links?

Consider a doubly linked list where each node has a Next and a Prev reference to the next and previous nodes in the list. Then, in order to delete a given node X, all we have to do is set X.Next.Prev to X.Prev and X.Prev.Next to X.Next. X will no longer be in the linked list. However, providing that we don’t alter X in any way, we can just as easily add it back into the list: set both X.Next.Prev and X.Prev.Next to X. In other words, although the linked list has no more references to X, X itself still has “knowledge” of where it was in the list.

Well, all that is very well, but so what? Consider our 72×1168 sparse matrix, how would we represent that in memory? What data structure would we use? We could just use a simple standard matrix of boolean values, so it would take 84,096 cells, but such a structure is too rigid. Better would be to use a linked list of cells containing a 1 along each row. Knuth went even further: not only are the 1 cells joined in a linked list along each row, but they are equivalently joined up and down in a linked list as well. The matrix becomes a tapestry of linked lists, with each 1 cell having a link to its left, right, up, and down neighboring 1 cells. 0 cells are not stored at all.

Figure 4: A linked list tapestry for the exact cover problem deomstarted in Figure 3.

Figure 4 shows the exact cover problem matrix that we saw in Figure 3 as a linked list matrix. Each column has a header cell and the header cells have their own single header cell. The linked lists are circular lists: the links off to the left join at the right (and vice versa) and the links off the bottom join at the top (and vice versa). Now that we have this woven data structure of linked lists, we can make use of the remove/add-back-again idea that I just introduced. To cover a column we unlink it from the column headers linked list using our code and we can easily unlink the rows it contains by the same method. If we need to backtrack, we can add the rows back, in reverse order, and finally add the column back in. The housekeeping of Algorithm X is done through this property of deleted nodes “knowing” where they came from. We don’t have to have any special additional data structure to hold the removed rows and columns: the nodes themselves keep track of all that automatically.

In essence, Algorithm DLX finds an exact cover if, during a recursive call, the header of headers node points to itself. It fails to find one if there are still column headers but they have no rows; we shall then need to backtrack. To Knuth, looking at all the breaking and making of links, the weaving of the lists, it reminded him of a dance, hence the term Dancing Links.

Apr 02

Thinkgeek – We always love Thinkgeek’s selection, not least because of the effort they go to building their props, and the fact that usually, they’re really good ideas. This year’s include the Lost-themed Dharma Initiative alarm clock, My First Bacon, the Screaming Knife, and programmable tattoos.

Google Translate For Animals – Mooo. Bark bark bark. Meow*. Meow? Meow! Neigh. Snap snap snappity snap. Waah. Pikachu!

Abduction Insurance – The worrying thing is that we can see a lot of people wanting to sign up for this one. Tinfoil’s not as cheap as it used to be.

Step Outside, Posh Boy – The Guardian reveals Labour’s new poster campaign. Not actually a bad idea. For a while, this was top of the trending topics, which is more than you can say for the official slogans.

TextP – Watch any YouTube video in text mode. Just snip the bit off the end of the URL and paste it on the end of other videos and enjoy them in fine Matrix-Vision.

BattleNet Neural Interface – Just wait, Ubisoft will have this protection system ready for the next version of Splinter Cell.

Google’s New Name – Pity it wasn’t updated on the actual page any time we checked, although it might just not have been working for us.

Log Into XKCD – Take a poke around behind the scenes of the popular comic. Lots of funny hidden jokes in there. Try treating it like a text adventure.

Tentacular, Tentacular! – Ars Technica throws away the idea of doing a joke news post and writes its own Choose Your Own Adventure instead. Become a booth babe! Be captured by Google! Join the Church of Scientology! It’s all there!

Simon Singh Wins Legal Battle - Actually, no, that happened. Hurrah for rational thought and common sense finally prevailing.

(* Croak)

Oct 07

Networks are one of the most explored structures in computer science because they have applicability in many different scenarios. A number of algorithms have been devised for these structures; one of the most entertaining results is the so-called Bacon number: a measure of how close a particular actor is to Kevin Bacon in the world of film.

This comes from the popular trivia game that has long been played in pubs around this great nation: the ‘Six Degrees of Kevin Bacon’. Here’s how it works: someone names an actor and then everyone has to try and work out the number of steps (‘degrees of separation’) between that actor and Kevin Bacon. The result is the Bacon number. Everyone who has worked directly in a movie with Kevin Bacon has a Bacon number of 1 (Kevin himself is assumed to have a Bacon number of 0). Those who have worked with one of those people will have a Bacon number of 2, and so on.

So, for example, suppose someone says ‘John Thaw’. John Thaw never appeared in a movie with Kevin Bacon, but he did appear in Chaplin alongside Diane Lane. In turn, she appeared in My Dog Skip with, you guessed it, Kevin Bacon. Hence John Thaw has a Bacon number of 2. (There may be other links between John Thaw and Kevin Bacon of degree 2, but the prize goes to the first person to suggest the shortest link, and hence the smallest Bacon number).

There’s a great website called The Oracle of Bacon that periodically downloads data about movies and their casts from the Internet Movie Database to refresh its own database. From this data it builds a big map of actors and movies and can answer these kinds of degrees of separation questions. (Quick aside: what’s the shortest link between Billie Piper and Sean Connery? Apparently she had a bit part in Evita alongside Mark Ryan, who was in First Knight with Connery. A ‘Connery number’ of 2 for Billie Piper, then.)

It’s a small world

In mathematics there’s a similar concept known as the Erdös number, which is named after the prolific Hungarian mathematician Paul Erdös. He published a massive number of papers with many collaborators across many disciplines. The Erdös number for an academic is the number of steps between Erdös and themselves, via collaborators on papers. So Paul Erdös would have an Erdös number of 0, all of his collaborators on the various papers he wrote would have an Erdös number of 1, and all their collaborators on other papers would have an Erdös number of 2, and so on.

All of these degrees-of-separation-type numbers are based upon the small world phenomenon. This is the observation that the social networks of large groups of people in modern society are very interconnected. Typically, they’re distinguished by short path lengths between any two nodes.

The psychologist Stanley Milgram did some research on these social networks during the ’60s and found that, on average (and using the methodology he described), any person in the US could trace a path to any other person in the US using between five and six links. Hence the term ‘six degrees of separation’, although Milgram did not use this particular phrase (it was first widely used as the name of a play and then a film – the star of the movie was Will Smith, who has a Bacon number of 2).

The small world phenomenon is of course widely used nowadays by the social networking sites such as Facebook and MySpace, although it’s best known as part of LinkedIn, where you can view your network separation from another person directly.

Meet the neighbours

So how are these degrees of separation worked out? How does the Oracle of Bacon calculate the shortest number of links (the path) between actor A and actor B?

Graph theory holds the answer. Not graphs in the sense of those diagrams you had to draw at school to show y=x2, but computer science graphs, or networks. The algorithm used is a ‘shortest path’ algorithm.

First, we need some terminology. I’ve already let slip a few pieces of jargon, so let’s be more rigorous. A network is a data structure consisting of nodes (sometimes known as vertices) and the edges that connect the nodes. You can most easily represent a network as a matrix, though this is inefficient in terms of memory. Each row and column represents nodes, and the intersection of a row and column will be 1 (or ‘true’) if there’s an edge between the nodes represented by the row and column, and 0 (or ‘false’) otherwise. Sometimes the edges have a ‘weight’ associated with them, which is the value in each cell. (An example of this is a network of towns, where the weight of the edge for two towns would be the length of the most direct route between them.)

A path is a set of edges using which we can travel from one given node to another. The length of the path is either the number of edges we have to travel to follow the path, or the sum of the weights of the edges. The shortest path is obviously the path with the smallest ‘length’.

Suppose we have a network, as shown in Figure 1 above. We want to find the shortest path from A to B. We’ll first consider the simple case where the path length is merely the count of the number of edges we follow.

We’ll need to store some information for each vertex we visit, so we should first create an array that stores references to all the nodes in the network. We’ll get to the information we need to store in a moment, as we describe the algorithm. We’ll also need an implementation of a ‘queue’ to hold nodes we’re about to visit.

Our algorithm uses a technique called ‘breadth-first’ search. In it, we visit all the neighbours for a vertex before visiting the neighbours themselves, and so on. It’s rather as if the search ‘fans out’ from the original node. The alternative search technique is known as ‘depth-first’ search, where we explore the nodes by following edges as far as we can go, and then backtrack to follow the edges that we missed.

Back to the breadth-first search. Set the first node’s ‘path length’ to 0. That’s the first value we have to store with each node in our array (obviously, the first node is at distance 0 from itself). Add the starting node to the queue. Now, in a loop, continue removing nodes from the queue until we find the target node. For each node we pick off the queue, mark it as ‘visited’ (to do this we’ll need a second value – a true/false indicator – and all nodes should be marked as unvisited at the start). Then add all the nodes immediately reachable from that node to the queue.

However, there’s a quick test here: there’s no point in adding nodes that have already been visited, since if they are being visited again, the new path length must be longer than before. Also, when you add the neighbour nodes to the queue, set their path length to one more than the current node’s path length.

You stop when you pick off the target node from the queue. Its path length value will be the shortest distance from the source node to the target node.

Weights and measures

If you also want to output the path taken from A to B, you should record the ‘parent’ of each node as well as its path length when you queue the node (the parent is the node from which you’re following the edge). Once you reach B you can follow the path back to A by following the parent links. (If you want the path in the right order, merely push the nodes on to a stack as you go back to A, and then you can pop them off in the forward order.)

Figure 2, above, shows this algorithm in action by colouring the edges visited at each step and the contents of the queue at each stage. Step 5 is the interesting one, since the edge marked in blue/grey is not followed (because the node at the end of it has already been visited).

For our application to Bacon numbers, the network may be huge (lots of nodes – sorry, actors – with each actor having many links to other actors), but this algorithm will suffice.

For the other kind of network where each edge has a weight associated with it, the algorithm changes slightly, although it is still based on the idea of breadth-first search. Here, we’re going to find the path that has the smallest cost (that is, the smallest total weight). We could find, for instance, that such a path has more edges than a strict ‘shortest’ path would have. This variant on the shortest path algorithm was first devised by Edsger Dijkstra (the computer scientist who published the well-known paper called ‘Go To Statement Considered Harmful’ in 1968 that lambasted the then common and prolific use of the Goto statement in programs).

The big assumption here is that all the weights are positive numbers. The reason for this stipulation is that if an edge has a negative weight, and that edge forms part of a cycle of negative length, you could travel round the cycle ad infinitum to produce shorter and shorter (more and more negative) paths.

Instead of an ordinary queue, Dijkstra’s Algorithm makes use of a priority queue. A priority queue is usually described as a queue to hold items from which you always pick off the item with the highest priority (and is so used in operating system job queues or printer queues), but can apply to any set of items from which you want to pick off the smallest or the largest. We’ll use the variant, where the next item you remove from the queue at each stage will be the smallest.

We start off as before: set the path length (or cost) to 0 for the first node, and add it to the priority queue. Now we loop over the queue, picking off nodes until we find the target node (and we will pick off the smallest node at each step). For each node we remove, however, we do some extra work. As before, we have to mark nodes as visited or unvisited (and obviously mark all nodes as unvisited at the start). When we remove a node from the queue, it is marked as visited.

For the current node, we look at each of the nodes reachable from it. If a neighbour node has been visited, we ignore it, much as we did before. For the other neighbouring nodes, we calculate the total cost to reach them from this one. Say the current node has cost 10, and the edge to another node has cost 4. The cost of the next node is therefore 14, through this particular node. Check whether this is less than the current cost of the next node (for this to work we shall have to set the cost of all the nodes – except the first – at the start to some very large number, usually called infinity). If it is, update the cost of the next node with this new smaller value and store the current node as its parent. Note that this will probably put the next node in a new position in the queue.

Completing the network

Once we remove the target node from the priority queue, we’ll have, as before, a path from it back to the original node, and we’ll also have the cost of that node as accumulated from the source node.

Figure 3, above, shows an application of Dijkstra’s algorithm to a simple network. Step 1 is the original network, and each of the edges is marked with its weight. Step 2, we start at A and find the smallest edge (the one with weight 1). That defines the next node to become our starting point for Step 3. Actually, for Step 4 we have two possible nodes of weight 3: the one in the middle and the very bottom one. For brevity, I chose the middle one so that the algorithm completes with Step 4 at B. The shortest path is the one marked.