Jun 14

Linux doesn’t have a CEO. Consequently, there’s no annual keynote hosted by a charismatic alpha male. But if it did, and if there were a conference covering the first half of this year, the first speech would start with three words: ‘Linux is winning’.

Firstly, a market research firm in the US called The NPD Group revealed that sales of Google’s Android platform overtook those of Apple’s iPhone in the first quarter of 2010, propelling itself into second place behind the waning RIM. Android is becoming increasingly competitive, spanning both the smartphone and the emerging tablet markets, with devices from Dell and HP on the near horizon. This might be why Apple has started a patent infringement lawsuit against HTC, using many of its Android-based phones as physical exhibits in its litigation.

Secondly, Google announced its intention to open source the VP8 video codec. This was acquired when it bought On2 earlier in the year and it will be used alongside Vorbis and the MKV container to create Google’s WebM video format. This is vitally important for Linux. The nascent H.264 format, as used by Apple and many HTML5 video streams, is encumbered by patents, and current open-source implementations live under the shadow of legislation. VP8 and WebM have the potential to match it for quality, and while WebM will undoubtedly attract similar litigious trouble, having an umbrella the size of Google should satisfy many Linux distributions, especially when Mozilla, Opera and Adobe have already pledged their support.

Finally, the UK’s new coalition government has published its Programme for Government. There are two points in the section on Transparency that are great news for free software. One states, “We will create a level playing field for open-source software,” while the other adds, “We will ensure that all data published by public bodies is published in an open and standardised format, so that it can be used easily and with minimal cost by third parties.” If these promises come true, it will transform attitudes to open-source software and Linux, and hopefully open the door for its use within government and schools, two areas where it’s ideal.

Many of us used to think that for Linux to be judged a success, it had to be installed and running on more desktop computers than Microsoft Windows. And there are great swathes of Linux users who still feel the same way. But the world of computing has changed. There’s more than one way of judging the success of something that started as just a good idea.

Windows, Linux and OS X are survivors. They’ve lasted this long because they exist within their own ecosystems. Linux, for example, is fed by a curious mixture of enterprise investment, embedded hardware vendors and a community brimming full of zealous commitment. There’s a low-cost threshold to entry and a subsystem that maintains itself with very little investment. It’s these factors that have shaped how it looks, how it feels and how it’s operated.

The ecosystems inhabited by both Microsoft and Apple are equally well-adapted to their environments. The former is the domain of the utilitarians, offering straight functionality for an up-front price. The latter is an increasingly important fusion of fashion and function. But things have changed. The borders between the ecosystems have become indistinct. Apple has surpassed Microsoft in market value, winning thousands of new fans through it’s no-fuss interfaces and lower prices. There’s a shift in the balance of power.

And thanks to Google, Linux is becoming less free and less open, proving that in the new markets where it’s having the most commercial success, it’s becoming more like Apple. ROMs are encrypted and need to be rooted for user-hacking, third-party applications have to be sold through a single vendor and personal information is held in the cloud by a sole provider. If Linux wants a taste of similar success, it might find it if it makes similar concessions to a user’s freedom.

But then we’d have failed. The Linux ecosystem would have become too polluted, bogged down by sponsored kernel additions, paid-for support and short life cycles. It may be a commercial success, but no longer an active one. Our hypothetical CEO might make further compromises, and make judgements against the interest of Linux users. Which is exactly why we don’t have a CEO, and exactly why the success of open-source software is so difficult to judge using the same language as its competitors.

May 11

When we need to select an algorithm for a particular purpose, we should pay attention to its runtime characteristics: how fast it is; how much memory it uses; whether there’s a worst case for the algorithm’s execution speed; and so on. All these answers are expressed with the big-Oh notation, which I’ll describe later.

A common abstract data structure that’s used all the time in programming is the dictionary or associative array, which is sometimes known as a map. I call it an abstract structure because it can be implemented in myriad different ways, but it always has a specific interface. We’ll use the dictionary to investigate the runtime efficiency of various algorithms that can be used to implement it.

OK, so it’s not this kind of dictionary. We’re referring to a digital one. An associative array.

But first, a definition: a dictionary is a structure that holds name-value pairs. A name-value pair is an object that has a name – that’s used both to describe its value and as a key to find it – and a value, which can be anything at all. The classic example is a real-world dictionary, where the name is a word and its value is the word’s definition. However, don’t limit yourself to assuming the name is always some kind of text string. In reality, names can be integer values, bit strings, 128-bit GUIDs, dates or anything at all. That said, it’s helpful to assume that they’re text strings for now.

The dictionary has various operations that define its external interface. There’s the ‘Create’ operation, which creates a new dictionary, and the ‘Destroy’ operation, which releases any resources the dictionary is using and destroys the structure. A dictionary can only be used after ‘Create’ has been called, and once ‘Destroy’ is executed, it no longer exists. Since these operations are only used once each per dictionary, they won’t have much effect on the overall runtime and so we won’t discuss them any further.

When given a name, ‘Find’ will search for the name-value pair that matches and return its value or an error if the name is not found. ‘Exists’ will do the same, except it will merely return true or false according to whether the name is present or not. Since they’re virtually identical, apart from what they return, we’ll ignore ‘Find’ from now on.

Finally we have ‘Insert’ and ‘Delete’, which do what you’d expect: add a new name-value pair to the dictionary (returning an error if the name already exists), and remove the name-value pair that matches a given name, respectively. In general, ‘Delete’ won’t return an error if the name is not found, and sometimes ‘Insert’ will merely replace the value if the name already exists.

Now that we have our abstract data structure, let’s investigate first how to implement it and second analyse the efficiency of our implementations. We’ll look at a total of four implementations.

Name-value pairs

The first implementation is the most obvious: use an array of name-value pairs. ‘Exists’ is the first operation to think about. In essence, to see whether the given name is present, you would check every pair in the dictionary sequentially and stop when you found it. If the given name isn’t present, you would compare the name of every name-value pair to the given name. The more pairs there are, the longer it would take, but you can be even more precise than that. Suppose there were N pairs in the dictionary and each comparison took the same (constant) length of time – say t. Then it would take tN time units to find out the given name wasn’t present. Another way of putting this is that the time taken for the nonexistence check is proportional to N. In computer science, without going into too much rigorous mathematics, we say the runtime efficiency is O(N), pronounced ‘big-Oh of N’, although you can read it as ‘is proportional to N’.

So if it took so many seconds to find out that a given name wasn’t in a dictionary of 1,000 pairs, it would take twice as long for a dictionary of 2,000 pairs, and 10 times as long for a dictionary of 10,000 pairs.

What if the given name was in the dictionary? What could we say then? Well, it could be that the matching pair was the first item checked. In that scenario, we say the best case efficiency for ‘Exists’ is O(1), which you read as ‘is constant’ (in other words, it doesn’t depend at all on the number of items in the dictionary). But, of course, for that to happen, you’d have to be extremely lucky. You could be completely unlucky and be looking for the final item. Here the worst case efficiency is O(N) – the time taken would be proportional to the number of items in the dictionary.

On average, though, if you searched for every name in the dictionary, the efficiency would be O(N/2). Now comes the fun bit with big-Oh notation: since it essentially means ‘is proportional to’, you can take the 1/2 (a constant) out of the parentheses into the implied proportionality constant and say that the efficiency is O(N). We say that searching through the dictionary-as-array is O(N): twice as many items, twice as long.

‘Insert’ is simple: we add the new name-value pair to the end of the array, a constant O(1) operation. Hold on there though – we first have to search the array to find out if the name is already present or not. ‘Insert’ then degenerates to O(N), just like ‘Exists’. We get no benefits at all from the constant, quick, add-it-to-the-end operation; we still have to search.

‘Delete’, as I’m sure you can see, is at least O(N) as well – we have to do the search. There’s something else about ‘Delete’ that we have to take into account: we have to physically remove the name-value pair from the array. The simplest way of doing this is to simply take the final pair in the array and put it in the slot vacated by the pair that was removed: a constant O(1) operation. So, overall, ‘Delete’ is O(N); the search time will swamp the move-an-item time.

Sorted pairs

Let’s move on to the second implementation. This one is again an array, except this time we maintain the pairs in sorted order. This has the assumed requirement that the names are sortable and that, given any two unequal names, we can say that the first is smaller or greater than the second.

We’ll start off by analysing ‘Exists’ again. The array is in sorted order, so we can use binary search to try and find the name-value pair that matches. With binary search, we look at the middle item in the array. If it’s the one we want, we stop. If the one we want is less than this middle item, we know that, if it’s present at all, it’ll be in the first half of the array. If the one we want is greater than the middle item, we know it will be in the second half. We repeat this process with the half array we selected. We’ll either find the item immediately again, or we’ll have reduced the number of items we have to search to a quarter of the array. Ditto the next step, except we reduce the space we have to search to an eighth of the original array. And so on.

Again, consider the doesn’t-exist case. Say we start out with an array with 1,023 items. After one step, we’ll have discarded one item and will have identified a subarray of 511 items for the next step. After this next step, we’ll have reduced the search space to 255 items, and so on. At the 10th step we’ll have a tiny array of just one item, which we can easily compare. So all in all, we’ll have made 10 comparisons to find out that the given name is not present. What’s so special about 10? Well, it’s the logarithm to base two of 1024 (that is, 2ˆ10 = 1024). Again, without being too rigorous mathematically, we say ‘Exists’ is O(logN) when the name isn’t present.

Think of O(logN) this way: if it takes a particular length of time to find out that a given name isn’t present in a sorted array of 1,000 items, it will only take twice as long for an array of 1,000,000 items. If you square the number of items, you double the time taken. This is an extremely significant result, showing the importance of binary search. What if the given name is present? We can make the same analysis as before: best case is O(1), worst case is going to be the same as not finding it: O(logN), and so we say that, overall, ‘Exists’ is O(logN).

What about ‘Insert’ and ‘Delete’? Again, we have to search for the name, so it would seem that they’re both O(logN). But this time, consider what we must do to add (or remove) the name-value pair. For ‘Insert’, we have to make a hole in the array to put the new pair in, shuffling all the items greater than it along by one. For ‘Delete’, we have to shuffle the remaining pairs to close up the hole vacated by the removed pair. If we’re lucky, in both cases, we don’t have to move any items (that is, best case is O(1)); if we’re unlucky we have to move all of the remaining pairs (that is, worst case is O(N)). On average, it’s O(N) for all the shuffling we need to do. Since O(N) is bigger than O(logN) – for very large values of N the (in)efficiency of the moving of the items will swamp the efficiency of the search – we ignore the smaller proportionality and just use the larger one. We say ‘Insert’ and ‘Delete’ are both O(N).

Hash table

Now for the next implementation: the hash table. Without going into full detail, we have an array as the basic data structure. Again, we analyse ‘Exists’ first. To find an item in a hash table, we hash the given name to produce an index into the array. The hash is produced by a randomising type function that takes the name, chops it up and combines the parts to produce an integer value. That integer value is then reduced to a possible array index value by use of the mod operator. The hash function is designed so that similar names produce very different hash values.

Best case is that ‘Exists’ is O(1). That is, we create the hash for the given name, convert it to an index, go to that element in the array, and the pair we need is there and matches. No matter how many items are in the array, that process is constant. (Actually, the hash function is usually O(k) where k is the length of the name, but we’re ignoring that for now.)

What about worst case? Well, in practice we’ll find that many names will hash to the same array index value. These are called collisions and we need to implement a collision resolution strategy to deal with them. The simplest is known as chaining, where we chain the name-value pairs as, say, a linked list at each array element. In this case, once we’ve calculated the index, we then do a sequential search through the chain at that index.

To ensure that the chain is never too long, hash tables grow themselves periodically when their load factor (the number of pairs present divided by the number of array elements) reaches a particular value. To do this, a new array is created, and all the pairs are rehashed and inserted into the new array. This ensures that chains never grow beyond a few items, say five or 10. Since this isn’t dependent on the total number of items, it’s still constant and we say ‘Exists’ in a hash table is O(1) on average.

‘Insert’ is a more difficult operation to analyse. On the face of it, it’s O(1) – both the ‘Search’ and ‘Add’ functions are constant time operations in general – but every now and then, a reorganisation will take place on an insertion operation. In general, hash tables are written such that they double in size when they grow. This is a O(N) operation, but we can amortise it over all previous insertion operations, so that, overall, ‘Insert’ remains O(1). Best case then is O(1), worst case is O(N), amortised case is O(1).

The same types of arguments can be made about ‘Delete’, although in general we tend not to shrink a hash table anywhere near as often as we make it bigger. ‘Delete’ is then O(1), meaning that the amortised use of a hash table over all its operations is O(1). There is, of course, still that warning that every now and then you will hit the O(N) worst case on an insertion.

Binary tree

The next data structure we can use is a balanced binary search tree, such as a red-black tree. This, like the sorted array version, makes the assumption that names can be sorted.

In a binary tree, the efficiency of search operations is O(d), where d is the maximum depth of the tree (the number of levels from the root of the tree to the furthest leaf). Since a perfectly balanced binary search tree is equivalent to binary search on a sorted array (every link you decide to follow will enable you to ignore a whole chunk of the tree), ‘Exists’ is on average O(logN). Best case is still O(1), but what about worst case? That depends on the algorithm used to balance the binary tree. Balancing is never perfect but, using red-black trees as an example, we can prove that they’re constructed such that the longest path is a maximum of twice the length of the shortest path. If you like, O(2logN). Since 2 is a constant, we can take it out, making red-black trees O(logN) in the worst case for ‘Exists’.

For ‘Insert’ and ‘Delete’, there’s a lot of mathematics that can prove that they’re both O(logN) as well. In essence, the search is O(logN), and the addition of the new node or removal of the old node is O(1) on average.

So, overall, a red-black tree is O(logN) in all its operations. Perhaps more importantly, it has guaranteed O(logN) time even in the worst case. This means that some people will prefer to use a red-black tree for their dictionary instead of a hash table because they don’t want to hit the possibility of O(N) insertion.

Figure 1: Graphing some common big-Oh expressions (O(N^2) is cut off so we can see the others).

From this discussion, you should now have a basic understanding of how to read and understand big-Oh expressions and how to evaluate algorithms and data structures based on them. Figure 1, above, illustrates the runtime for various common big-Oh expressions.

Radix trees

Radix trees offer a further data structure that can be used for a dictionary. A radix tree stores prefixes to keys rather than complete keys in its nodes, and each node can have many children. A key is then found as a complete path through the tree from root to leaf – at each step down the tree, you compare another small part of the name to the next node.

Figure 2, below, shows an example radix tree storing a small set of words. In searching for ‘hostess’, we follow the left link from the root, matching host, then follow the middle link matching the ‘e’ and finally matching the ‘ss’ in the right node.
Unlike the other data structures we’ve looked at, the efficiency of a radix tree doesn’t depend on the number of name-value pairs, but instead on the length of the keys. All operations are essentially O(k), where k is the maximum name length in the radix tree. This can be greater than the balanced binary tree’s O(logN), for example, but in practice we find that the comparisons needed in a binary tree are also significant, so the radix tree can be a viable alternative.

Figure 2: A small radix tree, using middle dot to indicate end of word.

Ternary trees

Back in issue 282, I cited ternary search trees as a strong candidate for the data structure behind a dictionary. Ternary trees, like radix trees, have a runtime efficiency that’s dictated by the length of the keys rather than their number, but are much easier to implement. Ternary search trees and radix trees also have a further benefit: using them means you can easily produce a sorted list of names in the dictionary, as well as produce a prefix list (a list of names with a particular prefix).

Profiling

All of the efficiency results quoted in this article are theoretical. They are all of the form ‘for large values of N the efficiency is proportional to some expression in N’, but make no mention of the size of the constant of proportionality. Therefore, when deciding on which data structure to use in your dictionary, you should profile actual code running on your actual data. It’s pointless worrying, for example, about the efficiency of millions of items in a dictionary when you’ll only have 100.

Mar 04

One of the most interesting avenues of computer science is that of programming a computer to play a game against a human opponent. Examples abound, with the most famous that of programming a computer to play chess. But no matter what the game is, the programming tends to follow an algorithm called minimax, with various attendant sub-algorithms in tow.

First, a definition: a two-player zero-sum game is one played between two players where the players play alternately, the whole game is visible to both and there’s a winner and a loser (or there’s a draw). It’s zero-sum because if the game is played for money, the loser pays the winner and overall there’s no loss of money. (A bit like energy in a reaction: no money is created or destroyed.)

Super-exploding Noughts and Crosses in Space, anyone? The rules are a little complex for this article, so we’ll stick to the standard variety.

One of the simplest two-player zero-sum games is noughts and crosses, where the players alternately place Xs and Os in a 3 x 3 grid, with the winner being the first player to place three of their symbol in a row, column or diagonal line. Like me, you probably played this as a child and, as you played it, you learned how to force a win or draw every time. In fact, once both players get that insight, every game is guaranteed to result in a draw. The only way to win is to play a novice player.

The algorithm

Analysing noughts and crosses with the minimax algorithm is pretty standard in game theory, so I’ll discuss a different game called Nim to illustrate minimax and its variants. Nim is interesting because it’s easily understood, fairly unfamiliar and simply modelled. Plus, there are no draws in Nim, so the whole winner/loser thing is much simpler: someone always wins. But who?

In Nim, the players face three piles of stones with, say, five stones in each pile. Each player takes it in turn to play by removing from a single pile anything from one stone to the entire pile. The loser is the one who is forced to remove the final stone from the final pile, leaving all three piles empty. (Another way of looking at it is that the winner is the first player to be faced with three empty piles.)

For example, suppose our two players are named Max and Minnie. Max starts (he always does, not being a gentleman) and decides to remove all the stones from pile one. Minnie then removes all but two stones from pile two. Max thinks for a while, then removes all but two stones from pile three. Minnie resigns, because no matter what she does, Max will win. (If she removes one stone from a pile, Max removes both stones from the other, and she’s left with the final stone. If she removes both stones from a pile, Max removes one stone from the other, leaving her with the final stone.)

Traversing nodes

Games such as Nim are modelled as game trees. You start off with the initial state of the game as a node, the root of the tree. From this node, each possible move is modelled as a link to another node, which stands in for another state or position of the game.

Figure 1: The first few levels in the noughts and crosses game tree.

So, for example, in noughts and crosses, the root node is the empty grid. Traditionally X starts and there are three possible moves: the centre, a corner and the middle cell along an edge (all the cells are equivalent to one of those three). So, the initial root node has three links to other game states. Each of those new nodes has different possible moves for O, as shown in Figure 1 above. You can imagine going further and drawing more levels.

Nim’s tree is more complex. The initial state has 15 possible links, corresponding to removing one, two, three, four or five stones from each of the three piles. Each of these 15 possible states of the game then has up to 14 possible links to other states for the second player, and so on. You can imagine that the number of game states (that is, nodes in the game tree) explodes pretty quickly.

If you happened to have a big enough piece of paper, it would be possible to map out the entire game tree for the version of Nim that I described. For the leaf nodes of the tree (that is, the nodes with no links coming out of them), you would be able to identify the loser of the game for the path taken through the tree to each particular leaf. Figure 2, below, shows a particularly daft path through the tree where the players take all the stones from each pile in turn (not exactly an insightful game, but nevertheless a possible one under the rules). The loser is Max, because he takes all the stones from pile three in the third move.

Figure 2: An allowable but idiotic game play for Nim, resulting in Max losing.

We can assign a value to each leaf node to indicate who wins (or loses). To make sure we don’t get completely confused, we assign a monetary value from the viewpoint of the first player, Max. Let’s say the winner of the path to the leaf receives £1 and the loser has to pay out that amount – so if the winner is Max, the value of the node is £1, while if the winner is Minnie, the value is -£1 (since Max has to pay that amount to her).

Player one

Let’s imagine that we set up the entire game tree from the viewpoint of Max, the player who makes his move first. Each game position corresponds to a node in the tree, and if you think about it, a whole level of the tree will correspond to a given player. So, the root of the tree is what Max is faced with at the very start of the game: five stones in each of the three piles, and 15 possible game positions to leave for Minnie. What does Max choose to play in this situation? What he should do is analyse all possible moves from the bottom up and assign a value to each node as he works his way up the tree, according to the amount he could win on that node if he played optimally.

Figure 3: A simple choice in a game tree, to calculate the minimax value of the root node.

Let’s take a look at a made-up example, shown in Figure 3 above. Here, the root node shows a game position from which Max must play. There are two possibilities: playing the left-hand option goes to a game position that he’s already worked out means he wins £1; playing the right-hand option goes to a game position where he loses £1. (Remember, all payouts are from Max’s viewpoint.) I don’t know about you, but I’d choose the first play. This means that the current game position also has a value of £1. For every game position where it’s his turn to play, Max would choose the option that would maximise his winnings. Minnie, who is just as perceptive as Max, would, of course, choose plays that would result in the best result for her and ignore all the others. So she would always choose a play that maximised her winnings, which, from Max’s perspective, means minimising his.

If you had the entire tree, you could work out a value for each node working from the bottom up. If it was a ‘Max node’ (that is, Max had to play from it), it would have a value that was the maximum of the child nodes. If it was a ‘Minnie node’ it would have a value that was the smallest (the minimum) of the child nodes. This, in essence, is the minimax algorithm: build the tree, work out the value of each node using an alternate minimise/maximise constraint, and the value of the root is the value of the entire game for player one (Max, as we called him).

The recursive method

Instead of building the entire tree and then analysing it, the best approach is to traverse the tree recursively (a postfix traversal, in fact) and calculate what you need when you need it (and destroy the stuff you don’t need when you’re done). In essence, since a tree is defined recursively, you calculate the minimax value by calculating the maximum (or minimum) of the minimaxes of all the child trees. Remember that the levels alternate between maximising and minimising (sometimes you look at it from Minnie’s viewpoint instead of Max’s).

Figure 4, below, shows a very simplified Nim game (one pile of five stones, you can remove one, two or three stones each play), fully expanded into a game tree. The number inside each node is the number of stones left in the pile after the move, and the letter alongside each node is the minimax value for Max (W = win, L = lose). Note that the value of the game is L – that is, Max will always lose (if you like, this simplified Nim is always a win for the second player).

Figure 4: The complete game tree for a simplified Nim game.

Although the minimax algorithm is always guaranteed to find the best play for Max, there is a big problem. The game tree can be huge – mind-bogglingly huge. Consider chess, the classic archetype of a two-player zero-sum game. At each game position there could be something like 30 possible moves. Since each chess game is made up of about 80 plays (40 back-and-forths), it would mean that the lowest level of the tree would have something like 10118 nodes. (Note that in tournaments it’s rare for a game to go to checkmate – the losing player is likely to resign well before then.) As a comparison, there are around 1080 atoms in the observable universe, meaning that, in essence, there’s no possible way for a computer to map the entire chess game tree. So what can we do?

The first optimisation is to limit the depth to which we evaluate the game tree using the minimax algorithm. Since we may not actually reach a leaf node in doing this, we make use of an approximation function – a heuristic – to approximate the value of the node or game position. Of necessity, this value is not going to be accurate, but it will enable us to apply the minimax algorithm without having to evaluate all the nodes down to the leaves. The better the heuristic, the better the chances of devising a winning game play and the more accurate our minimax values will be.

Limiting depth

In our recursive algorithm for minimax, we’ll need to limit the depth of the recursion instead of allowing the recursion to reach the leaves. The simplest way to do this is to pass a depth parameter to the recursive minimax function and decrement its value at every recursive call. At the lowest level of the recursion, we use the heuristic function to calculate the minimax value of the current game position.

Now, the resulting minimax value at the root of the game tree is only going to be an approximation. The deeper we allow the partial minimax algorithm to go, the more accurate its value will be (because we’re more likely to find leaf nodes in our traversal), but the longer the traversal will take. We have to strike a balance between accuracy and the time taken to calculate the minimax value (and hence the move to play).

Once it’s our turn again to make a move, we should recalculate the minimax value at our new game position, making it, in effect, the root of the current state of the game. Every move would be made after a new minimax calculation based on the current game state.

In many chess programs that run on standard PC hardware, the depth of the minimax search is limited to some six full-width levels – around a billion possible game positions. Any more than that and the time taken to analyse the game positions would be far too long to be practical. For example, analysing positions at a rate of a million per second, six full-width levels would take about a quarter of an hour.

Alpha-beta pruning

First proposed by John McCarthy at a conference in 1956 (although only named as such later on), alpha-beta pruning is a method for cutting off whole branches of the game tree so that they don’t have to be evaluated with minimax. In essence, the algorithm maintains two extra values during the minimax recursion: alpha and beta. Alpha is the minimum value for Max (biggest loss for him) and beta is the maximum value for Minnie (biggest win for Max). They start out as negative infinity for alpha and positive infinity for beta. As the minimax recursion proceeds, the value for alpha is replaced when a new minimax value that is larger is found (ditto for beta, when a smaller value is calculated). If they cross at any time, the branch of the tree currently being investigated is no good for either player and can be further ignored, or pruned. It can be shown that this algorithm doesn’t mistakenly prune branches that will benefit either player and so it’s widely used in minimax implementations.