Genetic algorithms are inspired from the natural process of evolution. We will have two running examples. First, a general example of trying to solve a maze. And second, a more specific example of trying to create good blackjack players. The phenomenon is broken down into the six following sections:
Although we will insert code snippets in all sections belowm if you wish to have a high level overview of how the
genetic blackjack algorithm we implemented works, visit our GitHub!
We also highly recommend watching this video (especially 6:50 to about minute ~10).
Here's a good medium post to familiarise yourself with the concepts if you aren't satisfied with our definitely much better
explanations below: one and
two.
A chromosome is a single member of the population. In natural terms, think of this as a single lion, tiger or bear. Oh my!
In algorithmic terms, think of a chromosome as a solution to the problem you are trying to solve. For example, if you are
trying to solve a maze, a chromosome could be a sequence of steps from the centre of the maze to the maze exit. Often, the
problem will have a feature vector Φ(x). If this is the case, the chromosome will be the weight vector, w, that corresponds to that
feature vector. Calculating the dot product w·Φ(x) gives a number. This dot product can now be used to make decisions.
To explain these, let's begin our blackjack example. Imagine each chromosome is a blackjack player. We'll simplify the game
so that the players are not playing against a dealer or other players but against the deck. In this case we can define the feature
vector for each player to be [score, numAs, num2s, ..., num10s, numJs, numQs, numKs] and the weight vector (chromosome) to be
[w_1, ..., w_n]. We choose to normalise these weights to 1 where each w_i is between 0 and 1.
If the dot product between features and weights, w·Φ(x), is closer to 1 than to 0, you choose to quit. Conversely,
you can choose to hit and draw another card.
In our code, this is how we defined a chromosome - which we named Player:
class Player():
def __init__(self, number=0, startProbs=None, isDealer=False):
self.cards = []
self.score = 0
self.playerNumber = number
self.isQuit = False
self.isBust = False
self.is21 = False
self.__featureVector = defaultdict(int)
self.__weights = self.initWeights()
This is an easy one. A generation is a particular set of chromosomes. You can think of it as the population of all players in the current iteration of the algorithm. Computationally, this will normally be represented as a set or list of individuals (vectors). In our blackjack example, we begin the algorithm by creating the list of random players, the first generation of our possible solutions, as the code snipper below demonstrates:
def initializeGeneration(self, n):
self.players = [Player(i) for i in range(n)]
As first described by Darwin, natural selection is an essential part of the evolutional process of a species. It will determine which individuals
perform better - whatever that might be - given certain conditions, which we know popularly as "survival of the fittest". What it means to perform better
is totally open-ended and means different things for different populations in distinct ecosystems. For a polar bear, the ability to hunt seals and to disguise
in white ice is more important than its ability to climb trees, which might be key for other species of bear that live in forests.
In our model, we will assign a score to the performance of each chromosome in a generation to assess how well they performed under the problem constraints.
The formula for the score should be the same for all chromosomes across all generations. After determining the best performers, these will be selected to
be the parents of the next generation of individuals, a reproduction process that will involve later steps such as mutation and crossover.
Going back to blackjack, the score of each chromosome is analogous to the score in the game. The game naturally has a top score constraint (if one
has more than 21 points in one's hands then one goes bust), but we also introduced a bottom score constraint that is supposed to simulate the behavior of the
dealer. Once a player goes bust or does not obtain the minimum score, it won't be selected to be the parent of the next generation, while a player that won the game -
had a score higher than the bottom threshold and lower or equal to 21 - will likely be selected to be the root of the following generation,
passing on its genes (weights) to its "offspring".
In code, the selection function was implemented in this way (note that we are not explicitly selecting by score, but the score influences which individuals will be selected):
TODO Phil - talk about the maze example?
def select(self, threshhold):
# Returns parents of next generation by removing all:
# - bust players
# - players who got blackjack (pure luck, no genes)
# - players who got above a certain score (threshhold)
selection = []
for player in self.players:
if not player.isBust and player.score >= threshhold and not (len(player.cards)==2 and player.is21):
selection.append(player)
self.parents = selection
return selection
# return selection # the parents of the next gene
In nature, however, reproduction is not only about selecting the best performers and making them the parents of the next generations. There are non-deterministic natural
processes that interfere with that.
One of these processes is crossover, which happens when two chromosomes tangle during cell meiosis and end up exchanging corresponding chunks of genes.
On a genetic algorithm model, we simulate this by calculating, for each pair of parents, a random index that will determine the chunk of genes that will
be exchanged. For instance, if we have parent_1 = [0,1,2,3,4,5] and parent_2 = [6,7,8,9,10,11], and we calculate a random index of 3, then all the numbers
after index 3 in both parents will be exchanged, resulting in parent_1 = [0,1,2,3,10,11] and parent_2 = [6,7,8,9,4,5].
In both blackjack and maze problems, the crossover happens exactly as described above. If it's still unclear how crossover works, refer to the image below,
which provides a visual aid to this process:
And in case the image wasn't too clear, here's the code snippet in which we run the crossover (the function is long, but not hard to parse through):
def crossssover(self, childrenNum):
self.genNum += 1
children = []
numParents = len(self.parents)
numParents = numParents if numParents % 2 == 0 else numParents - 1
if numParents < 2: pass #init randomly?
# print(f"Number of parents: {numParents}")
# check even & odd
for i in range(0,numParents,2):
if i >= childrenNum: break
parentWeights0 = self.parents[i].getWeights()
parentWeights1 = self.parents[i+1].getWeights()
crossoverIndex = randint(0, len(parentWeights0) - 1)
child0Weights = {}
child1Weights = {}
j = 0
for weight in parentWeights0:
if j < crossoverIndex:
child0Weights[weight] = parentWeights0[weight]
child1Weights[weight] = parentWeights1[weight]
else:
child0Weights[weight] = parentWeights1[weight]
child1Weights[weight] = parentWeights0[weight]
j+=1
child0 = Player((childrenNum*self.genNum)+i)
child1 = Player((childrenNum*self.genNum)+i+1)
child0.updateWeights(child0Weights)
child1.updateWeights(child1Weights)
children += [child0, child1]
As we all learned in Middle/High School, when organisms reproduce, it is never guaranteed that the offspring's genome will be purely a combination of
the parent's genes. Most of the times, new individuals are subject to mutations, which happen seemingly randomly. A mutation basically on an indiividual's
genome would mean that some new genes not present in the parents' pool of genes are introduced into the individual's genome.
For our genetic algorithm, we simulated such effect by going through every chromosome in the new generation and generating two random numbers: one
will designate which gene in the chromosome will be changed and the second one will be the value of that gene's replacement.
Let's check out how that is done in code:
def mutate(self):
for player in self.players:
weights = player.getWeights()
weightKey = sample(set(weights),1)
newWeights = weights
newWeights[weightKey[0]] = random()
player.updateWeights(newWeights)
Now that we have all the essential components in our mind, let'stake a look at how the algorithm itself works.
In the following code snippet, which will call all the previous functions we discussed here, you'll notice there is one extra function we have not discussed yet: casino.
The casino function basically runs an iteration of the game of the current generation. Therefore, it is easy to see in the snippet below that the algorithm
consists of initializing a random generation, then running the casino once, and then for all the remaining generations we will select the best performers,
cross their genes over, run the mutations, and then run the casino on the newly born generation, until the last generation. We could set an early stop when
the algorithm converges, but for the purposes of this educational example, we decided to keep it simple :) Hope you enjoyed learning about Genetic Blackjack!
def main():
for threshhold in range(17,22):
# print(f'---------\nGeneration {0}')
genModel = geneticModel.GeneticModel()
genModel.initializeGeneration(PLAYERSPERGEN)
blackjackGenetic.casino(DECKNUM, genModel.players)
print(f"random score: {genModel.getAvgScore()}")
# print(genModel.getAvgScore())
for i in range(1, NUMGENERATIONS):
# print(f'---------\nGeneration {i}')
genModel.select(threshhold)
genModel.crossover(PLAYERSPERGEN)
genModel.mutate()
blackjackGenetic.casino(DECKNUM, genModel.players)
# print(genModel.getAvgScore())
# print(genModel)
print(f"threshhold: {threshhold} --> score: {genModel.getAvgScore()}")
In case you were curious: here's the casino function!
def casino(deckNumber, players):
deck = Deck(deckNumber) # creates deckNumber number of card decks
for player in players: # iterate through all the players
table(deck, [player]) # assign decks and a player to each table, and run the game
def table(deck, players):
"""Currently only one player allowed per game."""
# Deal
for player in players:
player.dealCards(deck)
# Decisions
while(not all([player.isBust or player.isQuit or player.is21 for player in players])):
for player in players:
if player.isBust or player.isQuit or player.is21: continue
if player.getPrediction():
player.hit(deck)
else:
player.quit()
No. This is a very fun idea to explore, but it boils down to having some very basic selection criteria and very naive optimisation/updating methods. It's only marginally better than a random search. If you ever see a weight vector in the wild, please don't write a genetic algorithm for it. That is, unless you are looking for a way out and can't afford the flight to Switzerland. We have much better ways to do this now-a-days, and we encourage you to study techniques such as Q-learning. Sorry, Greta Thunberg, but here it's Humans 1 - Nature 0.
Swarm algorithms are also inspired by behaviours seen in nature. They work on the principle that the interactions between many simple agents can produce complex behaviour. Ant colonies served as the initial inspiration for the creators of the first such algorithms. While ants are almost completely blind, when interacting with each other they are somehow able to find the shortest path to a food source. We call this behaviour "emergent behaviour", which encapsulates the idea that the whole is worth more than the sum of its parts. In this explanation, we will focus on the Ant-Quantity algorithm put forward in this 1991/2 paper: Distributed Optimization by Ant Colonies by Colorni, A., Dorigo, M., Maniezzo, V., Varela, F.J., & Bourgine, P. (1992).
In the outside world, ants leave pheromone trails as they walk. At first, ants will search for food almost randomly, leaving trails of pheronome like tiny versions of Hansel and Gretel. Other ants following are much more likely to take paths covered in pheromone than paths that aren't. After some time and lots of ants, they almost magically find the shortest path to food.
Look at the drawing below, particularly at Fig. 1 b). Imagine you are an ant starting at A and trying to reach a food source at E. At B
you suddenly bump into a food source. Being nearly blind, you do not know whether it it better to turn right to C or left ot H. The best
you can do now is turn right with probability 0.5 and left with probability 0.5. Let's split you into two ants now - one that chose to
go right (Ant A) and the other one that chose to go right (Ant B). Additionally, imagine it takes 1 second to walk the route BCD and 10
seconds to walk the route BHD.
Ant A takes the route BCD, taking 1 second. Ant B takes the route BHD, taking 10 seconds. It then reaches the food source at E and walks
back to D. To simplify the explanation, let us say that this only took 5 seconds meaning that ant B has not reached D (this isn't neccesary
but it will avoid more complicated maths and still captures the idea). Ant A can now go on its own pheromone trail or on a pheromone-less
trail. Ants are much more likely to chose trails with pheromone so it will chose the former, taking the route DCB and then back to A.
Now play this with more ants and for more time. Path BCD will gradually build up more and more pheromone due its being shorter than BHD.
This in turn will encourage more ants to take path BCD, which has more pheromone, rather than path BHD, which has less.
Additionally, pheromone evaporates causing paths that are not frequently reinforced to disappear. Eventually, the trail of ants will
look like the drawing in Fig. 1 c).
The steps of the algorithm are as follows:
We can model the picture above as a grid made up of tiles. At the beginning, each tile will have no pheromone on it. There will
also be a starting point, a food source and potentially several obstacles in the way. We can imagine that while the algorithm
is being run, each tile will accumulate pheromone on it as ants traverse paths that include that tile. For this step, we do not
consider how the amount of pheromone is updates on each tile. Instead we imagine we are an ant and consider how we decide what
path to take.
We decide which path to take as a function of two things:
Once the ants have found solutions you update the amounnt of pheromone on each tile based on the number of traversals
that include that tile on that iteration. You then add that to the amount that was already on that tile multiplied by
an evapouration factor, ρ. You can see this update calculation in the equation:
where
Repeat until you converge on a satisfactory answer.
While, like genetic algorithms, it is a fun idea to make swarm algorithms, they are not advised for use in serious applications. There are much better shortest path algorithms, such as A*, which are much more highly recommended. Greta, I'm afraid you're going to need a bigger boat. Humans 2 - Nature 0.