Bio-Inspired Algorithms

Phil & Raul aka Lil' Git and Yung Commit -m, not necessarily in that order.

Genetic Algorithms

Genetic algorithms are inspired from the natural process of evolution. We will have two running examples. First, a general example of trying to solve a maze. And second, a more specific example of trying to create good blackjack players. The phenomenon is broken down into the six following sections:

Chromosomes
Generation
Selection
Crossover
Mutation
Algorithm

Although we will insert code snippets in all sections belowm if you wish to have a high level overview of how the genetic blackjack algorithm we implemented works, visit our GitHub!

We also highly recommend watching this video (especially 6:50 to about minute ~10). Here's a good medium post to familiarise yourself with the concepts if you aren't satisfied with our definitely much better explanations below: one and two.

Chromosomes

A chromosome is a single member of the population. In natural terms, think of this as a single lion, tiger or bear. Oh my!

In algorithmic terms, think of a chromosome as a solution to the problem you are trying to solve. For example, if you are trying to solve a maze, a chromosome could be a sequence of steps from the centre of the maze to the maze exit. Often, the problem will have a feature vector Φ(x). If this is the case, the chromosome will be the weight vector, w, that corresponds to that feature vector. Calculating the dot product w·Φ(x) gives a number. This dot product can now be used to make decisions.

To explain these, let's begin our blackjack example. Imagine each chromosome is a blackjack player. We'll simplify the game so that the players are not playing against a dealer or other players but against the deck. In this case we can define the feature vector for each player to be [score, numAs, num2s, ..., num10s, numJs, numQs, numKs] and the weight vector (chromosome) to be [w_1, ..., w_n]. We choose to normalise these weights to 1 where each w_i is between 0 and 1.

If the dot product between features and weights, w·Φ(x), is closer to 1 than to 0, you choose to quit. Conversely, you can choose to hit and draw another card.

In our code, this is how we defined a chromosome - which we named Player:


class Player():
    def __init__(self, number=0, startProbs=None, isDealer=False):
      self.cards = []
      self.score = 0
      self.playerNumber = number
      self.isQuit = False
      self.isBust = False
      self.is21 = False
      self.__featureVector = defaultdict(int)
      self.__weights = self.initWeights()

Generation

This is an easy one. A generation is a particular set of chromosomes. You can think of it as the population of all players in the current iteration of the algorithm. Computationally, this will normally be represented as a set or list of individuals (vectors). In our blackjack example, we begin the algorithm by creating the list of random players, the first generation of our possible solutions, as the code snipper below demonstrates:


def initializeGeneration(self, n):
    self.players = [Player(i) for i in range(n)]

Selection

As first described by Darwin, natural selection is an essential part of the evolutional process of a species. It will determine which individuals perform better - whatever that might be - given certain conditions, which we know popularly as "survival of the fittest". What it means to perform better is totally open-ended and means different things for different populations in distinct ecosystems. For a polar bear, the ability to hunt seals and to disguise in white ice is more important than its ability to climb trees, which might be key for other species of bear that live in forests. In our model, we will assign a score to the performance of each chromosome in a generation to assess how well they performed under the problem constraints. The formula for the score should be the same for all chromosomes across all generations. After determining the best performers, these will be selected to be the parents of the next generation of individuals, a reproduction process that will involve later steps such as mutation and crossover.

Going back to blackjack, the score of each chromosome is analogous to the score in the game. The game naturally has a top score constraint (if one has more than 21 points in one's hands then one goes bust), but we also introduced a bottom score constraint that is supposed to simulate the behavior of the dealer. Once a player goes bust or does not obtain the minimum score, it won't be selected to be the parent of the next generation, while a player that won the game - had a score higher than the bottom threshold and lower or equal to 21 - will likely be selected to be the root of the following generation, passing on its genes (weights) to its "offspring".

In code, the selection function was implemented in this way (note that we are not explicitly selecting by score, but the score influences which individuals will be selected): TODO Phil - talk about the maze example?


def select(self, threshhold):
    # Returns parents of next generation by removing all:
    #  - bust players
    #  - players who got blackjack (pure luck, no genes)
    #  - players who got above a certain score (threshhold)
    selection = []
    for player in self.players:
        if not player.isBust and player.score >= threshhold and not (len(player.cards)==2 and player.is21):
            selection.append(player)
    self.parents = selection
    return selection
    # return selection # the parents of the next gene

Crossover

In nature, however, reproduction is not only about selecting the best performers and making them the parents of the next generations. There are non-deterministic natural processes that interfere with that.

One of these processes is crossover, which happens when two chromosomes tangle during cell meiosis and end up exchanging corresponding chunks of genes. On a genetic algorithm model, we simulate this by calculating, for each pair of parents, a random index that will determine the chunk of genes that will be exchanged. For instance, if we have parent_1 = [0,1,2,3,4,5] and parent_2 = [6,7,8,9,10,11], and we calculate a random index of 3, then all the numbers after index 3 in both parents will be exchanged, resulting in parent_1 = [0,1,2,3,10,11] and parent_2 = [6,7,8,9,4,5].

In both blackjack and maze problems, the crossover happens exactly as described above. If it's still unclear how crossover works, refer to the image below, which provides a visual aid to this process:

And in case the image wasn't too clear, here's the code snippet in which we run the crossover (the function is long, but not hard to parse through):


def crossssover(self, childrenNum):
    self.genNum += 1
    children = []
    numParents = len(self.parents)
    numParents = numParents if numParents % 2 == 0 else numParents - 1
    if numParents < 2: pass #init randomly?
    # print(f"Number of parents: {numParents}")
    # check even & odd
    for i in range(0,numParents,2):
        if i >= childrenNum: break
        parentWeights0 = self.parents[i].getWeights()
        parentWeights1 = self.parents[i+1].getWeights()
        crossoverIndex = randint(0, len(parentWeights0) - 1)
        
        child0Weights = {}
        child1Weights = {}
        j = 0
        for weight in parentWeights0:
            if j < crossoverIndex:
                child0Weights[weight] = parentWeights0[weight]
                child1Weights[weight] = parentWeights1[weight]
            else:
                child0Weights[weight] = parentWeights1[weight]
                child1Weights[weight] = parentWeights0[weight]
            j+=1
        child0 = Player((childrenNum*self.genNum)+i)
        child1 = Player((childrenNum*self.genNum)+i+1)
        child0.updateWeights(child0Weights)
        child1.updateWeights(child1Weights)
        children += [child0, child1]

Mutation

As we all learned in Middle/High School, when organisms reproduce, it is never guaranteed that the offspring's genome will be purely a combination of the parent's genes. Most of the times, new individuals are subject to mutations, which happen seemingly randomly. A mutation basically on an indiividual's genome would mean that some new genes not present in the parents' pool of genes are introduced into the individual's genome.

For our genetic algorithm, we simulated such effect by going through every chromosome in the new generation and generating two random numbers: one will designate which gene in the chromosome will be changed and the second one will be the value of that gene's replacement.

Let's check out how that is done in code:


def mutate(self):
    for player in self.players:
        weights = player.getWeights()
        weightKey = sample(set(weights),1)
        newWeights = weights
        newWeights[weightKey[0]] = random()
        player.updateWeights(newWeights)

The Algorithm

Now that we have all the essential components in our mind, let'stake a look at how the algorithm itself works.

In the following code snippet, which will call all the previous functions we discussed here, you'll notice there is one extra function we have not discussed yet: casino.

The casino function basically runs an iteration of the game of the current generation. Therefore, it is easy to see in the snippet below that the algorithm consists of initializing a random generation, then running the casino once, and then for all the remaining generations we will select the best performers, cross their genes over, run the mutations, and then run the casino on the newly born generation, until the last generation. We could set an early stop when the algorithm converges, but for the purposes of this educational example, we decided to keep it simple :) Hope you enjoyed learning about Genetic Blackjack!


def main():
    for threshhold in range(17,22):
        # print(f'---------\nGeneration {0}')
        genModel = geneticModel.GeneticModel()
        genModel.initializeGeneration(PLAYERSPERGEN)
        blackjackGenetic.casino(DECKNUM, genModel.players)
        print(f"random score: {genModel.getAvgScore()}")
        # print(genModel.getAvgScore())


        for i in range(1, NUMGENERATIONS):
            # print(f'---------\nGeneration {i}')
            genModel.select(threshhold)
            genModel.crossover(PLAYERSPERGEN)
            genModel.mutate()
            blackjackGenetic.casino(DECKNUM, genModel.players)
            # print(genModel.getAvgScore())

        # print(genModel)
        print(f"threshhold: {threshhold} --> score: {genModel.getAvgScore()}")

In case you were curious: here's the casino function!


def casino(deckNumber, players):
    deck = Deck(deckNumber) # creates deckNumber number of card decks
    for player in players: # iterate through all the players 
        table(deck, [player]) # assign decks and a player to each table, and run the game

def table(deck, players):
    """Currently only one player allowed per game."""
    # Deal
    for player in players:
        player.dealCards(deck)
    
    # Decisions
    while(not all([player.isBust or player.isQuit or player.is21 for player in players])):
        for player in players:
            if player.isBust or player.isQuit or player.is21: continue
            if player.getPrediction():
                player.hit(deck)
            else:
                player.quit()

Disclaimer: Is it good tho?

No. This is a very fun idea to explore, but it boils down to having some very basic selection criteria and very naive optimisation/updating methods. It's only marginally better than a random search. If you ever see a weight vector in the wild, please don't write a genetic algorithm for it. That is, unless you are looking for a way out and can't afford the flight to Switzerland. We have much better ways to do this now-a-days, and we encourage you to study techniques such as Q-learning. Sorry, Greta Thunberg, but here it's Humans 1 - Nature 0.

Swarm Algorithms

Swarm algorithms are also inspired by behaviours seen in nature. They work on the principle that the interactions between many simple agents can produce complex behaviour. Ant colonies served as the initial inspiration for the creators of the first such algorithms. While ants are almost completely blind, when interacting with each other they are somehow able to find the shortest path to a food source. We call this behaviour "emergent behaviour", which encapsulates the idea that the whole is worth more than the sum of its parts. In this explanation, we will focus on the Ant-Quantity algorithm put forward in this 1991/2 paper: Distributed Optimization by Ant Colonies by Colorni, A., Dorigo, M., Maniezzo, V., Varela, F.J., & Bourgine, P. (1992).

Antz

In the outside world, ants leave pheromone trails as they walk. At first, ants will search for food almost randomly, leaving trails of pheronome like tiny versions of Hansel and Gretel. Other ants following are much more likely to take paths covered in pheromone than paths that aren't. After some time and lots of ants, they almost magically find the shortest path to food.

Pheromone

Look at the drawing below, particularly at Fig. 1 b). Imagine you are an ant starting at A and trying to reach a food source at E. At B you suddenly bump into a food source. Being nearly blind, you do not know whether it it better to turn right to C or left ot H. The best you can do now is turn right with probability 0.5 and left with probability 0.5. Let's split you into two ants now - one that chose to go right (Ant A) and the other one that chose to go right (Ant B). Additionally, imagine it takes 1 second to walk the route BCD and 10 seconds to walk the route BHD.

Ant A takes the route BCD, taking 1 second. Ant B takes the route BHD, taking 10 seconds. It then reaches the food source at E and walks back to D. To simplify the explanation, let us say that this only took 5 seconds meaning that ant B has not reached D (this isn't neccesary but it will avoid more complicated maths and still captures the idea). Ant A can now go on its own pheromone trail or on a pheromone-less trail. Ants are much more likely to chose trails with pheromone so it will chose the former, taking the route DCB and then back to A.

Now play this with more ants and for more time. Path BCD will gradually build up more and more pheromone due its being shorter than BHD. This in turn will encourage more ants to take path BCD, which has more pheromone, rather than path BHD, which has less.

Additionally, pheromone evaporates causing paths that are not frequently reinforced to disappear. Eventually, the trail of ants will look like the drawing in Fig. 1 c).

Drawing of ants searching for food from original paper.

A. Colorni, M. Dorigo et V. Maniezzo, Distributed Optimization by Ant Colonies, actes de la première conférence européenne sur la vie artificielle, Paris, France, Elsevier Publishing, 134-142, 1991.

Algorithm (Ant-Quantity)

The steps of the algorithm are as follows:

Each agent explores the graph stochastically, finding a solution.
Update path pheromones.
Repeat until termination.

1. Explore graph stochastically, find solution

We can model the picture above as a grid made up of tiles. At the beginning, each tile will have no pheromone on it. There will also be a starting point, a food source and potentially several obstacles in the way. We can imagine that while the algorithm is being run, each tile will accumulate pheromone on it as ants traverse paths that include that tile. For this step, we do not consider how the amount of pheromone is updates on each tile. Instead we imagine we are an ant and consider how we decide what path to take.

We decide which path to take as a function of two things:

The amount of pheromone on the tiles, represented by τ_i,j.
The reciprocal of the manhattan distance between the food source and the ant, represented by η_i,j (called the visibility).

In the original formulation of the algorithm, there were also two other factors α and β, which we will consider constants to ignore for simplicity. The probability with which you choose to move to tile τ_i,j at the given time step is given by the equation:

Tile probability

, which essentially represents a normalised probability proportional to the multiple between the amount of pheromonne and visibility. Do this for all the ants until they reach a solution from home to food.

2. Update path pheromones

Once the ants have found solutions you update the amounnt of pheromone on each tile based on the number of traversals that include that tile on that iteration. You then add that to the amount that was already on that tile multiplied by an evapouration factor, ρ. You can see this update calculation in the equation:

Pheromone Update

where

Transition

3. Repeat until termination

Repeat until you converge on a satisfactory answer.

Another disclaimer

While, like genetic algorithms, it is a fun idea to make swarm algorithms, they are not advised for use in serious applications. There are much better shortest path algorithms, such as A*, which are much more highly recommended. Greta, I'm afraid you're going to need a bigger boat. Humans 2 - Nature 0.

References

Colorni, Alberto, Marco Dorigo, Vittorio Maniezzo, Francisco J. Varela and Paul Bourgine. “Distributed Optimization by Ant Colonies.” (1992).

A. Colorni, M. Dorigo et V. Maniezzo, Distributed Optimization by Ant Colonies, actes de la première conférence européenne sur la vie artificielle, Paris, France, Elsevier Publishing, 134-142, 1991.

Medium Article, link copied on Tue Dec 10, 2019

Medium Article, link copied on Tue Dec 10, 2019