This Genetic Algorithm Tutorial Explains what are Genetic Algorithms and their role in Machine Learning in detail:
In the Previous tutorial, we learned about Artificial Neural Network Models – Multilayer Perceptron, Backpropagation, Radial Bias & Kohonen Self Organising Maps including their architecture.
We will focus on Genetic Algorithms that came way before than Neural Networks, but now GA has been taken over by NN. Although, GA still has applications in real life such as optimization problems like scheduling, playing games, robotics, hardware design, traveling salesman problems, etc.
Genetic Algorithms are algorithms that are based on the evolutionary idea of natural selection and genetics. GAs are adaptive heuristic search algorithms i.e. the algorithms follow an iterative pattern that changes with time. It is a type of reinforcement learning where the feedback is necessary without telling the correct path to follow. The feedback can either be positive or negative.
=> Read Through The Complete Machine Learning Training Series
Table of Contents:
Why Use Genetic Algorithms
GAs are more robust algorithms that can be used for various optimization problems. These algorithms do not deviate easily in the presence of noise, unlike other AI algorithms. GAs can be used in the search for large space or multimodal space.
Biological Background Of Genetic Algorithms
Genetics is derived from the Greek word, “genesis” that means to grow. The genetics decides the heredity factors, resemblances, and differences between the offsprings in the process of evolution. Genetic Algorithms are also derived from natural evolution.
Some Terminologies In A Biological Chromosome
- Chromosomes: All genetic information of a species is stored chromosomes.
- Genes: The Chromosomes are divided into several parts called Genes.
- Alleles: The genes identify the characteristic of an individual. The possibility of a combination of genes to form property is called Alleles. A gene can have different alleles.
- Gene Pool: All possible combinations of genes that are all alleles in a population pool is called gene pool.
- Genome: The set of genes of a species is called a genome.
- Locus: Each gene has a position in a genome that is called locus.
- Genotype: A full combination of genes in an individual is called the genotype.
- Phenotype: A set of genotypes in a decoded form is called the phenotype.
What Are Genetic Algorithms
The Genetic Algorithms stimulate the process as in natural systems for evolution. Charles Darwin stated the theory of evolution that in natural evolution, biological beings evolve according to the principle of “survival of the fittest”. The GA search is designed to encourage the theory of “survival of the fittest”.
The GAs perform a random search to solve optimization problems. The GA uses techniques that use the previous historical information to direct their search towards optimization in the new search space.
Correlation Of A Chromosome With GA
The human body has chromosomes that are made of genes. A set of all genes of a specific species is called the genome. In living beings, the genomes are stored in various chromosomes while in GA all genes are stored in the same chromosome.
Comparison between Natural Evolution and Genetic Algorithm Terminology
Natural Evolution | Genetic Algorithm |
---|---|
Chromosome | String |
Gene | Feature |
Allele | Feature Value |
Genotype | Coded String |
Phenotype | Decoded Structure |
Important Terminology In GA
- Population: It is a group of individuals. The population includes the number of individuals being tested, search space information, and the phenotype parameters. Generally, the population is randomly initialized.
- Individuals: Individuals are a single solution in population. An individual has a set of parameters called genes. Genes combined to form chromosomes.
- Genes: Genes are building blocks of Genetic Algorithms. A chromosome is made up of genes. The genes may determine the solution to the problem. They are represented by a bit (0 or 1) string of random length.
- Fitness: The fitness tells the value of the problem’s phenotype. The fitness function tells how close the solution is to the optimal solution. Fitness function is determined in many ways such as the sum of all parameters related to the problem – Euclidean distance, etc. There is no rule to evaluate fitness function.
A Simple Genetic Algorithm
A simple GA has a population of individual chromosomes. These chromosomes represent possible solutions. Reproduction operators are applied over these sets of chromosomes to perform mutations and recombination. Thus, it is important to find appropriate reproduction operators as GA’s behavior is dependent on it.
A simple genetic algorithm is as follows:
#1) Start with the population created randomly.
#2) Calculate the fitness function of each chromosome.
#3) Repeat the steps till n offsprings are created. The offsprings are created as shown below.
- Select a pair of chromosomes from the population.
- Crossover the pair with probability pc to form offsprings.
- Mutate the crossover with probability pm.
#4) Replace the original population with the new population and go to step 2.
Let’s see the steps followed in this iteration process. The initial population of chromosomes is generated. The initial population should contain enough genes so that any solution can be generated. The first pool of population is generated randomly.
- Selection: The best set of genes is selected depending on the fitness function. The string with the best fitness function is chosen.
- Reproduction: New offsprings are generated by recombination and mutation.
- Evaluation: The new chromosomes generated are evaluated for their fitness.
- Replacement: In this step, the old population is replaced with the newly generated population.
Roulette Wheel Selection Method
Roulette wheel selection is the Selection method that is used widely.
The selection process is short as shown below:
In this method, a linear search is made through the roulette wheel. The slots in the wheel are weighed according to the individual fitness value. The target value is set randomly according to the proportion of the sum of the fitness in the population.
The population is then searched until the target value is reached. This method does not guarantee the fittest of the individuals but has a probability of being the fittest.
Let’s see the steps involved in the Roulette Wheel Selection.
The expected value of an individual = Individual fitness/fitness of the population. The wheel slots are assigned to individuals based on their fitness. The wheel is rotated N times where N is the total number of individuals in the population. When one rotation is over, the selected individual is put into a pool of parents.
- Let the total expected value of a number of individuals in the population be S.
- Repeat steps 3-5 n times.
- Select an integer s between 0 and S.
- Loop through individuals in the population, sum the expected values until the sum is greater than s.
- The individual whose expected value puts the sum over the limit s is selected.
Drawbacks of Roulette Wheel Selection:
- If the fitness differs very much then the circumference of the Roulette wheel will be taken maximum by the highest fitness function chromosome, so the others have very little chance to be selected.
- It is noisy.
- The evolution depends upon the variance in the fitness of the population.
Other Selection Methods
There are many other selection methods used in the “Selection” step of the Genetic Algorithm.
We will discuss the 2 other widely used methods:
#1) Rank Selection: In this method, every chromosome is given a fitness value from ranking. The worst fitness is 1 and the best fitness is N. It is a slow convergence method. In this method, diversity is preserved leading to a successful search.
Potential parents are selected and then a tournament is held to decide which of the individuals will be a parent.
#2) Tournament Selection: In this method, a selective pressure strategy is applied to the population. The best individual is the one with the highest fitness. This individual is the winner of the tournament competition among Nu individuals in the population.
The tournament population along with the winner is again added in the pool to generate new offsprings. The difference in the fitness values of the winner and individuals in the mating pool provides a selective pressure for optimum results.
Crossover
It is a process of taking 2 individuals and producing a child from them. The reproduction process after selection makes clones of the good stings. The crossover operator is applied over the strings to produce a better offspring.
The implementation of the crossover operator is as follows:
- Two individuals are selected randomly from the population to produce offsprings.
- A cross-site is selected at random along the length of the string.
- The values at the site are swapped.
The crossover performed can be a single-point crossover, two-point crossover, multipoint crossover, etc. The single point crossover has one crossover site while a two-point crossover site has 2 sites where the values are swapped.
We can see this in the example below:
Single Point Crossover
Two-point crossover
Crossover Probability
Pc, crossover probability is the term that describes how often the crossover will be performed. 0% probability means the new chromosomes will be an exact copy of the old chromosomes while 100% probability means that all new chromosomes are made by crossover.
Mutation
Mutation is done after Crossover. While crossover focuses only on the current solution, the mutation operation searches the whole search space. This method is to recover the lost genetic information and to distribute the genetic information.
This operator helps to maintain genetic diversity in the population. It helps to prevent local minima and prevents generating useless solutions from any population.
The mutation is performed in many ways such as inverting the value of each gene with a small probability or perform mutation only if it improves the quality of the solution.
Some of the ways of mutation are:
- Flipping: Changing from 0 to 1 or 1 to 0.
- Interchanging: Two random positions are chosen, and the values are interchanged.
- Reversing: Random position is chosen and the bits next to it are reversed.
Let’s see some examples of each:
Flipping
Interchanging
Reversing
Mutation Probability
Pm, mutation probability is a term that decides how often the chromosomes will be mutated. If mutation probability is 100% then it means that the whole chromosome is changed. If a mutation is not performed, then new offspring are generated directly after crossover.
An Example of a general genetic algorithm Mutation Probability: Pm, mutation probability is a term that decides how often the chromosomes will be mutated. If mutation probability is 100% then it means that the whole chromosome is changed.
If a mutation is not performed, then the new offspring are generated directly after crossover. The initial population of chromosomes is given as A, B, C, D. The population size is 4.
The fitness function is taken as a number of 1’s in the string.
Chromosome | Fitness |
---|---|
A: 00000110 | 2 |
B: 11101110 | 6 |
C: 00100000 | 1 |
D: 00110100 | 3 |
The sum of fitness is 12 which implies, the average fitness function would be ~ 12/4 = 3
Crossover probability= 0.7
Mutation probability= 0.001
#1) If B and C are selected, the crossover is not performed as the fitness value of C is below the average fitness.
#2) B is mutated => B:11101110 -> B’: 01101110 to preserve population diversity
#3) B and D are selected, the crossover is performed.
B: 11101110 E:10110100 –> D:00110100 F:01101110
#4) If E is mutated, then
E: 10110100 -> E’: 10110000
Corresponding Chromosomes are shown in the below table. The order is put randomly.
Chromosome | Fitness |
---|---|
A: 01101110 | 5 |
B: 00100000 | 1 |
C: 10110000 | 3 |
D: 01101110 | 5 |
Though the fittest individual with fitness value of 6 is lost the overall average fitness of the population increases and would be: 14/4 =3.5
When To Stop Genetic Algorithm
A genetic algorithm is stopped when some conditions listed below are met:
#1) Best Individual Convergence: When the minimum fitness level drops below the convergence value, the algorithm is stopped. It leads to faster convergence.
#2) Worst Individual Convergence: When the least fit individuals in the population attain minimum fitness value below the convergence, then the algorithm is stopped. In this method, the minimum fitness standard is maintained in the population. It means that the best individual is not guaranteed but minimum fitness value individuals will be present.
#3) Sum of fitness: In this method, if the sum of fitness is less than or equal to convergence value then the search is stopped. It guarantees that all the population is within the fitness range.
#4 ) Median Fitness: In this method, at least half of the individuals in the population will be better than or equal to convergence value.
Some convergence criterion or stopping condition can be:
- When a specified number of generations have evolved.
- When the specified time to run the algorithm has been met.
- When the fitness value of the population does not change further with iterations.
Advantages & Disadvantages Of Genetic Algorithm
Advantages of a Genetic Algorithm are:
- It has a wider solution space.
- It is easier to discover the global optimum.
- Parallelism: Multiple GAs can run together using the same CPU without interfering with each other. They run parallelly in isolation.
Limitations of GA:
- The fitness function identification is a limitation.
- The convergence of the algorithms can be too fast or too slow.
- There is a limitation of selecting the parameters such as crossover, mutation probability, size of population etc.
Applications Of Genetic Algorithms
GA is effective to solve high dimensional problems. A GA is effectively used when the search space is very large, there are no mathematical problem-solving techniques available and other traditional search algorithms do not work.
Some applications where GA is used:
- Optimization Problem: One of the best examples of the optimization problems is the travel salesman problem which uses GA. Other optimization problems such as job scheduling, sound quality optimization GAs are widely used.
- Immune system model: GAs are used to model various aspects of the immune system for individual gene and multi-gene families during evolutionary time.
- Machine Learning: GAs have been used to solve problem-related to classification, prediction, create rules for learning and classification.
Conclusion
Genetic Algorithms are based on the method of natural evolution. These algorithms are different from the other classification algorithms as they use encoded parameters (0 or 1), there are multiple numbers of individuals in a population and they use the probabilistic method for convergence.
With the process of crossover and mutation, the GAs converge at successive generations. The execution of a GA algorithm does not guarantee a successful solution always. The Genetic Algorithms are highly efficient in optimization – job scheduling problems.
Hope this tutorial would have enriched your knowledge on the concept of Genetic Algorithms!!
=> Visit Here For The Exclusive Machine Learning Series