The Evolution of Programming: Darwin, Genetics, and Mutant Programs


Survival of the fittest: That is the axiom upon which Charles Darwin based his theory of evolution. In this article, I will explain how genetic algorithms use the same principle to allow computer programs to quickly solve problems that have many potential solutions. Even with today’s computing power, many problems still have too many possible solutions to solve them by using the brute-force technique of testing every one, but genetic algorithms allow programs to evolve gradually toward the best solution without having to do that. Because not every solution is tried, the solution returned by a genetic algorithm is not always the optimal solution, but, for some types of problems, a genetic algorithm will return a very good solution in a reasonable amount of time.

Let me put into perspective just how important genetic algorithms are. Artificial intelligence (AI) is a computing field that has been widely studied but rarely exploited; state-of-the-art AI systems that take advantage of today’s computing power are just beginning to take hold. One AI system, IBM’s Deep Blue, recently received a lot of attention by performing the seemingly impossible computing task of beating world chess champion Garry Kasparov. Deep Blue, which runs on the AS/400’s sister system (the RS/6000), is capable of selectively analyzing some 200 million chess positions per second, but future systems, many of which will use genetic algorithms, will certainly eclipse its accomplishments and capabilities and will be used for everything from medical research and resource management to analyzing financial models and picking stocks.

A Little Background

For many years, computer scientists have been developing techniques that allow computer programs to exhibit intelligent behavior and even to emulate living creatures. My own interest in genetic algorithms began after reading an article by John Holland in the July 1992 issue of Scientific American entitled “Genetic Algorithms.” Holland, a mathematician and psychologist, was one of the early pioneers in AI. In the 1950s, he began researching some of the concepts that have become the building blocks of evolutionary computing. He started his research with neural networks and, after reading The Genetical Theory of Natural Selection by evolutionary biologist R. A. Fisher, became interested in the application of evolution to computing. Holland observed that biological systems are able to solve very complex problems, such as adapting to different environments, by repeatedly combining, mutating, and selecting the fittest individuals from each resulting population. He found that this process is an efficient method of adapting individuals to solve the problems presented to them.

Searching Through Space

Genetic algorithms are best suited to a class of problems known as nondeterministic-polynomial-time-complete, or NP-complete, problems. For an NP-complete problem, any candidate solution can be verified in polynomial time, but no known algorithm finds the optimal solution without, in the worst case, examining an exponentially large number of candidates (i.e., by some form of trial and error). This means that the amount of time it takes to locate the optimal solution grows explosively with the size of the problem, even though checking any one potential solution is quick. Another feature of NP-complete problems is that any one of them can be translated, in polynomial time, into any other, so an algorithm that solves one NP-complete problem can, in principle, be used to solve them all.
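This asymmetry—verification is cheap even when search is not—is easy to see in code. The sketch below uses a hypothetical route representation for the traveling salesman problem discussed later: checking whether a proposed tour is valid and no longer than a given bound takes time linear in the number of cities, even though finding the best tour may require searching an exponential space.

```python
import math

def tour_length(cities, route):
    """Total round-trip distance for a route (a permutation of city indices)."""
    total = 0.0
    for i in range(len(route)):
        x1, y1 = cities[route[i]]
        x2, y2 = cities[route[(i + 1) % len(route)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total

def verify(cities, route, bound):
    """Polynomial-time check: is this a valid tour no longer than bound?"""
    is_permutation = sorted(route) == list(range(len(cities)))
    return is_permutation and tour_length(cities, route) <= bound

cities = [(0, 0), (0, 1), (1, 1), (1, 0)]  # four cities on a unit square
print(verify(cities, [0, 1, 2, 3], 4.0))   # perimeter tour, length 4.0: True
```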

A problem’s potential solutions make up its search space. The best solution to the problem lies at the extreme minimum or extreme maximum point in that space. In addition, for a problem to be solved using a genetic algorithm, the search space cannot be totally random: similar solutions must be grouped near one another, so that small changes to a good solution tend to yield other good solutions. When local extremes exist in the search space, it is not possible to evolve directly to the best solution; some random solutions have to be tested to escape them.

Here are a few of the classic problems effectively solved using genetic algorithms:

• The traveling salesman problem, in which the shortest route between cities is determined

• Packaging problems, in which the optimal way to load items into a container is determined

• Scheduling problems, in which the most effective use of multiple resources is determined

The reason these problems are solvable using genetic algorithms is that they are optimization problems with solutions that can be arrived at by guessing. The fitness of any solution can also be easily determined and compared with the fitness of other potential solutions.

When optimizing these problems, the “optimal” solution is not required; a “very good” solution will do. You can find many examples of genetic algorithms that solve the problems listed above by searching the Internet. (The Web pages listed in the References and Related Materials section of this article contain some examples.)

It’s in the Genes

A genetic algorithm solves problems by simulating nature; it encodes potential solutions as chromosomes. A chromosome is made up of genes (segments of DNA) that control the various traits of an organism; a gene’s location within a chromosome is its locus. A gene or set of genes determines an individual trait, and the trait’s current setting—determined by the value of that trait’s gene or genes—is its allele.

The set of potential solutions being evaluated for a problem is that problem’s population. To determine the relative fitness of a particular member of the population, a fitness test is applied. Recombination is the process of combining members of the population to create a new population. Elitism moves several of the fittest solutions directly into the new population without recombination. During the creation of the new population, random mutations ensure that the population explores the entire search space and does not become stuck at a local extreme.
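To make these terms concrete, here is a minimal sketch mapping them onto code. The chromosome length, population size, and fitness test (counting true alleles, the classic “one-max” toy problem) are my own invented examples, not part of any particular application.

```python
import random

random.seed(1)
CHROMOSOME_LENGTH = 8          # number of genes (loci 0 through 7)

def random_chromosome():
    # Each gene holds a binary allele: True or False.
    return [random.random() < 0.5 for _ in range(CHROMOSOME_LENGTH)]

def fitness(chromosome):
    # Toy fitness test: the number of True alleles.
    return sum(chromosome)

population = [random_chromosome() for _ in range(6)]
ranked = sorted(population, key=fitness, reverse=True)
elite = ranked[:2]             # elitism: the fittest members pass through unchanged
```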

Genetic Problem Solving

Before a problem can be solved using a genetic algorithm, an encoding scheme representing potential solutions to the problem must be developed. One of the most popular encoding schemes uses binary values to represent traits. With binary encoding, traits are either true or false. Binary encoding is very efficient, but some problems do not lend themselves to this type of encoding. Other popular encoding schemes are permutation encoding and value encoding. Permutation encoding uses a sequential value for each trait and is useful for sequencing problems (such as the traveling salesman problem); value encoding uses the trait’s actual value.

During recombination, traits from each parent form a new solution. One of the most common ways of selecting traits to take from each parent is to choose a random point on the chromosomes of the parents. Traits of one parent are used up to the crossover point, and traits from the other parent are used after the crossover point. Another method is to use multiple crossover points.
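Single-point crossover, as described above, can be sketched in a few lines; the function name and the all-ones/all-zeros parents are illustrative only.

```python
import random

def single_point_crossover(parent_a, parent_b, point=None):
    """Take parent_a's genes up to the crossover point, parent_b's after it."""
    if point is None:
        point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:]

child = single_point_crossover([1, 1, 1, 1], [0, 0, 0, 0], point=2)
print(child)  # [1, 1, 0, 0]
```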

After determining how to encode and recombine solutions, the first step in solving a problem using a genetic algorithm is to generate an initial random population of potential solutions. Next, you evaluate the relative fitness of the solutions and recombine the fittest individuals to create a new population. During recombination, a few random mutations are applied, and new generations are generated and evaluated until the desired level of fitness is achieved.
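The whole cycle—random initial population, fitness evaluation, selection, crossover, mutation, repeat—fits in a short sketch. The problem here (maximizing the number of 1s in a bit string) and all the parameters are toy stand-ins for whatever problem you are actually encoding.

```python
import random

random.seed(42)
GENES, POP, MUTATION_RATE = 20, 30, 0.02

def fitness(c):
    return sum(c)            # toy fitness test: count the 1 bits

# Step 1: generate an initial random population of potential solutions.
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for generation in range(100):
    # Step 2: evaluate relative fitness.
    population.sort(key=fitness, reverse=True)
    if fitness(population[0]) == GENES:
        break                                   # desired fitness achieved
    # Step 3: recombine the fittest individuals into a new population.
    parents = population[: POP // 2]            # fittest half survives unchanged
    children = list(parents)
    while len(children) < POP:
        a, b = random.sample(parents, 2)
        point = random.randint(1, GENES - 1)    # single-point crossover
        child = a[:point] + b[point:]
        # A few random mutations keep the search from stalling.
        child = [g if random.random() > MUTATION_RATE else 1 - g for g in child]
        children.append(child)
    population = children

best = max(population, key=fitness)
```

Because the fittest half is carried over intact, the best fitness never decreases from one generation to the next.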

Do You Know the Way to San Jose?

Genetic algorithms work particularly well on sequencing problems. The traveling salesman problem is a sequencing problem that involves identifying the shortest route or one of the shortest routes between a number of cities. For this example, permutation encoding assigns each city on a 16-city route a number between 1 and 16. With 16 cities, there are over 20 trillion possible routes. A crossover point is selected, and the route of one parent is used up to this point. After the crossover point, cities are selected from the second parent if they are not yet used.

To solve the problem, an initial population of 20 potential routes is generated. The second step is to apply the fitness test. (In this case, the length of the route determines fitness.) To evolve toward the best solution, several members of the population containing the best traits (i.e., the shortest overall route) are recombined to form a new population of 18 routes. After crossover, a few members of the resulting population exchange cities to mutate the result. Two elite members of the prior population are moved directly to the new population, which is used to repeat this process either until a route that meets the desired criteria is identified or until progress stalls.
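The steps above can be sketched end to end. The city coordinates below are invented, and the mutation rate and generation count are my own guesses; everything else—permutation encoding, a population of 20, crossover that copies one parent up to a point and fills in the other parent’s unused cities, swap mutations, and two elite routes carried over unchanged—follows the scheme just described.

```python
import math
import random

random.seed(7)
cities = [(random.random(), random.random()) for _ in range(16)]  # invented coordinates

def route_length(route):
    """Fitness test: total length of the round-trip route (shorter is fitter)."""
    return sum(math.dist(cities[route[i]], cities[route[(i + 1) % 16]])
               for i in range(16))

def crossover(a, b):
    """One parent's route up to a point, then b's not-yet-used cities in order."""
    point = random.randint(1, 15)
    head = a[:point]
    return head + [c for c in b if c not in head]

def mutate(route):
    """Exchange two cities to mutate the result."""
    i, j = random.sample(range(16), 2)
    route[i], route[j] = route[j], route[i]

# Step 1: an initial population of 20 random routes.
population = [random.sample(range(16), 16) for _ in range(20)]

for generation in range(300):
    population.sort(key=route_length)
    elite = [population[0][:], population[1][:]]   # two elite routes pass through
    parents = population[:10]                      # the fittest half breeds
    children = []
    while len(children) < 18:                      # new population of 18 routes
        a, b = random.sample(parents, 2)
        child = crossover(a, b)
        if random.random() < 0.3:
            mutate(child)
        children.append(child)
    population = elite + children

best = min(population, key=route_length)
```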

Mutant Musings

Genetic algorithms provide an adaptive means for solving problems by modeling natural, evolutionary systems. Computers do not have the ability to reason or apply abstract concepts when solving problems. However, by applying the theory of evolution and genetics to computing, John Holland was able to identify a new method for computers to solve seemingly intractable problems—a method similar to how Darwin theorized species evolved.

The financial industry uses genetic algorithms to pick stocks and identify potential investments, shipping companies use genetic algorithms to pack and route shipments, and universities use genetic algorithms to schedule classes. It is likely that you also have a similar optimization problem in your business. These are all problems that would be difficult to solve by conventional means, but genetic algorithms provide the tools that you need to solve them.

References and Related Materials

• Applets for Neural Networks and Artificial Life: www.aist.go.jp/NIBH/~b0616/Lab/Links.html
• “Genetic Algorithms,” John H. Holland, Scientific American, July 1992
• Introduction to Genetic Algorithms: cs.felk.cvut.cz/~xobitko/ga

