TIPS: General theory on AI and pattern recognition

The Financial Ad
Trader
Get FAT!


1. Artificial Neural Networks.

Artificial neural networks (ANN's) allow the user to build predictive models of systems based upon historical examples, assuming that the system is not largely random. The key distinction between conventional modeling and the use of ANN's is that the model builds itself to a great degree: the system adapts its responses to example data, which results in a model that fits the data set that it was exposed to, or 'trained' on. A trained ANN may, if it has been developed with a data set comprising most of the examples of the system's states, 'generalize' to predict outcomes for examples that it has never seen.

This ability to generalize is perhaps no more mystical than the ability of a function, F(x), constructed of N pairs of data, to predict well given a value (x) that was not used in the construction of the function: the only difference is the degree of complexity of the function approximation.
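
As a minimal sketch of this idea, the Python below builds a function from N = 5 pairs of data with an ordinary least-squares line fit and then asks it for a value of x that was not among the pairs. The data points and the names fit_line and predict are made up for illustration; the straight-line form is an assumption chosen for simplicity.

```python
def fit_line(pairs):
    """Least-squares fit of y = slope * x + intercept to (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate the fitted line at x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: N = 5 pairs scattered around a straight line.
pairs = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8), (4.0, 9.0)]
model = fit_line(pairs)

# x = 2.5 was not used to construct the function, yet it can be predicted.
print(predict(model, 2.5))
```

An ANN does the same kind of thing, only with a far more flexible function and many more inputs.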

A linear or nonlinear 'regression' (error minimization) routine can often be used to arrive at an approximation for a set of data points in two dimensions (x,y), but how does one do a 30 dimensional (a1, a2, a3, ..., a30) regression to a function where each point in space is defined by 30 input values? With 90? With 1000? Even stable, noise-free systems can have a large number of conditions that define them.

ANN's can 'regress' to, or arrive at, a solution to multivariable functions. The theory of the final state is not too difficult to contemplate: an ANN in its final trained state is the electronic equivalent of an n-dimensional function.

What does an ANN look like? Before it is trained it consists of a set of software-simulated electronic nodes, or neurons, that are interconnected with one another by simulated pathways. The nodes can take on a range of values represented by a mathematically continuous and smooth function, typically a sigmoidal function. The value of this function is determined at any point in time by the summation of the values coming into the node via the simulated pathways. The pathways can have different 'strengths' that determine their impact on the nodes that they feed into. Typically there are discrete 'layers' of nodes, with the first layer being the inputs, or values of the system that you are trying to model. The second layer is less well defined and more fluid, and is often called the 'hidden' layer. The final layer is the output layer, and is the system output, or the variable one is trying to predict. So, roughly speaking, there are three layers and two sets of connections: layer one-two connections and layer two-three connections. Connections may also be added or removed, so that a node may have many or few connections.
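
The three-layer arrangement just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the layer sizes (3 inputs, 4 hidden nodes, 1 output), the input values, and the names sigmoid, random_layer and forward are all made up here, and bias terms are left out to keep the sketch short.

```python
import math
import random


def sigmoid(total):
    """Smooth, continuous node function that squashes any sum into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-total))


def random_layer(n_in, n_out, rng):
    """Random connection strengths: weights[j][i] links input i to node j."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]


def forward(inputs, w_hidden, w_output):
    """Each node sums its weighted incoming values and applies the sigmoid."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, inputs))) for row in w_hidden]
    output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w_output]
    return hidden, output


rng = random.Random(0)              # fixed seed so the run is repeatable
w_hidden = random_layer(3, 4, rng)  # layer one-two connections
w_output = random_layer(4, 1, rng)  # layer two-three connections

hidden, output = forward([0.2, 0.7, 0.1], w_hidden, w_output)
print(output)  # untrained, so this value carries no meaning yet
```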

What does it look like after it is trained? The same as the above, except that the system is no longer random. You see, before the ANN is exposed to or trained on any data, all the input nodes are set at random values. All of the connection strengths are also randomized. This of course means that the initial state of the output layer must be random as well. The output is determined by the current input values in the nodes of layer one and the configuration of the connections between the layers. After it has been trained, or 'de-randomized', the ANN has adjusted to the examples given to it in the form of layer one inputs. The final state of the system is really just a function with many input variables that approximates the data set in the same way a linear regression approximates the data set it has been fit to.

Finally, one can mention that getting from randomness to the final system state is done with various mathematical error minimization routines, the most common of which is called 'back propagation'. Essentially what happens is that each example of the system one is trying to model [(N inputs) times (M examples)] is fed to the ANN. The output is examined and compared to the known output one is trying to predict. The difference between the system output (the final layer) and the known output constitutes the error one is trying to minimize. This is done by adjusting the strength of the connections between the layers. The process is repeated with all the examples many times to arrive at the best set of connection strengths for ALL the data, not just a single example. This means that some examples will be modeled by the finished ANN better than others. The result is referred to as the 'best fit' for the entire data set.
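
The training loop described above can be sketched as follows. This is a bare-bones illustration, not a production routine: the task (the logical OR of two inputs), the third input fixed at 1.0 to act as a bias, the learning rate, the number of passes, and all function names are assumptions made for the example.

```python
import math
import random


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def forward(x, w_h, w_o):
    """Return the hidden node values and the single output value."""
    hidden = [sigmoid(sum(w * v for w, v in zip(row, x))) for row in w_h]
    out = sigmoid(sum(w * h for w, h in zip(w_o, hidden)))
    return hidden, out


def total_error(examples, w_h, w_o):
    """Sum of squared differences between known and predicted outputs."""
    return sum((target - forward(x, w_h, w_o)[1]) ** 2 for x, target in examples)


def train(examples, w_h, w_o, rate=0.5, passes=2000):
    """Repeatedly present every example and nudge the connection strengths."""
    for _ in range(passes):
        for x, target in examples:
            hidden, out = forward(x, w_h, w_o)
            # Error signal at the output node (sigmoid slope is out * (1 - out)).
            delta_o = (target - out) * out * (1.0 - out)
            # Pass the error signal back to each hidden node before updating w_o.
            delta_h = [delta_o * w_o[j] * hidden[j] * (1.0 - hidden[j])
                       for j in range(len(hidden))]
            for j in range(len(hidden)):
                w_o[j] += rate * delta_o * hidden[j]
                for i in range(len(x)):
                    w_h[j][i] += rate * delta_h[j] * x[i]


rng = random.Random(1)
# M = 4 examples of N = 3 inputs each; the last input is a constant bias.
examples = [([0.0, 0.0, 1.0], 0.0), ([0.0, 1.0, 1.0], 1.0),
            ([1.0, 0.0, 1.0], 1.0), ([1.0, 1.0, 1.0], 1.0)]
w_h = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(3)]
w_o = [rng.uniform(-1.0, 1.0) for _ in range(3)]

error_before = total_error(examples, w_h, w_o)
train(examples, w_h, w_o)
error_after = total_error(examples, w_h, w_o)
print(error_before, error_after)
```

Comparing the two printed errors shows how far the connection strengths have moved from their random starting point toward a fit for all four examples at once.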

Now, if one has two variables, then the best fit is a straight line or perhaps a curve; if one has three, then it is a line, a curve, or a surface. If one has four variables then it is a four dimensional structure, or perhaps one can say a series of three dimensional lines or surfaces mathematically related to one another by the fourth variable. In any case, most people can't visualize much that is meaningful beyond three dimensions.

The point is that it doesn't take very many inputs or variables to expand the potential function space (or solution space, or n-dimensional space) to the point where finding the right set of relations between the inputs (constructing the function) is like finding a needle in a haystack. ANN's can do a better job than most people and do it quickly, but they can get stuck in 'local minima', or false solutions. That's where GENETIC ALGORITHMS come in.
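
To make the idea of a 'local minimum' concrete, here is a small one-variable sketch in Python. The error curve f is an arbitrary example chosen because it has two valleys of different depth; it does not come from any real ANN, and the step size and step count are likewise assumptions. A simple downhill search ends up in whichever valley is nearest to its starting point.

```python
def f(x):
    """Made-up error curve with a shallow valley and a deeper one."""
    return x ** 4 - 3 * x ** 2 + x


def slope(x):
    """Derivative of f, used to decide which way is downhill."""
    return 4 * x ** 3 - 6 * x + 1


def descend(x, rate=0.01, steps=2000):
    """Keep stepping downhill from the starting point x."""
    for _ in range(steps):
        x -= rate * slope(x)
    return x


right = descend(2.0)   # starts on the right-hand slope
left = descend(-2.0)   # starts on the left-hand slope

print(right, f(right))  # settles in the shallow valley (a false solution)
print(left, f(left))    # settles in the deeper valley (the better fit)
```

Nothing in the downhill rule tells the search started at 2.0 that a deeper valley exists on the other side of the hill.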

2. Genetic Algorithms.

Most people have heard the terms mutation, natural selection, and evolution. These are three terms from the field of biology. They make it possible to explain the changes that have apparently taken place over very large time frames with regard to organisms on earth. In its simplest and strictest definition, evolution is the result of a series of random mutations to heritable genetic material (DNA) that are either passed on or not due to the advantages or disadvantages that they confer on the organism that has been born with the mutation. Mutations occur in all of us all the time. Some mutations are passed on, depending on when and where they occur. If they are passed on, they are often lethal and the offspring do not survive to reproduce. If they are not lethal then the mutations may hamper the function of the organism, reducing the probability of survival and hence of reproduction and passing on of the mutation. Alternatively the reverse may be true. Mutations that increase the probability of survival stay around, and those that do not are lost with the death of the organism if it fails to reproduce. Fitness is the key.

Now, if the above ideas are used to modify ANN's one has the concept of genetic algorithms (GA's). In this context, GA's are just mathematical manipulations of ANN's by random, small changes that are then tested for their effect on the ANN's function. If a change improves function it is kept; if not, it does not last. The GA's allow ANN's to change in a manner not possible using the error minimization routine mentioned above. In practice this allows one to skip over or around the false solutions that ANN's can get stuck in, and so get to the best fit for the data, rather than an inferior fit.
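
The mutate-test-keep cycle can be sketched in Python as below. This is a simplified illustration under several assumptions: the network is a tiny fixed one (2 inputs, 2 hidden nodes, 1 output, with bias weights), the task is the exclusive-or of two inputs, and the population size, mutation scale and generation count are arbitrary. It uses mutation and selection only; fuller GA's usually also recombine pairs of candidates, which is omitted here.

```python
import math
import random


def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))


def network_error(weights, examples):
    """Sum of squared errors of a 2-2-1 network described by 9 weights."""
    error = 0.0
    for (a, b), target in examples:
        h1 = sigmoid(weights[0] * a + weights[1] * b + weights[2])
        h2 = sigmoid(weights[3] * a + weights[4] * b + weights[5])
        out = sigmoid(weights[6] * h1 + weights[7] * h2 + weights[8])
        error += (target - out) ** 2
    return error


def mutate(weights, rng, scale=0.3):
    """Return a copy of the weights with a small random change to each one."""
    return [w + rng.gauss(0.0, scale) for w in weights]


def evolve(examples, rng, population_size=20, generations=200):
    """Keep the fitter half each generation and refill with mutated copies."""
    population = [[rng.uniform(-1.0, 1.0) for _ in range(9)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda w: network_error(w, examples))
        history.append(network_error(population[0], examples))
        survivors = population[: population_size // 2]
        children = [mutate(rng.choice(survivors), rng)
                    for _ in range(population_size - len(survivors))]
        population = survivors + children
    best = min(population, key=lambda w: network_error(w, examples))
    history.append(network_error(best, examples))
    return best, history


xor_examples = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
                ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
best, history = evolve(xor_examples, random.Random(2))
print(history[0], history[-1])  # best error at the start and at the end
```

Because the fitter half always survives unchanged, the best error can never get worse from one generation to the next; mutations that help are kept and those that hurt simply drop out of the population.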