A neural network is useless if it only sees one example of a matching input/output pair. It cannot infer the characteristics of the input data you are looking for from only one example; rather, many examples are required. This is analogous to a child learning the difference between (say) different types of animals - the child will need to see several examples of each to be able to classify an arbitrary animal. If they are to successfully classify birds (as distinct from fish, reptiles and so on), they will need to see examples of sparrows, ducks, pelicans and others, so that they can work out the common characteristics which distinguish a bird from other animals (such as feathers, beaks and so forth). It is also unlikely that a child would remember these differences after seeing them only once - many repetitions may be required until the information `sinks in'.
It is the same with neural networks. The best training procedure is to compile a wide range of examples (for more complex problems, more examples are required) which exhibit all the different characteristics you are interested in. It is important to avoid examples which share a dominant feature that is of no interest to you but happens to be common across your input data anyway. One famous example is the US Army `Artificial Intelligence' tank classifier: it was shown examples of Soviet tanks from many different distances and angles on a bright sunny day, and examples of US tanks on a cloudy day. Needless to say, it was great at classifying weather, but not so good at picking out enemy tanks.
If possible, prior to training, add some noise or other randomness to your examples (such as a random scaling factor). This helps to account for noise and natural variability in real data, and tends to produce a more reliable network.
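As an illustration, here is a minimal sketch of such augmentation in Python with NumPy. The scaling range and noise level are arbitrary assumptions for demonstration, not prescriptions:

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(example, scale_range=(0.9, 1.1), noise_std=0.01):
    """Return a noisy copy of a training example.

    A random scaling factor plus a little additive Gaussian noise
    mimics the natural variability of real data. The ranges here
    are illustrative; tune them to your problem.
    """
    scale = rng.uniform(*scale_range)                 # random scaling factor
    noise = rng.normal(0.0, noise_std, example.shape) # small additive noise
    return example * scale + noise

# Example: perturb a batch of input vectors before each training pass.
inputs = np.array([[0.2, 0.7, 0.5],
                   [0.9, 0.1, 0.4]])
noisy_inputs = np.array([augment(x) for x in inputs])
```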
If you are using a standard unscaled sigmoid node transfer function, please note that the desired output must never be set to exactly 0 or 1! The reason is simple: whatever the inputs, the outputs of the nodes in the hidden layer are restricted to between 0 and 1 (these values are the asymptotes of the function). Approaching these values would require enormous weights and/or input values, and, most importantly, they can never actually be reached or exceeded. By contrast, setting a desired output of (say) 0.9 allows the network to approach and ultimately reach this value from either side, or indeed to overshoot. This allows the network to converge relatively quickly. It is unlikely ever to converge if the desired outputs are set too high or too low.
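A short sketch of the idea follows. The 0.1/0.9 margins are a common convention assumed here for illustration, not a requirement:

```python
import numpy as np

def sigmoid(x):
    """Standard unscaled sigmoid: output is strictly between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def scale_targets(targets, low=0.1, high=0.9):
    """Map binary targets {0, 1} into [low, high] so the network can
    actually reach (and even overshoot) the desired outputs."""
    return targets * (high - low) + low

targets = np.array([0.0, 1.0, 1.0, 0.0])
print(scale_targets(targets))   # [0.1 0.9 0.9 0.1]

# Even a large input only approaches the asymptote:
print(sigmoid(10.0))            # ~0.99995: close to, but never exactly, 1
```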
Once again, it cannot be overemphasised: a neural network is only as good as the training data! Poor training data inevitably leads to an unreliable and unpredictable network.
Having selected an example, we then present it to the network and generate an output.
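As a concrete sketch, presenting an example to a small network with one hidden layer might look like this. The layer sizes, weight initialisation and input values are placeholders assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy network: 3 inputs -> 4 hidden nodes -> 1 output node.
w_hidden = rng.normal(0.0, 0.5, (3, 4))   # input-to-hidden weights
w_output = rng.normal(0.0, 0.5, (4, 1))   # hidden-to-output weights

def forward(example):
    """Present one example to the network and generate an output."""
    hidden = sigmoid(example @ w_hidden)   # hidden-layer activations in (0, 1)
    return sigmoid(hidden @ w_output)      # network output in (0, 1)

example = np.array([0.2, 0.7, 0.5])
output = forward(example)
print(output)   # the network's current (untrained) response
```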