The problem of learning a neural network under this mixture of training data splits naturally into two tasks. Here x1 and x2 are two patterns coming from the training set. To fully explain the learning scheme, it is instructive to portray it using two copies of the same neural network; the training simultaneously modifies the two networks in a synchronous manner, Fig. 2.16.
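As a rough illustration of this scheme, consider the minimal NumPy sketch below. It assumes a single sigmoid layer as the network, the squared distance between the two outputs as the measure compared against the target derived from the similarity degree, and a squared-error loss; all of these choices are our own, since the text's sim expression and performance index are not reproduced here. The essential point is that both branches read and update one shared weight matrix, which is what keeps the two copies synchronized.

```python
import numpy as np

rng = np.random.default_rng(0)

# A single shared weight matrix: conceptually the one set of connections
# behind both copies of the network, so one update modifies both copies.
W = rng.normal(scale=0.5, size=(4, 3))        # 3 inputs -> 4 outputs
lr = 0.1

def forward(x):
    """One copy of the network: a single sigmoid layer (an assumed
    architecture, chosen only to keep the sketch short)."""
    return 1.0 / (1.0 + np.exp(-W @ x))

def train_step(x1, x2, target):
    """Push x1 through one copy and x2 through the other, compare the
    squared distance of the two outputs with the target value, and apply
    the summed gradient from both branches to the shared weights."""
    global W
    y1, y2 = forward(x1), forward(x2)
    d = np.sum((y1 - y2) ** 2)                # distance between the outputs
    err = d - target
    # Backpropagate through both branches; sigmoid'(z) = y * (1 - y).
    g1 = 4.0 * err * (y1 - y2) * y1 * (1.0 - y1)
    g2 = -4.0 * err * (y1 - y2) * y2 * (1.0 - y2)
    W -= lr * (np.outer(g1, x1) + np.outer(g2, x2))
    return err ** 2                           # squared-error loss (d - t)^2

# Example triple: two nearly identical patterns, desired output distance 0.
loss = train_step(np.array([0.1, 0.9, 0.2]), np.array([0.2, 0.8, 0.1]), 0.0)
```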
Consider a single triple of the elements in the training set. The first component in the above formula can be specified once the similarity expression (sim) has been explicitly defined.

2.6. Generalization abilities of neural networks

The (potential) approximation abilities of neural networks are guaranteed by the fundamental approximation theorem. Approximation alone, however, may not be sufficient for the network to perform equally well over the testing set. So far, the approximation of the training data was the only objective being optimized: the error computed for the training data was minimized by modifying the connections of the network. To make the neural network useful, we expect it to perform comparably well on data different from those used for training. To verify this, the already trained network should be evaluated on a separate testing data set. While the error diminishes over the training set, the same may not be true for the testing set. As shown in Fig. 2.17, this is the overtraining effect: training for too long leads to poor generalization. In other words, the network tends to memorize rather than generalize; it follows the training data too closely, even though some elements of the training set could be excessively noisy. To enhance performance, the training should have been terminated at an earlier stage, as suggested by the second curve, while the prediction error on the testing set was still declining. The training error can be made as small as desired by increasing the size of the network; this, however, implies poor generalization abilities.
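The early-termination rule suggested by Fig. 2.17 is easy to make operational: monitor the testing-set error after every training epoch and stop once it has not improved for a while. The sketch below is a generic scaffold rather than the book's procedure; the train_one_epoch and testing_error callbacks and the patience parameter are interfaces introduced here purely for illustration.

```python
def train_with_early_stopping(net, train_data, test_data,
                              train_one_epoch, testing_error,
                              max_epochs=1000, patience=10):
    """Stop training when the testing-set error has not improved for
    `patience` consecutive epochs - i.e., the moment the second curve
    in Fig. 2.17 turns upward. `train_one_epoch` and `testing_error`
    are assumed callbacks supplied by the caller."""
    best_err, best_epoch, stale = float("inf"), 0, 0
    for epoch in range(max_epochs):
        train_one_epoch(net, train_data)      # minimize the training error
        err = testing_error(net, test_data)   # estimate of generalization
        if err < best_err:
            best_err, best_epoch, stale = err, epoch, 0
        else:
            stale += 1
            if stale >= patience:             # error curve has turned up
                break
    return best_epoch, best_err
```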
A more detailed statistical analysis sheds light on the very nature of this problem. Consider that the input x is governed by a certain probability density function (p.d.f.) p(x), so it is meaningful to talk about an expected squared error between the original function f(x) and the neural network NN(x) that has to approximate it. This error is equal to

E[(f(x) - NN(x))^2]

with the expectation E[.] taken with respect to the training data from which the network was built. Let us rewrite this as a sum of two terms (by adding and subtracting E[NN(x)] inside the square and noting that the cross term vanishes), namely

E[(f(x) - NN(x))^2] = (f(x) - E[NN(x)])^2 + E[(NN(x) - E[NN(x)])^2]
The first component is known as the bias while the second one is usually referred to as the variance. The left-hand side of the expression is fixed, so any increase of the bias reduces the variance and vice versa. Here we have to come to grips with the way this balance is achieved.
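The tradeoff can be observed numerically. The experiment below is entirely our own construction (the sin target, the noise level, the sample size, the polynomial degrees, and the number of trials are arbitrary choices): it repeatedly draws noisy training sets, fits a rigid model and a flexible one, and estimates the bias and variance of each at a fixed query point. The rigid model typically shows high bias and low variance; the flexible one, the reverse.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sin                    # the "true" function to be approximated
x0 = 1.3                      # query point at which bias/variance are measured

def fit_predict(degree):
    """Fit a degree-`degree` polynomial to a fresh noisy training set
    and return its prediction at x0."""
    x = rng.uniform(0, np.pi, 20)
    y = f(x) + rng.normal(scale=0.3, size=x.size)
    return np.polyval(np.polyfit(x, y, degree), x0)

for degree in (1, 9):         # a rigid model vs. a flexible one
    preds = np.array([fit_predict(degree) for _ in range(2000)])
    bias2 = (preds.mean() - f(x0)) ** 2   # squared bias at x0
    var = preds.var()                     # variance across training sets
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
```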
This bias-variance guideline is definitely well known in practice; yet, as we usually do not know p(x) (and this is the case in virtually any application), the overall expression serves as an important qualitative hint. The solution to the overfitting problem is to extend the original performance index by adding a so-called regularization term (Poggio and Girosi, 1990). Thus

Q = Q0 + λ||P||

where Q0 is the original (approximation-error) performance index. This additional term ||P|| captures some extra requirements about the smoothness of the function to be approximated and is expressed over the connections of the network. λ stands for a scaling factor that helps achieve a reasonable compromise between the accuracy of the mapping and the produced regularization effect.
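As a concrete, simplified instance, one may take ||P|| to be the squared norm of the connection weights (plain weight decay); this particular choice is our stand-in for the Poggio-Girosi smoothness functional, used here only to show where λ enters the performance index.

```python
import numpy as np

def performance_index(weights, errors, lam):
    """Augmented performance index Q = Q0 + lam * ||P||, with ||P||
    simplified to the squared norm of the connection weights - a
    weight-decay surrogate for the smoothness functional."""
    q_approx = np.sum(np.asarray(errors) ** 2)   # accuracy of the mapping
    q_reg = np.sum(np.asarray(weights) ** 2)     # smoothness pressure
    return q_approx + lam * q_reg

# Larger lam biases training toward smoother (smaller-weight) mappings.
print(performance_index([0.5, -1.2, 0.8], [0.1, -0.05], lam=0.01))
```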