It is also worth underlining that the approximation theorem emphasizes the layered nature of the network and identifies a need for the use of some hidden layers. Interestingly enough, the concept of layered architectures can be envisioned in some other problems distinct from the approximation problem. For instance, the fundamental representation theorem encountered in two-valued (Boolean) functions states that any Boolean function from {0, 1}^n to {0, 1}^m can be represented as a sum of minterms (products). This implies a neural architecture with a single hidden layer of AND units followed by a layer of OR units.

2.4.2. Generic modes of learning in neural networks

Neural networks are inherently plastic structures that call for substantial learning. In fact, the learning occurs under different learning conditions and can be completed under various types of interaction with the environment. Generally, as usually emphasized in the literature, there are three main types of learning, Fig. 2.10, namely supervised learning, reinforcement learning, and unsupervised learning.
These three modes of learning are listed in increasing order of the challenge they pose to the neural network.
Supervised learning. In supervised learning we are provided with a set of input-output data. The neural network has to reproduce (approximate) these pairs to the highest possible extent. In other words, we require that the condition

NN(x(k)) = target(k)

be satisfied for any k = 1, 2, …, N, where (x(k), target(k)) are the corresponding input-output learning pairs in the training set and NN(·) denotes the mapping realized by the network. Equivalently, which is far more realistic, we request that the sum of distances

Q = Σ_{k=1}^{N} d(NN(x(k)), target(k))

be minimized. The minimization is achieved by changing the structure and/or connections of the network (in fact, most of the learning activities are geared towards parametric learning).

Reinforcement learning. Reinforcement learning shows an interesting behavior and assumes a number of different scenarios. Overall, reinforcement means that the network is provided with a global signal about the network's performance. This signal acts as a penalty (or reward) mechanism: the higher its value, the more erroneous the behavior of the network. The reinforcement can also be a function of a global target. The available reinforcement can be spatial or temporal or both. Spatial reinforcement occurs when we are provided not with the individual target values but with a form of their scalar aggregate r, Fig. 2.10 (ii). Temporal reinforcement occurs when the network is updated on its performance only after a certain period of time. A relevant example is when the time series produced by the network is evaluated over a certain time slice, ΔT, rather than being monitored continuously.

Unsupervised learning. Finally, the mode of unsupervised learning makes no provision for any type of supervision, and the network has to reveal and capture the essential dependencies occurring in the data set on its own. Quite commonly, unsupervised learning is referred to as clustering and covers a significant number of pertinent algorithms, including hierarchical clustering, C-Means, Fuzzy C-Means, etc.

2.4.3.
Performance indexes in training of neural networks

Overall, neural networks can be optimized with the aid of a diversity of performance indexes. The choice of a specific index should be directly linked with the problem at hand. Here we elaborate on the class of distance functions, as they are the most dominant in current applications of neural networks. A general class of distances, known as the Minkowski distance, reads as

d(a, b) = (Σ_{i=1}^{n} |a_i − b_i|^r)^{1/r}

where r > 0 and a and b are two vectors situated in R^n. Depending upon the value of the parameter r standing in the distance function, we distinguish several important cases:

- r = 1: the Hamming (city-block) distance, d(a, b) = Σ_{i=1}^{n} |a_i − b_i|
- r = 2: the Euclidean distance, d(a, b) = (Σ_{i=1}^{n} (a_i − b_i)^2)^{1/2}
- r → ∞: the Tchebyschev distance, d(a, b) = max_{i=1,…,n} |a_i − b_i|
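As a small illustration (a sketch of ours, not part of the original text; the function name `minkowski` is our own), the Minkowski family can be computed in Python for the representative values of r:

```python
import numpy as np

def minkowski(a, b, r):
    """Minkowski distance between vectors a and b for a given r > 0."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    if np.isinf(r):
        # Limiting case r -> infinity: Tchebyschev distance
        return np.max(np.abs(a - b))
    return np.sum(np.abs(a - b) ** r) ** (1.0 / r)

a = [1.0, 4.0, 0.0]
b = [2.0, 1.0, 0.5]

print(minkowski(a, b, 1))       # Hamming (city-block): 1 + 3 + 0.5 = 4.5
print(minkowski(a, b, 2))       # Euclidean: sqrt(1 + 9 + 0.25)
print(minkowski(a, b, np.inf))  # Tchebyschev: max(1, 3, 0.5) = 3.0
```

Note how the same pair of vectors yields different distances as r grows, with progressively more weight placed on the largest coordinate-wise difference.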
Note that this type of distance takes into consideration only the maximal absolute difference between the coordinates of a and b; all the remaining coordinates are completely ignored.

2.5. Selected classes of learning methods

Having discussed the main modes of learning, we now go into more detail by studying a number of selected learning algorithms. The discussion embraces both supervised and unsupervised learning methods. Before proceeding with a detailed discussion of the individual methods, it is instructive to cast a number of these optimization problems in the setting of multidimensional gradient-based optimization (as this is what the parametric learning of neural networks is all about).

2.5.1. Gradient-based optimization of multivariable functions

Let Q: R^n → R be an at least twice differentiable function of its arguments. Confining ourselves to the two lowest derivatives, we can approximate Q around the point w at w + Δw as

Q(w + Δw) ≈ Q(w) + ∇Q(w)^T Δw + (1/2) Δw^T H(w) Δw

where Δw is a vector of increments of w, while the gradient vector and the matrix of second derivatives (Hessian matrix) embrace the following quantities

∇Q(w) = [∂Q/∂w_1, ∂Q/∂w_2, …, ∂Q/∂w_n]^T

and

H(w) = [∂²Q/(∂w_i ∂w_j)], i, j = 1, 2, …, n.

Now, if w + Δw is a stationary point of Q (and its minimum, in particular), then setting the gradient of the above approximation with respect to Δw to zero implies the relationship

∇Q(w) + H(w) Δw = 0, that is, Δw = −H(w)^{-1} ∇Q(w).

In general, any optimization scheme can be portrayed as a search across the search space. Let us describe a movement in this space to be governed by the expression

w(new) = w + Δw.

Inserting the previously computed increment, we derive

w(new) = w − H(w)^{-1} ∇Q(w)

which captures the very nature of the gradient-driven Newton's optimization method (Polyak, 1987).
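To make the update concrete, here is a minimal sketch of ours (assuming NumPy; the function name `newton_step` and the quadratic below are illustrative choices, not from the text) of a single Newton step. For a convex quadratic, whose Hessian is constant, one step lands exactly on the minimum:

```python
import numpy as np

def newton_step(w, grad, hess):
    """One Newton update: w_new = w - H(w)^{-1} * gradient of Q at w."""
    # Solving H * dw = grad is preferable to forming the inverse explicitly.
    return w - np.linalg.solve(hess(w), grad(w))

# Example: Q(w) = 0.5 * w^T A w - b^T w, a convex quadratic whose
# minimum satisfies A w = b; its gradient is A w - b and its Hessian is A.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
grad = lambda w: A @ w - b
hess = lambda w: A

w = np.zeros(2)
w = newton_step(w, grad, hess)
print(w)                      # the minimizer, satisfying A w = b
print(np.allclose(A @ w, b))  # True
```

For a general (non-quadratic) Q, the same step is iterated, and in practice the Hessian is often replaced by an approximation, which is the theme of quasi-Newton methods.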
Copyright © CRC Press LLC