

It is also worth underlining that the approximation theorem emphasizes the layered nature of the network and identifies a need for the utilization of some hidden layers. Interestingly enough, the concept of layered architectures can be envisioned in some other problems distinct from the approximation problem. For instance, the fundamental representation theorem encountered in two-valued (Boolean) functions states that any Boolean function from {0,1}^n to {0,1}^m can be represented as a sum of minterms (products). This implies a neural architecture with a single hidden layer of AND units followed by a layer of OR units.
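The AND-OR architecture implied by the representation theorem can be sketched in code. The following is an illustrative construction (not from the text): the hidden layer holds one AND unit per minterm of the function, and a single OR unit forms the output.

```python
# Sketch of the minterm representation as a two-layer AND-OR network.
# The function and variable names here are illustrative, not from the text.

def minterm_network(truth_table):
    """Build f: {0,1}^n -> {0,1} from its truth table (a dict) as AND then OR."""
    # Hidden layer: one AND unit (minterm) per input pattern mapped to 1.
    minterms = [x for x, y in truth_table.items() if y == 1]

    def f(x):
        # AND units: each fires only on its own input pattern.
        hidden = [all(xi == mi for xi, mi in zip(x, m)) for m in minterms]
        # OR unit: the output fires if any minterm fires.
        return int(any(hidden))

    return f

# Example: XOR of two inputs, which needs the hidden layer.
xor_table = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
xor = minterm_network(xor_table)
assert all(xor(x) == y for x, y in xor_table.items())
```

The AND layer enumerates the input patterns on which the function is 1, so the construction works for any Boolean function, at the cost of up to 2^n hidden units.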

2.4.2. Generic modes of learning in neural networks

Neural networks are inherently plastic structures that call for substantial learning. In fact, learning occurs under different conditions and can be completed through various types of interaction with the environment. Generally, as usually emphasized in the literature, there are three main modes of learning, Fig. 2.10, namely

  supervised learning
  reinforcement learning
  unsupervised learning

These three modes of learning are listed in order of the increasing level of challenge they pose to the neural network.


Figure 2.10  Modes of learning in neural networks: (i) supervised learning, (ii) reinforcement learning, (iii) unsupervised learning

supervised learning In supervised learning we are provided with a set of input-output data. The neural network has to reproduce (approximate) these pairs to the highest possible extent. In other words, we require that the following condition be satisfied

NN(x(k)) = target(k)

for any k = 1, 2, …, N, where (x(k), target(k)) are the corresponding input-output learning pairs in the training set. Equivalently, and far more realistically, we request that the sum of distances

Q = Σ_{k=1}^{N} ||NN(x(k)) − target(k)||

be minimized. The minimization is achieved by changing the structure and/or connections of the network (in fact, most learning activities are geared towards parametric learning).
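A minimal sketch of this minimization, assuming the simplest possible "network" (a single linear neuron y = w·x + b) and a squared-distance performance index; the data, learning rate, and names are illustrative, not from the text.

```python
# Supervised learning as minimization of the sum of squared distances
# between network outputs and targets, via plain gradient descent.
# The model, data, and hyperparameters are illustrative.

def train_linear(xs, targets, lr=0.1, epochs=200):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # Gradient of Q = sum_k (w*x(k) + b - target(k))^2 w.r.t. w and b.
        gw = sum(2 * (w * x + b - t) * x for x, t in zip(xs, targets))
        gb = sum(2 * (w * x + b - t) for x, t in zip(xs, targets))
        # Parametric learning: adjust the connections, not the structure.
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
targets = [1.0, 3.0, 5.0, 7.0]   # generated by target = 2x + 1
w, b = train_linear(xs, targets)
# w converges near 2, b near 1
```

Only the parameters (w, b) change during training, which matches the remark that most learning activities are parametric.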

reinforcement learning Reinforcement learning exhibits an interesting behavior and assumes a number of different scenarios. Overall, reinforcement means that the network is provided with a global signal about its performance. This signal acts as a penalty (or reward) mechanism: the higher its value, the more erroneous the behavior of the network. The reinforcement can also be a function of a global target. The available reinforcement can be spatial, temporal, or both. Spatial reinforcement occurs when we are provided not with the individual target values but with a form of their scalar aggregate r, Fig. 2.10 (ii). Temporal reinforcement occurs when the network is updated on its performance only after a certain period of time. A relevant example is when a time series produced by the network is evaluated over a certain time slice, ΔT, rather than being monitored continuously.

unsupervised learning Finally, the mode of unsupervised learning has no provisions made for any type of supervision, and the network has to reveal and capture essential dependences occurring in the data set on its own. Quite commonly, unsupervised learning is referred to as clustering and covers a significant number of pertinent algorithms, including hierarchical clustering, C-Means, Fuzzy C-Means, etc.
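As an illustration of clustering, here is a minimal sketch of the C-Means algorithm on one-dimensional data; the data, initial centers, and iteration count are illustrative assumptions.

```python
# Minimal C-Means (k-means) sketch: alternate assigning points to the
# nearest center and moving each center to the mean of its cluster.
# Data and parameters are illustrative.

def c_means(data, centers, iters=20):
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for x in data:
            i = min(range(len(centers)), key=lambda j: abs(x - centers[j]))
            clusters[i].append(x)
        # Update step: move each center to the mean of its cluster
        # (keep the old center if a cluster happens to be empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

data = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers = c_means(data, [0.0, 5.0])
# centers settle near 1.0 and 9.0, the two groups hidden in the data
```

No targets are involved anywhere: the structure in the data (two groups) is discovered by the algorithm on its own, which is the essence of the unsupervised mode.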

2.4.3. Performance indexes in training of neural networks

Overall, neural networks can be optimized with the aid of a diversity of performance indexes. The choice of a specific index should be directly linked with the problem at hand. Here we elaborate on the class of distance functions, as they are the most dominant in the current applications of neural networks. A general class of distances known as the Minkowski distance reads as

d(a, b) = ( Σ_{i=1}^{n} |a_i − b_i|^r )^{1/r}

where r > 0 and a and b are two vectors situated in R^n. Depending upon the value of the parameter r in the distance function, we distinguish several important cases:

  if r=2 we obtain the Euclidean distance

d(a, b) = ( Σ_{i=1}^{n} (a_i − b_i)^2 )^{1/2}

This distance is often favored over the other options for two main reasons. Firstly, it carries a meaningful physical interpretation as an energy of the error signal between a and b. Secondly, this distance is differentiable, which helps considerably in any gradient-based optimization.

  if r=1 we end up with the so-called city-block (Hamming) distance

d(a, b) = Σ_{i=1}^{n} |a_i − b_i|

that, as we discuss later on, exhibits an interesting robustness property.

  another option of the Minkowski distance, with r = ∞, yields the Tschebyschev distance

d(a, b) = max_{i=1,…,n} |a_i − b_i|

Note that this type of distance takes into consideration only the maximal absolute difference between the coordinates of a and b. All the remaining coordinates are completely ignored.
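The three cases above can be compared directly in code; this is a small illustrative sketch, with the sample vectors chosen for the example.

```python
# Minkowski distance for finite r, plus its r -> infinity limit
# (the Tschebyschev distance). Sample vectors are illustrative.

def minkowski(a, b, r):
    return sum(abs(ai - bi) ** r for ai, bi in zip(a, b)) ** (1.0 / r)

def tschebyschev(a, b):
    # Only the largest coordinate difference counts; the rest are ignored.
    return max(abs(ai - bi) for ai, bi in zip(a, b))

a, b = (1.0, 4.0), (4.0, 0.0)   # coordinate differences: 3 and 4
print(minkowski(a, b, 2))   # Euclidean:   (9 + 16)^(1/2) = 5.0
print(minkowski(a, b, 1))   # city-block:  3 + 4          = 7.0
print(tschebyschev(a, b))   # Tschebyschev: max(3, 4)     = 4.0
```

Note how the same pair of vectors yields three different distances, which is why the choice of performance index should match the problem at hand.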

2.5. Selected classes of learning methods

Having discussed the main modes of learning, we now go into more detail by studying a number of selected learning algorithms. The discussion embraces both supervised and unsupervised learning methods. Before proceeding with a detailed discussion of the individual methods, it is instructive to cast a number of these optimization problems in the setting of multidimensional gradient-based optimization (as this is what the parametric learning of neural networks is all about).

2.5.1. Gradient-based optimization of multivariable functions

Let Q: R^n → R be an at least twice differentiable function of its arguments. Confining ourselves to the two lowest derivatives, we can approximate Q at the point w + Δw as

Q(w + Δw) ≈ Q(w) + ∇Q(w)^T Δw + (1/2) Δw^T H(w) Δw
where Δw is a vector of increments of w, while the gradient vector and the matrix of second derivatives (Hessian matrix) embrace the following quantities

∇Q(w) = [∂Q/∂w_1  ∂Q/∂w_2  …  ∂Q/∂w_n]^T

and

H(w) = [∂²Q/∂w_i ∂w_j],  i, j = 1, 2, …, n
Now, if w + Δw is to be a stationary point of Q (and its minimum, in particular), then setting the gradient of the above expression with respect to Δw to zero implies the relationship

∇Q(w) + H(w) Δw = 0,  that is  Δw = −H^{-1}(w) ∇Q(w)
In general, any optimization scheme can be portrayed as a search across the search space. Let us describe a movement in this space as governed by the expression

w(new) = w + Δw
Inserting the previously computed increment, we derive

w(new) = w − H^{-1}(w) ∇Q(w)

which captures the very nature of the gradient-driven Newton's optimization method (Polyak, 1987).
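The Newton update can be sketched for a two-variable quadratic, where a single step lands exactly on the minimum (for a quadratic the second-order approximation is exact). The test function and starting point below are illustrative assumptions.

```python
# One Newton step: w_new = w - H^{-1}(w) * grad Q(w), for n = 2,
# using the closed-form inverse of a 2x2 Hessian. Function is illustrative.

def newton_step(w, grad, hess):
    (a, b), (c, d) = hess
    det = a * d - b * c
    g1, g2 = grad
    # delta = -H^{-1} g, with H^{-1} = (1/det) * [[d, -b], [-c, a]]
    dw1 = -(d * g1 - b * g2) / det
    dw2 = -(-c * g1 + a * g2) / det
    return (w[0] + dw1, w[1] + dw2)

# Q(w) = (w1 - 3)^2 + 2*(w2 + 1)^2, with minimum at (3, -1).
def grad_Q(w):
    return (2 * (w[0] - 3), 4 * (w[1] + 1))

hess_Q = ((2, 0), (0, 4))  # constant Hessian of the quadratic

w = (0.0, 0.0)
w = newton_step(w, grad_Q(w), hess_Q)
# one step reaches the minimizer: w == (3.0, -1.0)
```

For non-quadratic Q the step is repeated iteratively, and in practice the Hessian is often approximated or replaced by a scalar learning rate, which recovers plain gradient descent.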



Copyright © CRC Press LLC
