2.7. Enhancements of gradient-based learning in neural networks

It has become obvious that the gradient-based method exploits only first-order information about the multidimensional surface of the performance index Q(w); the use of second-order information (conveyed by the Hessian matrix) is essentially computationally infeasible. This limited search guidance, most profoundly manifested in multilayer architectures of neural networks, usually results in very slow convergence. To alleviate this drawback, several enhancements of the generic learning algorithms have been proposed; indeed, one can be overwhelmed by the abundance of variations on the vanilla form of BP. All of them attempt to get some extra mileage and repair what dropping the Hessian matrix did to the efficiency of the optimization method. The reader should not be too optimistic, though: all of these attempts are rather local in nature and may not work well across a broad range of learning scenarios. In what follows, we briefly discuss some of these enhancements.

Modifiable learning rate

Too low a value of the learning rate slows down learning; too high a rate produces oscillatory behavior. This simple observation leads to a number of useful heuristics. Starting from an initial learning rate α, its value is increased linearly (κ > 1) if the learning is smooth (a steady decrease of Q with no oscillations), or decreased exponentially (ρ < 0) if oscillations of the performance index are present.

Momentum term

Momentum is another augmentation of the standard delta rule; it adds an extra term to the original update,

Δconn(t) = −α ∇Q(t) + β Δconn(t − 1)

where Δconn describes a change in the connections and β > 0 is the momentum coefficient. In regions of high fluctuation, the momentum term prevents the weights from oscillating; in flat regions of Q, the effective learning rate increases.
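The effect of the momentum term on a flat region can be simulated directly. The following is a minimal sketch (not from the text; the function name and constants are illustrative), assuming the standard update Δ(t) = −α·grad + β·Δ(t − 1) with a constant gradient:

```python
# Minimal sketch (illustrative, not from the text): simulate the
# momentum update Delta(t) = -alpha*grad + beta*Delta(t-1) for a
# constant gradient, as on a flat region of Q.  The accumulated step
# approaches -alpha/(1 - beta) * grad, i.e. the effective learning
# rate grows from alpha toward alpha/(1 - beta).

def momentum_updates(grad, alpha, beta, steps):
    """Sequence of weight changes produced for a constant gradient."""
    delta, deltas = 0.0, []
    for _ in range(steps):
        delta = -alpha * grad + beta * delta
        deltas.append(delta)
    return deltas

alpha, beta, grad = 0.1, 0.9, 1.0
deltas = momentum_updates(grad, alpha, beta, steps=200)
effective_rate = -deltas[-1] / grad
print(effective_rate)  # close to alpha / (1 - beta) = 1.0
```

The first step moves with rate α only; repeated steps in the same direction compound until the effective rate saturates at α/(1 − β), here a tenfold increase.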
To observe that, let us rewrite the momentum formula over a p-step iteration (Hassoun, 1995):

Δconn(t) = −α Σ_{k=0..p} β^k ∇Q(t − k) + β^{p+1} Δconn(t − p − 1)

For flat regions we can assume that the gradient does not change too much, so it can be placed in front of the summation:

Δconn(t) ≈ −α [Σ_{k=0..p} β^k] ∇Q = −α (1 − β^{p+1})/(1 − β) ∇Q

Thus the effective learning rate has increased and, for large p, is equal to α/(1 − β).

2.8. Concluding remarks

This chapter serves as a condensed, yet fairly comprehensive, introduction to neural networks and neurocomputing. We have emphasized the role of neural networks as universal approximators. In fact, all applications dwell to some extent on this important finding. Neural networks are versatile computational structures endowed with the parametric flexibility conveyed by their connections. These variable weights strongly support learning, yet they can also hamper its efficiency because of the excessively large search space within which the optimization of neural networks needs to be completed. We have discussed a number of the main topologies and learning methods. What is also evident is the fact that learning in neural networks is geared primarily toward their parametric optimization; any structural changes call for a different methodology. Do neural networks constitute a new concept? The answer is partially affirmative, and this comes at the level of the idea of distributed processing. On the other hand, one may identify several examples showing that neural networks borrow a number of concepts from other areas. An interesting example of such associations has emerged from statistics; comparing the basic nomenclatures of neural networks and statistics, one can identify a series of useful similarities:

- inputs ↔ independent (explanatory) variables
- outputs ↔ predicted (dependent) variables
- connections (weights) ↔ parameter estimates
- learning (training) ↔ estimation (model fitting)
- generalization ↔ interpolation and extrapolation
While the approximation capabilities of neural networks promise a lot, the role of neural network engineering is to put them to work. How this potential can be exploited becomes a matter of choosing the right architecture, preprocessing the data, and carrying out efficient learning. As we discuss in the remainder of this book, neural networks operating alone cannot fully satisfy these design goals and need a symbiotic interaction with some other technologies, especially fuzzy sets and evolutionary computing.

2.9. Problems

2.1. Consider the modified performance index

Q'(w) = Q(w) + μ ‖conn‖²

where the vector conn symbolizes all the connections of the network and μ > 0. Analyze the role of the second component as supporting the regularization effect.

2.2. Discuss the role of the nonlinear functions in the basic neurons. Are these functions essential if the neurons are situated in the hidden layer(s)? What about the neurons forming the output layer?

2.3. Elaborate on how the network in Fig. 2.18, where n >> m, can serve as a data compressor. What should be the target vector used in the training of this network?
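The regularization effect asked about in Problem 2.1 can be seen in a minimal sketch (illustrative, not from the text; the function name is hypothetical): the gradient of the penalty μ‖conn‖² is 2μ·conn, so each gradient step additionally shrinks the weights toward zero ("weight decay").

```python
# Hypothetical sketch for Problem 2.1: gradient descent on the
# regularized index Q'(w) = Q(w) + mu*||w||^2.  The penalty's gradient
# is 2*mu*w, so every step decays the weights in addition to
# following the gradient of Q itself.

import numpy as np

def regularized_step(w, grad_Q, alpha, mu):
    """One gradient step on Q'(w) = Q(w) + mu * ||w||^2."""
    return w - alpha * (grad_Q + 2.0 * mu * w)

w = np.array([2.0, -3.0])
# With grad_Q = 0 (a flat spot of Q), only the penalty acts:
w_next = regularized_step(w, grad_Q=np.zeros(2), alpha=0.1, mu=0.5)
print(w_next)  # each weight shrunk by the factor (1 - 2*alpha*mu) = 0.9
```

This shrinkage is what keeps the connections small and counteracts overfitting, which is the intended regularization effect.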
2.4. The single-variable function given by its input-output pairs shown in Fig. 2.19 is to be approximated by a neural network. Without carrying out detailed learning, discuss whether this could be a difficult task. Why? Which segment of the data would be the most difficult to represent (approximate)?
2.5. The standard sigmoidal nonlinearity can be equipped with two auxiliary parameters, say

y = 1/(1 + exp(−α(u − β)))

How do they impact the nonlinear characteristics? Elaborate on the role of α and β in the efficiency of any learning procedure. Would you recommend changing them over the course of training?

2.6. Considering the RBF neural network, discuss the learning of its output layer composed of m linear units y_i = w_i^T z. How can you introduce the effect of regularization into the training mechanism?

2.7. Generalize the delta rule for a neural network with a single layer having n inputs and m outputs. Derive detailed learning formulas for the tanh type of nonlinearities.

2.8. Using gradient-based learning, derive the perceptron learning rule. To accomplish that, it is convenient to consider the performance index of the form

Q(w) = −Σ_{x ∈ M} d(x) wᵀx

where M denotes the set of patterns misclassified by the current weights and d(x) = ±1 is the target class. If the sum is taken over an empty set, then Q(w) = 0; otherwise it is positive. Moreover, the lower the value of Q, the better the performance of the network.
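As a starting point for Problem 2.7, the delta rule for a single layer of tanh units can be sketched as follows (illustrative, not from the text; the function name, shapes, and constants are assumptions). With y = tanh(Wx), Q = ½‖t − y‖², and tanh′(u) = 1 − tanh²(u), gradient descent gives ΔW = α((t − y) ⊙ (1 − y²))xᵀ:

```python
# Hypothetical sketch for Problem 2.7: the delta rule for one layer of
# m tanh neurons, y = tanh(W x).  With Q = 0.5*||t - y||^2 and
# tanh'(u) = 1 - tanh(u)^2, the update is
#   Delta W = alpha * ((t - y) * (1 - y**2)) x^T.

import numpy as np

def delta_rule_step(W, x, t, alpha):
    """One delta-rule update for a single layer of tanh neurons."""
    y = np.tanh(W @ x)                  # outputs, shape (m,)
    delta = (t - y) * (1.0 - y ** 2)    # error times tanh derivative
    return W + alpha * np.outer(delta, x)

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(2, 3))  # m = 2 outputs, n = 3 inputs
x = np.array([0.5, -1.0, 2.0])
t = np.array([0.8, -0.3])               # desired outputs

for _ in range(500):
    W = delta_rule_step(W, x, t, alpha=0.1)
print(np.tanh(W @ x))                   # should be close to t
```

Note how the derivative factor (1 − y²) slows learning when a unit saturates (|y| → 1), one of the reasons for the convergence issues discussed in Section 2.7.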
Copyright © CRC Press LLC