The derivative of the performance index Q taken with respect to the given weight wji comes in the form

∂Q/∂wji = (∂Q/∂netj)(∂netj/∂wji)

We use the notation

netj = Σi wji zi,   zj = f(netj),   δj = −∂Q/∂netj

Then, obviously (see Fig. 2.12),

∂netj/∂wji = zi

Thus

Δwji = −α ∂Q/∂wji = α δj zi

that assumes a form similar to that exploited in the delta learning rule. The evident difference in the realization of the learning scheme depends on whether zj is situated in the output layer or in one of the hidden layers. If zj concerns the output layer, this signal can be compared with the target value, targetj. Then we obtain

δj = (targetj − zj) f′(netj)

so the overall expression is nothing but the delta rule. Now, if zj is not directly confronted with the target value, we employ the standard chain rule of differential calculus. Let us start with δj,

δj = −∂Q/∂netj = Σk δk (∂netk/∂netj)

where the above summation is taken over all the units (k) placed at the layer closer to the output of the network. Calculate now

∂netk/∂netj = wkj f′(netj)

Plugging this into the previous formula, we get

δj = f′(netj) Σk δk wkj

The BP proceeds in two passes: the input x propagates through the network starting from the input layer, while the error signal necessary to update the weights propagates back from the output layer down to the input one.

2.5.5. Hebbian learning

Hebbian learning results from the classic synaptic modification principle (Hebb, 1949) stating that a connection changes in proportion to the correlation between pre- and postsynaptic signals. This form of learning is a well-known example of unsupervised learning. For a single linear unit, y = xTw, the update rule articulating this modification scheme (Stent, 1973; see also Kosko, 1992) reads as

Δw = α y(k) x(k)

where x(k) and y(k) form an input-output pair of data; the learning rate is denoted by α. To understand what Hebbian learning really means, let us insert the expression of the neuron into the learning rule. The increment Δw becomes equal to

Δw = α (x(k)T w) x(k) = α x(k) x(k)T w

Taking the expected value of the weight increment, E(Δw), one has

E(Δw) = α C w

Assuming that x and w are independent, the increment depends upon the autocorrelation matrix of the input data, C = E(xxT).
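The Hebbian scheme for a single linear unit can be sketched as follows (a minimal illustration; the learning rate α = 0.01 and the synthetic two-dimensional data are arbitrary choices, not prescribed by the text):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.01                    # learning rate (arbitrary choice)
w = rng.normal(size=2)          # connections of a single linear unit
X = rng.normal(size=(200, 2))   # input patterns x(k)

for x in X:
    y = x @ w                   # neuron output y = xT w
    w += alpha * y * x          # Hebbian increment: Δw = α y(k) x(k)
```

Note that the plain rule lets the weight norm grow without bound on average (the increment follows α C w), which is why normalized variants of Hebbian learning are often used in practice.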
2.5.6. Competitive learning

The learning mechanism of competition is easily explained in the simple neural architecture formed by a single layer of linear units. For any input x we make a single unit active at a given time. This neuron is called the winner. The winning node is the one that produces the highest output for the given input x, namely the unit maximizing

xTwi,   i = 1, 2, ..., m

where wi is the vector of the connections of the i-th unit. The winning rule is straightforward (winner-takes-all competition): the output of the winning unit is set to 1 while the outputs of all remaining units are set to 0.
The update formula driving the changes of the connections of the winning node (i) is expressed as (Rumelhart and Zipser, 1985)

Δwi = α (x − wi)

The connections of the remaining neurons are left intact. The network implementation of competitive learning is accomplished by incorporating inhibitory and excitatory connections between the neurons, Fig. 2.13. Each neuron excites itself (connection set to 1) while it suppresses the others with a strength e (usually e is chosen from the range 0 to 1/m, with m being the number of the competing units).
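A winner-takes-all pass over the data can be sketched as follows (a minimal illustration; the number of units m = 3, the learning rate, and the synthetic data are arbitrary, and the lateral inhibitory connections of Fig. 2.13 are replaced by an explicit argmax):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha = 0.1                       # learning rate (arbitrary choice)
m = 3                             # number of competing units
W = rng.normal(size=(m, 2))       # row i holds the connections wi of unit i
X = rng.normal(size=(100, 2))     # input patterns

for x in X:
    i = int(np.argmax(W @ x))     # winner: highest output xT wi
    W[i] += alpha * (x - W[i])    # Δwi = α (x − wi); other rows left intact
```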
2.5.7. Self-organizing maps

The mechanism of competitive learning can be realized through a so-called self-organizing feature map (Kohonen, 1984). This network is usually built as a two-dimensional grid of linear units (neurons), Fig. 2.14, each of which receives the same input x situated in Rn.
When the input x is provided, the neurons start to compete. Let the winning node be the one with the coordinates i*j*. It is the unit for which the distance between the input and the respective vector of the connections attains a minimal value over the entire grid of the neurons,

||x − wi*j*|| = min over ij of ||x − wij||

While in the previous model of competitive learning only the winning node updates its connections, in this approach we allow a certain neighborhood of the winning node (i*j*) to become affected and modify its weights, yet to a lesser degree than the winner itself. This interaction between the neurons occurring at the learning level is conveyed by the so-called neighborhood function Φ(ij, i*j*). The updates of the connections are governed by the expression

Δwij = α Φ(ij, i*j*) (x − wij)

with ij being the coordinates of the ij-th unit in the grid. The neighborhood function Φ(ij, i*j*) preserves topological properties of the map in such a way that Φ(i*j*, i*j*) = 1 and Φ decreases monotonically as the grid distance between ij and i*j* grows.
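The map update can be sketched as follows (a minimal illustration; the Gaussian form of Φ is one common choice satisfying the property above — it equals 1 at the winner and decays with grid distance — and the 5×5 grid, the learning rate, the fixed width σ, and the synthetic data are all arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, sigma = 0.5, 1.0          # learning rate and neighborhood width (arbitrary)
grid = 5                         # 5 x 5 grid of units
W = rng.random(size=(grid, grid, 2))   # connections wij, one vector per grid node
# coordinates (i, j) of every node, shape (grid, grid, 2)
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid),
                              indexing="ij"), axis=-1)

for x in rng.random(size=(300, 2)):
    d = np.linalg.norm(W - x, axis=-1)                        # ||x − wij|| over the grid
    i_star, j_star = np.unravel_index(np.argmin(d), d.shape)  # winning node i*j*
    # Gaussian neighborhood: Φ = 1 at the winner, decaying with grid distance
    phi = np.exp(-np.sum((coords - np.array([i_star, j_star])) ** 2, axis=-1)
                 / (2 * sigma ** 2))
    W += alpha * phi[..., None] * (x - W)                     # Δwij = α Φ (x − wij)
```

In a full training scheme σ (and often α) would also shrink over time, as discussed next.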
What this really means is that Δwij is made lower for nodes farther from the winner, up to the point where the increment attains zero, expressing that the node is too far from the winner to become affected during the learning process. Additionally, the local field formed by the neighborhood function Φ is also changed over the training time - it usually shrinks (often in a linear fashion) as the learning progresses.

2.5.8. Learning in the presence of directly and indirectly labeled patterns

Here we discuss a problem of hybrid learning where we are provided with some fully labeled and some partially labeled data. The data that are fully labeled are denoted by
Copyright © CRC Press LLC