**This In-depth Tutorial on Neural Network Learning Rules Explains Hebbian Learning and Perceptron Learning Algorithm with Examples:**

In our previous tutorial we discussed the **Artificial Neural Network**, an architecture made up of a large number of interconnected elements called neurons.

These neurons process the input received to give the desired output. The nodes or neurons are linked by inputs, connection weights, and activation functions.

The main characteristic of a neural network is its ability to learn. Neural networks train themselves with known examples. Once the network is trained, it can be used to solve for the unknown values of the problem.


The Neural Network learns through various learning schemes that are categorized as supervised or unsupervised learning.

In supervised learning algorithms, the target values are known to the network. It tries to reduce the error between the desired output (target) and the actual output for optimal performance. In unsupervised learning algorithms, the target values are unknown and the network learns on its own by identifying hidden patterns in the input, for example by forming clusters.

An ANN consists of 3 parts, i.e. the input layer, the hidden layer, and the output layer. There is a single input layer and a single output layer, while the network may have no hidden layer or one or more hidden layers. Based on this structure, an ANN is classified as a single-layer, multilayer, feed-forward, or recurrent network.

## Important ANN Terminology

Before we classify the various learning rules in ANN, let us understand some important terminologies related to ANN.

**#1) Weights:** In an ANN, each neuron is connected to other neurons through connection links, and each link carries a weight. The weight holds information about the input signal to the neuron. The weights and the input signal together determine the output. The weights can be written in matrix form, which is also called the connection matrix.

Each neuron is connected to every neuron of the next layer through connection weights. Hence, if there are “n” nodes and each node has “m” weights, then the weight matrix will be:

$$
W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_n^T \end{bmatrix}
  = \begin{bmatrix}
      w_{11} & w_{12} & \cdots & w_{1m} \\
      w_{21} & w_{22} & \cdots & w_{2m} \\
      \vdots & \vdots & \ddots & \vdots \\
      w_{n1} & w_{n2} & \cdots & w_{nm}
    \end{bmatrix}
$$

Here $w_1$ is the weight vector starting from node 1, and $w_{11}$ is the weight from the 1st node of the preceding layer to the 1st node of the next layer. Similarly, $w_{ij}$ is the weight from the “ith” processing element (neuron) to the “jth” processing element of the next layer.

**#2) Bias**: The bias is added to the network by adding an input element $x_b = 1$ to the input vector. The bias also carries a weight, denoted by $w_b$.

The bias plays an important role in calculating the output of the neuron. The bias can be either positive or negative. A positive bias increases the net input, while a negative bias reduces it.

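**#3) Threshold:** A threshold value is used in the activation function. The net input is compared with the threshold to get the output. In an NN, the activation function is defined in terms of the threshold value, and the output is calculated from it.

**With a threshold θ, a bipolar activation function is, for example:**

$$
f(net) = \begin{cases} 1 & \text{if } net \ge \theta \\ -1 & \text{if } net < \theta \end{cases}
$$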
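As a small illustration of how the weights, the bias, and the threshold work together, the sketch below computes a net input and passes it through a bipolar threshold function. The function names and the sample values are assumptions made for this example only.

```python
def net_input(x, w, b):
    """Weighted sum of the inputs plus the bias weight (the bias input is fixed at 1)."""
    return b + sum(xi * wi for xi, wi in zip(x, w))


def threshold_activation(y_in, theta=0.0):
    """Bipolar step: output +1 when the net input reaches the threshold, else -1."""
    return 1 if y_in >= theta else -1


# Hypothetical inputs and weights, chosen only for illustration.
x = [1, -1]
w = [0.5, 0.25]

print(net_input(x, w, b=0.0))   # 0.25
print(net_input(x, w, b=-1.0))  # -0.75 (a negative bias lowers the net input)
print(threshold_activation(net_input(x, w, b=0.0)))   # 1
print(threshold_activation(net_input(x, w, b=-1.0)))  # -1
```

With the same inputs and weights, changing only the bias is enough to move the net input across the threshold and flip the output.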

**#4) Learning Rate**: It is denoted by alpha (α). The learning rate ranges from 0 to 1. It controls the size of the weight adjustment during the learning process of the NN.

**#5) Momentum Factor**: It is added for faster convergence of results. The momentum factor is added to the weight update and is generally used in backpropagation networks.

## Comparison Of Neural Network Learning Rules

| Type of Architecture / Learning Method | Gradient Descent | Hebbian | Competitive | Stochastic |
|---|---|---|---|---|
| Single Layer Feedforward | ADALINE, Hopfield, Perceptron | Associative Memory, Hopfield | Learning Vector Quantization | |
| Multilayer Feed Forward | Cascade Correlation, Multilayer Feed Forward, Radial Basis Function | Neocognitron | | |
| Recurrent | Recurrent Neural Network | Bidirectional Auto Associative Memory, Brain-State-In-a-Box, Hopfield | Adaptive Resonance Theory | Boltzmann Machine, Cauchy Machine |

The classification of various learning types of ANN is shown below.

### Classification Of Supervised Learning Algorithms

- Gradient Descent
- Stochastic

#### #1) Gradient Descent Learning

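In this type of learning, the error is reduced by adjusting the weights along the gradient of the error. Because this gradient passes through the activation function of the network, the activation function should be differentiable.

The adjustment of weights depends on the gradient of the error E. The backpropagation rule is an example of this type of learning. The weight adjustment is defined as

$$
\Delta w_{ij} = -\alpha \frac{\partial E}{\partial w_{ij}}
$$

where α is the learning rate and the minus sign moves each weight in the direction that decreases E.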
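A minimal sketch of one such gradient-based adjustment is shown here for a single neuron. It assumes a sigmoid activation and the error E = 0.5 * (t - y)^2; neither choice comes from the text above, they are simply convenient differentiable examples, and all names and values are hypothetical.

```python
import math


def sigmoid(z):
    """A differentiable activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(w, b, x, t):
    """E = 0.5 * (t - y)**2 for a single sigmoid neuron."""
    y = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
    return 0.5 * (t - y) ** 2


def gradient_step(w, b, x, t, alpha):
    """One gradient-descent update of the weights and the bias."""
    y = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
    # dE/dnet = -(t - y) * y * (1 - y); the chain rule gives dE/dw_i = dE/dnet * x_i.
    d_net = -(t - y) * y * (1.0 - y)
    new_w = [wi - alpha * d_net * xi for wi, xi in zip(w, x)]
    new_b = b - alpha * d_net
    return new_w, new_b


# Hypothetical single training pair, chosen only for illustration.
w, b = [0.0, 0.0], 0.0
x, t = [1.0, 0.5], 1.0

before = squared_error(w, b, x, t)
for _ in range(50):
    w, b = gradient_step(w, b, x, t, alpha=0.5)
after = squared_error(w, b, x, t)

print(before, after)  # the error after training should be smaller than before
```

Backpropagation applies the same idea layer by layer, passing the error gradient backwards through the network.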

#### #2) Stochastic Learning

In this learning, the weights are adjusted in a probabilistic fashion.

### Classification Of Unsupervised Learning Algorithms

- Hebbian
- Competitive

#### #1) Hebbian Learning

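This learning rule was proposed by Hebb in 1949. It is based on correlative adjustment of weights. The input and output pattern pairs $(X_i, Y_i)$ are associated through a weight matrix, W:

$$
W = \sum_{i=1}^{n} X_i Y_i^T
$$

The transpose of the output, $Y_i^T$, is taken for the weight adjustment, so each term is the outer product of an input pattern and its associated output pattern.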
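The correlation-style weight matrix can be sketched as a sum of outer products of input and output patterns. The function name and the two bipolar pattern pairs below are made up for illustration.

```python
def hebbian_weight_matrix(inputs, outputs):
    """Sum the outer products x * y^T over all (input, output) pattern pairs."""
    n, m = len(inputs[0]), len(outputs[0])
    W = [[0] * m for _ in range(n)]
    for x, y in zip(inputs, outputs):
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W


# Two hypothetical bipolar pattern pairs.
X = [[1, -1, 1], [-1, 1, 1]]
Y = [[1, -1], [-1, 1]]

W = hebbian_weight_matrix(X, Y)
print(W)  # [[2, -2], [-2, 2], [0, 0]]
```

Entries grow where an input component and an output component agree across patterns and cancel out where they disagree, which is the correlative adjustment described above.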

#### #2) Competitive Learning

It is a winner-takes-all strategy. In this type of learning, when an input pattern is sent to the network, all the neurons in the layer compete and only the winning neuron has its weights adjusted.
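One common way to realise this idea is sketched below. It assumes that the winner is the unit whose weight vector is closest to the input (squared Euclidean distance) and that the winner moves towards the input by a fraction α of the difference. Both the selection rule and the update rule are assumptions for this example, as are the names and values.

```python
def competitive_step(weights, x, alpha):
    """Update only the winning unit, i.e. the one whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))

    winner = min(range(len(weights)), key=lambda j: sq_dist(weights[j]))
    # Move the winner's weights a fraction alpha of the way towards the input.
    weights[winner] = [wi + alpha * (xi - wi) for wi, xi in zip(weights[winner], x)]
    return winner


# Hypothetical layer with two competing units.
weights = [[0.0, 0.0], [1.0, 1.0]]
winner = competitive_step(weights, x=[0.75, 1.0], alpha=0.5)

print(winner, weights)  # 1 [[0.0, 0.0], [0.875, 1.0]]
```

The losing unit keeps its weights unchanged, so over many inputs each unit tends to specialise on one cluster of patterns.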

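### McCulloch-Pitts Neuron

Also known as the M-P neuron, this is the earliest neural network model, proposed in 1943. In this model, the neurons are connected by connection weights, and the activation is binary: the neuron either fires or does not fire. The threshold is used to determine whether the neuron will fire or not.

**The function of the M-P neuron is:**

$$
f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} \ge \theta \\ 0 & \text{if } y_{in} < \theta \end{cases}
$$

where $y_{in}$ is the net input to the neuron and θ is the threshold.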
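As a minimal sketch of such a threshold unit, the example below uses two binary inputs with both weights set to 1 and a threshold of 2. These particular values are an assumption chosen so that the neuron behaves like an AND gate.

```python
def mp_neuron(inputs, weights, theta):
    """Fire (1) when the weighted sum reaches the threshold theta, otherwise output 0."""
    y_in = sum(x * w for x, w in zip(inputs, weights))
    return 1 if y_in >= theta else 0


# Assumed setup: weights [1, 1] and threshold 2 realise a binary AND gate.
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], theta=2))
```

Lowering the threshold to 1 with the same weights would turn the unit into an OR gate, which shows how the threshold alone decides when the neuron fires.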

### Hebbian Learning Algorithm

The **Hebb Network** is based on the learning rule stated by Donald Hebb in 1949. According to Hebb’s rule, the weights increase in proportion to the product of input and output. This means that in a Hebb network, if two interconnected neurons are active together, the weight of the connection between them is increased, which corresponds to a change in the synaptic gap.

This network is more suitable for bipolar data than for binary data. The Hebbian learning rule is generally demonstrated on logic gates.

The weights are updated as:

**w(new) = w(old) + x*y**

**Training Algorithm For Hebbian Learning Rule**

The training steps of the algorithm are as follows:

**#1)** Initially, the weights are set to zero, i.e. $w_i = 0$ for all inputs i = 1 to n, where n is the total number of input neurons.

**#2)** Let s be the input training vector and t the target output. The activation function for the inputs is generally set as the identity function, so $x_i = s_i$.

**#3)** The activation for the output is set to the target, y = t.

**#4)** The weights and the bias are adjusted to:

$$
w_i(new) = w_i(old) + x_i y, \qquad b(new) = b(old) + y
$$

**#5)** Steps 2 to 4 are repeated for each input vector and target output.

**Example Of Hebbian Learning Rule**

Let us implement the logical AND function with bipolar inputs using Hebbian learning.

X1 and X2 are the inputs, b is the bias input taken as 1, and the target value is the output of the logical AND operation over the inputs.

| Input X1 | Input X2 | Bias b | Target y |
|---|---|---|---|
| 1 | 1 | 1 | 1 |
| 1 | -1 | 1 | -1 |
| -1 | 1 | 1 | -1 |
| -1 | -1 | 1 | -1 |

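**#1)** Initially, the weights and the bias are set to zero.

w1 = w2 = b = 0

**#2)** The first input vector is taken as [x1 x2 b] = [1 1 1] and the target value is 1.

**The new weights will be:**

$$
\begin{aligned}
w_1(new) &= w_1(old) + x_1 y = 0 + 1 \cdot 1 = 1 \\
w_2(new) &= w_2(old) + x_2 y = 0 + 1 \cdot 1 = 1 \\
b(new) &= b(old) + y = 0 + 1 = 1
\end{aligned}
$$

**#3)** The above weights are the final new weights. When the second input is passed, these become the initial weights.

**#4)** Take the second input = [1 -1 1]. The target is -1.

$$
\begin{aligned}
w_1(new) &= 1 + (1)(-1) = 0 \\
w_2(new) &= 1 + (-1)(-1) = 2 \\
b(new) &= 1 + (-1) = 0
\end{aligned}
$$

**#5)** Similarly, the weights are calculated for the other inputs.

**The table below shows the calculations for all the inputs:**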

| Input X1 | Input X2 | Bias b | Target y | ∆w1 | ∆w2 | ∆b | New w1 | New w2 | New b |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | -1 | 1 | -1 | -1 | 1 | -1 | 0 | 2 | 0 |
| -1 | 1 | 1 | -1 | 1 | -1 | -1 | 1 | 1 | -1 |
| -1 | -1 | 1 | -1 | 1 | 1 | -1 | 2 | 2 | -2 |

**Hebb Net for AND Function**
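The Hebbian training steps and the AND example above can be sketched in a few lines of Python. The function name and the `history` list are hypothetical helpers added for this illustration; the update itself is the w(new) = w(old) + x*y rule, with the bias treated as a weight on a constant input of 1.

```python
def hebb_train(samples):
    """Apply w_i += x_i * y and b += y once for every (inputs, target) pair."""
    n = len(samples[0][0])
    w, b = [0] * n, 0
    history = []
    for x, t in samples:
        y = t  # the output activation is set to the target
        w = [wi + xi * y for wi, xi in zip(w, x)]
        b += y  # the bias input is 1, so the bias changes by 1 * y
        history.append((list(w), b))
    return w, b, history


# Bipolar AND patterns: ([x1, x2], target).
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w, b, history = hebb_train(and_samples)
for step_w, step_b in history:
    print(step_w, step_b)
print(w, b)  # [2, 2] -2
```

The intermediate values in `history` correspond to the "new weights" columns of the table, and the final weights w1 = 2, w2 = 2, b = -2 give a net input whose sign matches the AND target for all four patterns.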

### Perceptron Learning Algorithm

Perceptron networks are single-layer feed-forward networks. They are also called single perceptron networks. The perceptron consists of an input layer, a hidden layer, and an output layer; it still counts as a single-layer network because only the weights between the hidden layer and the output layer are trained.

The input layer is connected to the hidden layer through fixed weights, which may be inhibitory, excitatory, or zero (-1, +1, or 0). The activation function used is a binary step function for the input layer and the hidden layer.

**The output is**

$$
y = f(y_{in})
$$

**The activation function is:**

$$
f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > \theta \\ 0 & \text{if } -\theta \le y_{in} \le \theta \\ -1 & \text{if } y_{in} < -\theta \end{cases}
$$

The weight update takes place between the hidden layer and the output layer to match the target output. The error is calculated from the actual output and the desired output.

If the output matches the target, no weight update takes place. The weights can be set to any values initially (zero is a common choice) and are adjusted successively until a solution is found. If a weight vector exists that gives the correct output for every training pattern, perceptron learning converges to such a weight vector in a finite number of steps.

The perceptron rule can be used for both binary and bipolar inputs.

**Learning Rule for Single Output Perceptron**

**#1)** Let there be “n” training input vectors x(n), each associated with a target value t(n).

**#2)** Initialize the weights and bias. Set them to zero for easy calculation.

**#3)** Let the learning rate α be 1.

**#4)** The input layer has the identity activation function, so x(i) = s(i).

**#5)** To calculate the output of the network, first compute the net input:

$$
y_{in} = b + \sum_{i} x_i w_i
$$

**#6)** The activation function is applied over the net input to obtain the output, $y = f(y_{in})$.

**#7)** Now compare the desired target value (t) with the actual output (y). If they differ, adjust the weights and the bias; otherwise leave them unchanged:

$$
\text{if } y \ne t: \quad w_i(new) = w_i(old) + \alpha\, t\, x_i, \qquad b(new) = b(old) + \alpha\, t
$$

**#8)** Continue the iteration until there is no weight change. Stop once this condition is achieved.
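The single-output learning rule can be sketched as follows, using the bipolar AND patterns as training data. The three-level step function, the `max_epochs` safety cap, and all names are assumptions made for this example; the cap is there only so the loop ends even if the data cannot be separated.

```python
def activation(y_in, theta=0.0):
    """Three-level step function: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_perceptron(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """Repeat passes over the samples until one full pass changes no weight."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    for epoch in range(1, max_epochs + 1):
        changed = False
        for x, t in samples:
            y = activation(b + sum(xi * wi for xi, wi in zip(x, w)), theta)
            if y != t:
                # Update only when the output differs from the target.
                w = [wi + alpha * t * xi for wi, xi in zip(w, x)]
                b += alpha * t
                changed = True
        if not changed:
            return w, b, epoch
    return w, b, max_epochs


# Bipolar AND patterns: ([x1, x2], target).
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w, b, epochs = train_perceptron(and_samples)
print(w, b, epochs)  # [1.0, 1.0] -1.0 2
```

With zero initial weights, a learning rate of 1, and a threshold of 0, the weights stop changing in the second pass over the data.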

**Learning Rule for Multiple Output Perceptron**

**#1)** Let there be “n” training input vectors x(n), each associated with a target vector t(n).

**#2)** Initialize the weights and bias. Set them to zero for easy calculation.

**#3)** Let the learning rate α be 1.

**#4)** The input layer has the identity activation function, so x(i) = s(i).

**#5)** To calculate the output of each output unit from j = 1 to m, the net input is:

$$
y_{in,j} = b_j + \sum_{i} x_i w_{ij}
$$

**#6)** The activation function is applied over the net input to obtain the output, $y_j = f(y_{in,j})$.

**#7)** Now compare the desired target value (t) with the actual output for each output unit and make the weight adjustments:

$$
\text{if } y_j \ne t_j: \quad w_{ij}(new) = w_{ij}(old) + \alpha\, t_j\, x_i, \qquad b_j(new) = b_j(old) + \alpha\, t_j
$$

Here $w_{ij}$ is the weight of the connection link between the ith input and the jth output neuron, and $t_j$ is the target output for output unit j.

**#8)** Continue the iteration until there is no weight change. Stop once this condition is achieved.
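A minimal sketch of the multiple-output rule is given below. As a made-up example, the first output unit is trained on bipolar AND and the second on bipolar OR; the target choice, the `max_epochs` cap, and the function names are all assumptions for illustration.

```python
def step(y_in, theta=0.0):
    """Three-level step function: 1 above theta, -1 below -theta, 0 in between."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0


def train_multi_output(samples, alpha=1.0, theta=0.0, max_epochs=100):
    """w[i][j] links input i to output j; each output unit is corrected independently."""
    n, m = len(samples[0][0]), len(samples[0][1])
    w = [[0.0] * m for _ in range(n)]
    b = [0.0] * m
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            for j in range(m):
                y_in = b[j] + sum(x[i] * w[i][j] for i in range(n))
                if step(y_in, theta) != t[j]:
                    for i in range(n):
                        w[i][j] += alpha * t[j] * x[i]
                    b[j] += alpha * t[j]
                    changed = True
        if not changed:
            break
    return w, b


# Hypothetical targets: output 0 learns bipolar AND, output 1 learns bipolar OR.
samples = [
    ([1, 1], [1, 1]),
    ([1, -1], [-1, 1]),
    ([-1, 1], [-1, 1]),
    ([-1, -1], [-1, -1]),
]

w, b = train_multi_output(samples)
print(w, b)  # [[1.0, 1.0], [1.0, 1.0]] [-1.0, 1.0]
```

Because every output unit has its own column of weights and its own bias, the two units end up with the same weights here but different biases, which is what separates AND from OR.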

#### Example Of Perceptron Learning Rule

Implementation of AND function using a Perceptron network for bipolar inputs and output.

The input pattern will be x1, x2 and bias b. Let the initial weights be 0 and bias be 0. The threshold is set to zero and the learning rate is 1.

**AND Gate**

| X1 | X2 | Target |
|---|---|---|
| 1 | 1 | 1 |
| 1 | -1 | -1 |
| -1 | 1 | -1 |
| -1 | -1 | -1 |

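**#1)** x1 = 1, x2 = 1 and target output = 1

w1 = w2 = wb = 0 and x1 = x2 = b = 1, t = 1

Net input = y_in = wb + x1*w1 + x2*w2 = 0 + 1*0 + 1*0 = 0

**As the threshold is zero, the activation function becomes:**

$$
f(y_{in}) = \begin{cases} 1 & \text{if } y_{in} > 0 \\ 0 & \text{if } y_{in} = 0 \\ -1 & \text{if } y_{in} < 0 \end{cases}
$$

From here we get output = 0. Now check whether output (y) = target (t).

y = 0 but t = 1, so they are not the same and a weight update takes place:

$$
\begin{aligned}
w_1(new) &= w_1(old) + \alpha\, t\, x_1 = 0 + 1 \cdot 1 \cdot 1 = 1 \\
w_2(new) &= w_2(old) + \alpha\, t\, x_2 = 0 + 1 \cdot 1 \cdot 1 = 1 \\
w_b(new) &= w_b(old) + \alpha\, t = 0 + 1 \cdot 1 = 1
\end{aligned}
$$

The new weights are 1, 1, and 1 after the first input vector is presented.

**#2)** x1 = 1, x2 = -1, b = 1 and target = -1, with w1 = 1, w2 = 1, wb = 1

Net input = y_in = wb + x1*w1 + x2*w2 = 1 + 1*1 + (-1)*1 = 1

With the same activation function, a net input of 1 gives an output of 1.

Therefore, again, the target = -1 does not match the actual output = 1, and a weight update takes place:

$$
\begin{aligned}
w_1(new) &= 1 + (-1)(1) = 0 \\
w_2(new) &= 1 + (-1)(-1) = 2 \\
w_b(new) &= 1 + (-1) = 0
\end{aligned}
$$

Now the new weights are w1 = 0, w2 = 2 and wb = 0.

Similarly, by continuing with the next set of inputs, we get the following table: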

| Epoch | Input X1 | Input X2 | Bias b | Target t | Net Input yin | Calculated Output y | ∆w1 | ∆w2 | ∆b | New w1 | New w2 | New wb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| 1 | 1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | -1 | 0 | 2 | 0 |
| 1 | -1 | 1 | 1 | -1 | 2 | 1 | 1 | -1 | -1 | 1 | 1 | -1 |
| 1 | -1 | -1 | 1 | -1 | -3 | -1 | 0 | 0 | 0 | 1 | 1 | -1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | -1 |
| 2 | 1 | -1 | 1 | -1 | -1 | -1 | 0 | 0 | 0 | 1 | 1 | -1 |
| 2 | -1 | 1 | 1 | -1 | -1 | -1 | 0 | 0 | 0 | 1 | 1 | -1 |
| 2 | -1 | -1 | 1 | -1 | -3 | -1 | 0 | 0 | 0 | 1 | 1 | -1 |

An epoch is one complete cycle of all the input patterns fed to the system. Epochs are repeated until no weight change is required, at which point the iteration stops.

### Widrow Hoff Learning Algorithm

Also known as the **Delta Rule**, it follows the gradient descent rule for linear regression.

It updates the connection weights in proportion to the difference between the target and the output value. It is the least mean square (LMS) learning algorithm and falls under the category of supervised learning algorithms.

This rule is followed by ADALINE (Adaptive Linear Neuron) and MADALINE networks. Unlike the perceptron, an ADALINE does not stop updating as soon as every pattern is classified correctly; it keeps adjusting the weights and converges by reducing the mean square error. MADALINE is a network of more than one ADALINE.

The aim of the delta learning rule is to minimize the error between the output and the target vector.

The weights in ADALINE networks are updated by:

$$
w_i(new) = w_i(old) + \alpha\,(t - y_{in})\,x_i, \qquad b(new) = b(old) + \alpha\,(t - y_{in})
$$

The squared error is $E = (t - y_{in})^2$. ADALINE converges when this error reaches its minimum.
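The delta rule can be sketched in the same style as the earlier examples. The learning rate of 0.1, the fixed number of epochs, and the function name are assumptions for this illustration; the bipolar AND patterns are reused as training data.

```python
def train_adaline(samples, alpha=0.1, epochs=50):
    """Delta rule: move each weight by alpha * (t - y_in) * x_i after every sample."""
    n = len(samples[0][0])
    w, b = [0.0] * n, 0.0
    errors = []
    for _ in range(epochs):
        total = 0.0
        for x, t in samples:
            y_in = b + sum(xi * wi for xi, wi in zip(x, w))
            err = t - y_in  # the error uses the net input, not a thresholded output
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b += alpha * err
            total += err ** 2
        errors.append(total)  # summed squared error over this epoch
    return w, b, errors


# Bipolar AND patterns: ([x1, x2], target).
and_samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w, b, errors = train_adaline(and_samples)
print(w, b)
print(errors[0], errors[-1])  # the summed squared error should shrink over training
```

Note that the weights change on every sample, even when the sign of the net input is already correct, because the rule reduces the squared error rather than checking for a match.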

## Conclusion

In this tutorial, we have discussed two main algorithms, i.e. the Hebbian Learning Rule and the Perceptron Learning Rule. The Hebbian rule states that the weight vector increases in proportion to the input and the learning signal, i.e. the output. The weights are incremented by adding the product of the input and output to the old weight.

w(new) = w(old) + x*y

Hebb rules are applied in pattern association, classification, and categorization problems.

The Perceptron learning rule can be applied to both single-output and multiple-output networks. The goal of the perceptron network is to classify the input pattern into a particular member class. The input neurons and the output neuron are connected through links having weights.

The weights are adjusted to match the actual output with the target value. The learning rate is set between 0 and 1 and determines the size of each weight adjustment.

**The weights are updated according to:**

w(new) = w(old) + α*t*x

Apart from these learning rules, machine learning algorithms learn through many other methods, i.e. supervised, unsupervised, and reinforcement learning. Some of the other common ML algorithms are Backpropagation, ART, Kohonen Self-Organizing Maps, etc.

**We hope you enjoyed all the tutorials from this Machine Learning Series!!**
