Artificial Neural Network Models – Multilayer Perceptron & Others

By Vijay

Updated March 7, 2024

This Tutorial Explains Artificial Neural Network Models – Multilayer Perceptron, Backpropagation, Radial Basis & Kohonen Self Organising Maps including their Architecture:

In the previous tutorial on Neural Network Learning Rules, we learned the Hebbian Learning Rule and the Perceptron Learning Algorithm with examples.

In supervised learning, the desired output, often called the target value, is known to the neural network. The network optimizes its performance to reduce the error between the actual output and the target.

On the other hand, unsupervised learning does not have any information about the target value. The network tries to optimize its performance on its own by identifying hidden patterns in the inputs and forming clusters.

=> Read Through The Complete Machine Learning Training Series

Machine Learning and Artificial Neural Network Models

Gradient Descent and Stochastic learning algorithms fall in the category of supervised learning algorithms. Hebbian and Competitive learning algorithms fall into the category of unsupervised learning algorithms. We have studied these in our previous tutorial.

[Image: Machine Learning and Artificial Neural Network Models]

Let’s take a quick look at the structure of the Artificial Neural Network.

An ANN has 3 types of layers i.e. the input layer, hidden layers, and the output layer. Each ANN has a single input layer and a single output layer, but may have none, one, or many hidden layers. Based on its structure, an ANN is classified into many types of architecture such as single-layer, multi-layer, feed-forward, and recurrent networks.

In an Artificial Neural Network, there is a weight associated with each input of a neuron, and the bias also carries a weight. An activation function is applied over the net input to calculate the output. The output is then compared to the target and the weights are adjusted.

The activation functions are of many types, such as the binary step function, bipolar step function, sigmoidal function, etc.

The above terms are described in the diagram below:

[Image: Activation functions in an ANN]
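As a quick illustration (not part of the original tutorial), here is a minimal Python sketch of three such activation functions applied to a sample net input; the function names are chosen here for readability:

```
import numpy as np

# Illustrative activation functions applied to a net input value.
def binary_step(x):
    return np.where(x >= 0, 1, 0)        # output in {0, 1}

def bipolar_step(x):
    return np.where(x >= 0, 1, -1)       # output in {-1, +1}

def binary_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))      # smooth output in (0, 1)

net_input = np.array([-1.5, 0.0, 2.0])
print(binary_step(net_input))     # [0 1 1]
print(bipolar_step(net_input))    # [-1  1  1]
print(binary_sigmoid(net_input))  # [0.1824 0.5 0.8808]
```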

In this tutorial, we will focus on the Artificial Neural Network Models – Multilayer Perceptron, Radial Basis Function and Kohonen Self Organising Maps in detail.

What Is A Multilayer Perceptron?

A Perceptron network with one or more hidden layers is called a Multilayer Perceptron network. A multilayer perceptron (MLP) network is also a feed-forward network. It consists of a single input layer, one or more hidden layers and a single output layer.

Due to the added layers, MLP networks overcome the limited information-processing capability of simple Perceptron networks and are highly flexible in their approximation ability. MLP networks are trained, and the weights are updated, using the backpropagation learning method which is explained below in detail.

Some problems that a simple single-layer Perceptron cannot solve, such as the XOR problem, can be solved with MLP networks, as the sketch below illustrates.
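As a rough sketch (with hand-picked weights chosen here purely for illustration, not taken from the tutorial), a 2-2-1 MLP with step activations computes XOR, which no single-layer perceptron can do:

```
import numpy as np

# A hand-crafted 2-2-1 multilayer perceptron that computes XOR.
# Hidden unit 1 acts as OR, hidden unit 2 acts as AND; the output unit
# fires when OR is true but AND is not, i.e. exactly one input is 1.
def step(x):
    return (np.asarray(x) >= 0).astype(int)

V = np.array([[1.0, 1.0],          # input -> hidden weights
              [1.0, 1.0]])
b_hidden = np.array([-0.5, -1.5])  # thresholds giving OR and AND units
W = np.array([1.0, -2.0])          # hidden -> output weights
b_out = -0.5

def xor_mlp(x1, x2):
    h = step(V @ np.array([x1, x2]) + b_hidden)
    return int(step(W @ h + b_out))

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_mlp(a, b))   # prints the XOR truth table
```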

Backpropagation Networks

A Backpropagation (BP) Network is an application of a feed-forward multilayer perceptron network with each layer having differentiable activation functions.

For a given training set, the weights of the layers in a Backpropagation network are adjusted so that the network classifies the input patterns correctly. The weight update in a BPN takes place in the same way in which the gradient descent method is applied to single-layer perceptron networks.

Minimization Of Error Using BP Algorithm

In this algorithm, the error between the actual output and the target is propagated back to the hidden units. To minimize the error, the weights are updated, and to update the weights the error is first calculated at the output layer.

For further minimization of the error, the error at the hidden layer is also estimated; techniques that help calculate and reduce the error at the hidden layer lead to a more accurate output.

With a greater number of hidden layers, the network becomes more complex and slower to train, but it can approximate more complicated mappings. The system can also be trained with a single hidden layer. Once trained, it starts producing the output rapidly.

This learning algorithm is called backpropagation learning and the network is called a Backpropagation network.

Backpropagation Learning is done in 3 stages:

  1. The input training pattern is fed forward.
  2. The error between the actual output and the target value is calculated.
  3. The weights are updated.

[Image: Backpropagation network]

Architecture Of BP Networks

Let’s see the architecture of Backpropagation networks.

A backpropagation network is a feed-forward multilayer network. It has an input layer, a hidden layer, and an output layer. Bias units, whose activation is fixed at 1, are added to the hidden layer and the output layer. The inputs and outputs to the BPN can either be binary (0, 1) or bipolar (-1, +1).

The activation function must be differentiable and monotonically increasing, and is generally chosen as the binary sigmoidal or bipolar sigmoidal function.
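A minimal sketch (function names chosen here) of the two sigmoidal activations and the derivatives that the backpropagation step needs:

```
import numpy as np

# Binary sigmoid: range (0, 1); derivative f'(x) = f(x) * (1 - f(x))
def binary_sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def binary_sigmoid_deriv(x):
    f = binary_sigmoid(x)
    return f * (1.0 - f)

# Bipolar sigmoid: range (-1, 1); derivative f'(x) = 0.5 * (1 + f(x)) * (1 - f(x))
def bipolar_sigmoid(x):
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def bipolar_sigmoid_deriv(x):
    f = bipolar_sigmoid(x)
    return 0.5 * (1.0 + f) * (1.0 - f)
```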

A backpropagation network has a feed-forward phase where the data is fed from the input towards the output and a back-propagation phase where the signals are sent back in a reverse direction to minimize the error.

BP Algorithm

[Image: Training process of the backpropagation algorithm]

Calculation

Step 1: Initialize random weights and the learning rate.

Step 2: Each input unit receives xi as input and sends it to the hidden units.

Step 3: The net input of the hidden layer unit zj is calculated as: zinj = v0j + Σ xi * vij (sum over i = 1 to n).

Step 4: The net output of the hidden layer unit is calculated as zj = f(zinj), where the activation function is taken as binary or bipolar sigmoidal.

Step 5: The net input of the output layer unit is calculated as: yink = w0k + Σ zj * wjk (sum over j = 1 to p).

Step 6: The net output of the output layer unit: yk = f(yink), where the activation function is taken as binary or bipolar sigmoidal.

Step 7: Calculation of error: each output unit yk (k = 1 to m) receives the target pattern tk corresponding to the input training pattern. Using the derivative of the activation function, the error term is calculated as: δk = (tk - yk) * f'(yink).

Step 8: Error correction and Weight Updation.

The error term of each output unit gives the weight and bias correction terms:

Δwjk = α * δk * zj and Δw0k = α * δk

The error is sent backward. Each hidden unit zj sums the deltas it receives from the output units, δinj = Σ δk * wjk (k = 1 to m), and its error term is δj = δinj * f'(zinj). The corresponding correction terms are Δvij = α * δj * xi and Δv0j = α * δj.

Step 9: Each output unit (yk, k = 1 to m) updates its bias and weights:

wjk(new) = wjk(old) + Δwjk, w0k(new) = w0k(old) + Δw0k

Similarly, each hidden unit updates its bias and weights: vij(new) = vij(old) + Δvij, v0j(new) = v0j(old) + Δv0j.

Step 10: Check for the stopping condition, which is given as the number of epochs completed.

Steps 2 to 9 are repeated until the stopping condition is reached.
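The steps above can be summarised in a short sketch. This is a minimal, assumed implementation for one hidden layer with binary sigmoidal activation; the array names (v, v0, w, w0, alpha) mirror the notation used in the steps and are not from any specific library:

```
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bp_train_step(x, t, v, v0, w, w0, alpha):
    # x: input vector (n,), t: target vector (m,)
    # v: (n, p) input->hidden weights, v0: (p,) hidden biases
    # w: (p, m) hidden->output weights, w0: (m,) output biases

    # Feed-forward phase (steps 2-6)
    z_in = v0 + x @ v                        # net input of hidden units
    z = sigmoid(z_in)                        # hidden-layer outputs
    y_in = w0 + z @ w                        # net input of output units
    y = sigmoid(y_in)                        # network outputs

    # Backpropagation of error (steps 7-8)
    delta_k = (t - y) * y * (1 - y)          # output error terms
    delta_j = (delta_k @ w.T) * z * (1 - z)  # hidden error terms

    # Weight and bias updation (step 9), applied in place
    w += alpha * np.outer(z, delta_k)
    w0 += alpha * delta_k
    v += alpha * np.outer(x, delta_j)
    v0 += alpha * delta_j
    return y
```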

Factors Affecting The Back-Propagation Network

Some of the factors that affect the training of Backpropagation networks are:

  1. Initial Weights: The initial random weights chosen are of very small value, as larger net inputs to the binary sigmoidal function may lead to saturation at the very beginning, causing the network to get stuck in a local minimum. One way of initializing the weights is the Nguyen-Widrow initialization (see the sketch after this list). It is based on analyzing the response of the hidden neurons to a single input, and it improves the learning ability of the hidden units, leading to faster convergence of the BPN.
  2. Learning Rate: A large value of the learning rate alpha helps in faster convergence but might lead to overshooting. Values of alpha ranging from 10^-3 to 10 have been used for various BPN experiments.
  3. Number of Training Data: The input training data should cover the entire input space, and the training sets should be chosen randomly.
  4. Number of Hidden Layer Nodes: The number of hidden layer nodes is chosen for optimum performance of the network. For networks that do not converge to a solution, more hidden nodes can be chosen, while for networks with fast convergence fewer hidden layer nodes are selected.
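Point 1 above mentions the Nguyen-Widrow initialization; a rough sketch of one common formulation (the constants and names here are assumptions, not taken from the tutorial) is:

```
import numpy as np

# Nguyen-Widrow style initialisation of the input-to-hidden weights.
# n = number of input units, p = number of hidden units.
def nguyen_widrow_init(n, p, rng=None):
    rng = rng or np.random.default_rng(0)
    beta = 0.7 * p ** (1.0 / n)                  # scale factor
    v = rng.uniform(-0.5, 0.5, size=(n, p))      # small random weights
    v = beta * v / np.linalg.norm(v, axis=0)     # rescale each hidden unit's weight vector to length beta
    v0 = rng.uniform(-beta, beta, size=p)        # hidden biases
    return v, v0
```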

Example of a Back-propagation Network

For the following network diagram, let’s calculate the new weights with the given figures:

[Image: Example backpropagation network]

Input vector = [0,1]
Target output = 1
Learning Rate = 0.25
Activation function= binary sigmoidal activation function

Solution:

From the above diagram we can see the input vector to Z1: [v11, v21, v01] is [0.6, -0.1, 0.3]
Input vector to Z2: [v12, v22, v02] = [-0.3, 0.4, 0.5]
Input vector to Y: [w1, w2, w0] = [0.4, 0.1, -0.2]
The activation function is given by f(x) = 1/(1 + e^-x)
Input x = [0, 1] and target t = 1

Step 1: Calculate the net inputs to the hidden units Z1 and Z2

Zin1 = v01 + x1 * v11 + x2 * v21

  • 0.3 + 0*0.6 + 1*(-0.1)
  • 0.2

Zin2 = v02 + x1 * v12 + x2 * v22

  • 0.5 + 0*(-0.3) + 1*0.4
  • 0.9

Step 2: Apply the Activation Function

z1 = f(Zin1) = 1/(1 + e^-Zin1)

  • 1/(1 + e^-0.2)
  • 0.5498

z2 = f(Zin2) = 1/(1 + e^-Zin2)

  • 1/(1 + e^-0.9)
  • 0.7109

Step 3: Calculate the Net input of Output layer

yin = w0 + z1*w1 + z2*w2

= -0.2 + 0.5498 * 0.4 + 0.7109 * 0.1
= 0.09101

Step 4: Calculate the Net output using activation

y = f(yin) = 1/(1 + e^-yin)

  • 1/(1 + e^-0.09101)
  • 0.5227

Step 5: Calculation of Error

δ = (t - y) * f'(yin) = (t - y) * y * (1 - y) = (1 - 0.5227) * 0.5227 * (1 - 0.5227) = 0.1191

Step 6: Weight Updation

Δw1 = α * δ * z1 = 0.25 * 0.1191 * 0.5498 = 0.01637
Δw2 = α * δ * z2 = 0.25 * 0.1191 * 0.7109 = 0.02117
Δw0 = α * δ = 0.25 * 0.1191 = 0.02978

Step 7: New Weights Calculation

W1(new) = w1(old) + Δw1 = 0.4 + 0.01637 = 0.41637
W2(new) = w2(old) + Δw2 = 0.1 + 0.02117 = 0.12117
W0(new) = w0(old) + Δw0 = -0.2 + 0.02978 = -0.17022

Thus, the final weights are calculated as W1(new) = 0.4164, W2(new) = 0.12117 and W0(new) = -0.17022.

*Assumption: The error term for the hidden layer is taken as 0, so the weights between the input and hidden layers are not updated in this example.
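The hand calculation above can be checked with a short script; with the figures given in the diagram it reproduces the same intermediate values (up to rounding):

```
import numpy as np

f = lambda s: 1.0 / (1.0 + np.exp(-s))    # binary sigmoidal activation

x = np.array([0.0, 1.0]); t = 1.0; alpha = 0.25
v = np.array([[0.6, -0.3],                # [[v11, v12],
              [-0.1, 0.4]])               #  [v21, v22]]
v0 = np.array([0.3, 0.5])                 # [v01, v02]
w = np.array([0.4, 0.1]); w0 = -0.2       # [w1, w2], w0

z_in = v0 + x @ v                          # [0.2, 0.9]
z = f(z_in)                                # [0.5498, 0.7109]
y_in = w0 + z @ w                          # 0.09101
y = f(y_in)                                # 0.5227

delta = (t - y) * y * (1 - y)              # ~0.1191
print(w + alpha * delta * z)               # new weights [0.4164, 0.1212]
print(w0 + alpha * delta)                  # new bias ~ -0.1702
```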

Radial Basis Function

The Radial Basis Function (RBF) network was developed by M.J.D. Powell. It is a classification and approximation algorithm. Gaussian functions are the non-linear functions used in Radial Basis networks. The Gaussian function is used in the regularization of networks.

It is defined as:

f(y) = e^(-y^2). f(y) is always positive for all values of y, and f(y) decreases towards 0 as |y| approaches infinity.

The derivative of f(y) is f'(y) = -2 * y * f(y).
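In Python this function and its derivative can be sketched as:

```
import numpy as np

def gaussian(y):
    return np.exp(-y ** 2)           # always positive, tends to 0 as |y| grows

def gaussian_deriv(y):
    return -2.0 * y * gaussian(y)    # derivative of f(y) = e^(-y^2)
```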

[Image: Gaussian radial basis function]

The name radial basis comes from the fact that this function gives the same output for all inputs that lie at a fixed radial distance from the centre of the kernel. Because these responses are radially symmetric, the network is called a radial basis function network.

Architecture of Radial Basis Function

The architecture of the Radial Basis Function network is given below.

[Image: Radial Basis Function network architecture]

The radial basis function network consists of input, hidden and output layers.

The hidden layer nodes are the radial basis function (RBF) nodes. The hidden layer has a non-linear basis function that produces a response to the input stimulus, provided the input falls within a localized region of the input space. Thus, this network is also called a localized receptive field network.

Training Of Radial Basis Function

Step 1: Set the weights to some random initial values.

Step 2: Each input node receives the input signals.

The input unit: xi for all i = 1 to n

Step 3: Calculate the radial basis function using the Gaussian function.

Step 4: Select an adequate number of centers from the input vectors.

Step 5: The output from the ith hidden unit is calculated as:

Di = exp[ - Σj (xji - x̂ji)² / σi² ]

Where x̂ji is the centre of the ith RBF unit for the jth input variable, σi is the width of the ith RBF unit and xji is the jth variable of the input vector pattern.

Step 6: The output is calculated as:

ynet = Σi wim * Di + w0 (sum over i = 1 to k)

Where,

k is the number of hidden layer (RBF) nodes,
ynet is the output value of the mth node in the output layer for the nth incoming pattern,
wim is the weight between the ith RBF unit and the mth output node, and
w0 is the biasing term at the mth output node.

Step 7: Calculate the error and check for the stopping conditions such as the number of epochs, etc.
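A minimal sketch of the forward pass described in Steps 5 and 6, assuming the centres, widths and output weights have already been chosen (all the names and example values below are illustrative):

```
import numpy as np

# Hidden responses D_i = exp( - sum_j (x_j - c_ij)^2 / sigma_i^2 )
def rbf_hidden(x, centres, widths):
    d2 = np.sum((x - centres) ** 2, axis=1)
    return np.exp(-d2 / widths ** 2)

# Network output y = sum_i w_i * D_i + w0
def rbf_output(x, centres, widths, w, w0):
    return rbf_hidden(x, centres, widths) @ w + w0

# Example with 3 RBF units over 2-dimensional inputs (illustrative values)
centres = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.0]])
widths = np.array([0.5, 0.5, 0.5])
w = np.array([0.2, -0.1, 0.4]); w0 = 0.1
print(rbf_output(np.array([0.2, 0.1]), centres, widths, w, w0))
```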

Kohonen Self Organising Feature Maps

Feature mapping is a method in which multi-dimensional inputs are converted into a one or two-dimensional array, i.e. it converts a vast input space into a feature space while maintaining the properties of the input features.

To obtain the feature maps, it is necessary to organise the neurons into a one or two-dimensional array. These one-dimensional or two-dimensional neural arrays are called self-organising neural arrays. It is an unsupervised learning network.

For Example, suppose there is an output cluster of m units arranged in a 1D or 2D array and an input signal of n units. The given output array is taken as a reference for the input pattern. Thus, when self-organisation is done, the cluster unit whose weight vector matches the input vector most closely is chosen as the winner.

[Image: Kohonen Self Organising Feature Map – rectangular grid of cluster units]

To find the winning unit, the Euclidean distance between the input vector and each cluster unit's weight vector is calculated.

The unit with the minimum squared Euclidean distance is chosen as the winner. Another way to find the winning neuron is by using the dot product; the unit with the maximum dot product is chosen as the winner.

A rectangular grid of cluster units is shown above. N(k1), N(k2), N(k3) are neighbourhoods of radius k1 > k2 > k3. The winning unit is denoted by "#" and the other output units are denoted by "o". Each unit has eight nearest neighbours.

Architecture Of Kohonen Self Organising Feature Maps

The architecture of Kohonen Self Organising Maps is shown below:

[Image: Architecture of Kohonen Self Organising Maps]

There are 2 layers i.e. the input and the output layer. The input layer consists of n units and the output layer consists of m units.

The weight updation takes place only on the winning neuron unit, which is found using the Euclidean distance or dot product method. The network is trained until the specified number of epochs is reached or the learning rate reduces to a very small value.

Training Of Feature Maps

Step 1: Initialize random weights wij and the learning rate alpha. The weights can be chosen from the range of the input values.

Step 2: Calculate the square of the Euclidean distance between each input vector x and the weight vector of each cluster unit j:

D(j) = Σi (xi - wij)², for i = 1 to n and j = 1 to m

Step 3: The winning unit will be the one with the minimum value of D(j).

Step 4: Weight updation and calculation of the new weights of the winning unit J:

wiJ(new) = wiJ(old) + α [xi - wiJ(old)]

Step 5: Update the learning rate: α(t + 1) = 0.5 * α(t).

Step 6: Reduce the radius of the topological neighborhood at specific intervals.

Step 7: Repeat steps 2-6 until the stopping condition is reached.
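A minimal sketch of Steps 1-7 for the simple case where only the winning cluster unit is updated (neighbourhood radius 0), which is the setting used in the example that follows; the function and argument names are chosen here for illustration:

```
import numpy as np

def train_som(inputs, n_clusters, alpha=0.5, epochs=1, w=None, rng=None):
    rng = rng or np.random.default_rng(0)
    n = inputs.shape[1]
    if w is None:
        w = rng.uniform(0, 1, size=(n, n_clusters))   # weight matrix w[i, j]
    for _ in range(epochs):
        for x in inputs:
            # D(j) = sum_i (x_i - w_ij)^2  (squared Euclidean distance)
            d = np.sum((x[:, None] - w) ** 2, axis=0)
            j = np.argmin(d)                          # winning unit
            w[:, j] += alpha * (x - w[:, j])          # update the winner only
        alpha *= 0.5                                  # reduce the learning rate
    return w
```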

Example of Kohonen Self Organising Maps

For the given input vectors, construct a Kohonen Self Organising Map.

There are four given vectors: [0 0 1 1], [1 0 0 0], [0 1 1 0], [ 0 0 0 1].
Form 2 clusters.
Initial Learning Rate: 0.5

[Image: Kohonen Self Organising Map for the example – 4 input units and 2 cluster units]

Step 1: Initialise the weights between 0 and 1.

The initial weight matrix is taken as:

Wij = [0.2 0.9; 0.4 0.7; 0.6 0.5; 0.8 0.3]

Step 2: Calculate the squared Euclidean distance for the first input vector [0 0 1 1]:

D(1) = (0.2 - 0)² + (0.4 - 0)² + (0.6 - 1)² + (0.8 - 1)² = 0.04 + 0.16 + 0.16 + 0.04 = 0.4
D(2) = (0.9 - 0)² + (0.7 - 0)² + (0.5 - 1)² + (0.3 - 1)² = 0.81 + 0.49 + 0.25 + 0.49 = 2.04

Since D(1)<D(2) therefore D(1) is minimum. Thus, the winning cluster unit is Y1.

Step 3: Updating the weights on the winning cluster unit.

wi1(new) = wi1(old) + 0.5 [xi - wi1(old)]:
w11(new) = 0.2 + 0.5 (0 - 0.2) = 0.1
w21(new) = 0.4 + 0.5 (0 - 0.4) = 0.2
w31(new) = 0.6 + 0.5 (1 - 0.6) = 0.8
w41(new) = 0.8 + 0.5 (1 - 0.8) = 0.9

The updated weight matrix
Wij= [0.1 0.9; 0.2 0.7; 0.8 0.5; 0.9 0.3]

Similarly, calculate the new weight matrix for the other three inputs.

For 2nd input:
Wij=[0.1 0.95;0.2 0.35; 0.8 0.25; 0.9 0.15]

For 3rd input:
Wij= [0.05 0.95; 0.6 0.35;0.9 0.25; 0.45 0.15]

For 4th input:
Wij= [0.025 0.95; 0.3 0.35; 0.45 0.25; 0.725 0.15]

1st iteration or epoch is complete.

Step 4: Updating the learning rate.

α(t + 1) = 0.5 * α(t) = 0.5 * 0.5 = 0.25

Updated Weight Diagram

[Image: Network diagram with the updated weights after the first epoch]

More iterations can be performed until the learning rate reduces to a very small value or till the radius becomes zero.
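The first epoch of this example can be reproduced with a few lines; the printed matrix matches the weights obtained after the 4th input:

```
import numpy as np

inputs = np.array([[0, 0, 1, 1],
                   [1, 0, 0, 0],
                   [0, 1, 1, 0],
                   [0, 0, 0, 1]], dtype=float)
w = np.array([[0.2, 0.9],
              [0.4, 0.7],
              [0.6, 0.5],
              [0.8, 0.3]])     # initial weights from Step 1
alpha = 0.5

for x in inputs:
    d = np.sum((x[:, None] - w) ** 2, axis=0)   # D(1), D(2)
    j = np.argmin(d)                            # winning cluster unit
    w[:, j] += alpha * (x - w[:, j])            # update the winner only

print(w)   # [[0.025 0.95] [0.3 0.35] [0.45 0.25] [0.725 0.15]]
```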

Conclusion

Multilayer perceptron networks are networks with one or more hidden layers. The backpropagation network is a type of MLP that has 2 phases, i.e. the feed-forward phase and the reverse phase.

In the feed-forward phase, the input pattern is fed to the network and the output is calculated as the input signals pass through the hidden layer to the output layer.

In the reverse phase, the error is backpropagated to the hidden and input layers for weight adjustment. The error is calculated at the output layer by comparing the actual output with the target value.

Some networks also calculate the error at the hidden layer which is propagated back to the input layer. This helps in more accuracy and convergence. BPNs are supervised multilayer perceptron networks.

The Radial Basis Function network uses Gaussian or sigmoidal functions to regularise the network. Each hidden node produces the same output for all inputs that lie at a fixed radial distance from the centre of its kernel.

Kohonen Self Organising Maps are unsupervised learning algorithms that convert a multidimensional input space vector into a one dimensional or two-dimensional space vector.

=> Visit Here For The Exclusive Machine Learning Series
