This Tutorial Explains Support Vector Machine in ML and Associated Concepts like Hyperplane, Support Vectors & Applications of SVM:
In the previous tutorial, we learned about Genetic Algorithms and their role in Machine Learning.
We have studied some supervised and unsupervised algorithms in machine learning in our earlier tutorials. Backpropagation is a supervised learning algorithm, while the Kohonen network is an unsupervised learning algorithm.
In this tutorial, we will learn about support vector machines (SVMs). SVMs are mathematically robust supervised machine learning algorithms that are used extensively for classification.
What Is A Support Vector Machine (SVM)
The SVM algorithm is a supervised learning algorithm categorized under classification techniques. It is a binary classification technique that uses the training dataset to find an optimal hyperplane in an n-dimensional space.
This hyperplane is then used to classify new data. Because SVM is a binary classifier, the hyperplane divides the training data set into two classes.
SVM algorithms can classify data in a 2-dimensional plane as well as in a multidimensional space. For data that cannot be separated by a straight boundary, SVM uses functions called “kernels” to map the data into a space where a separating hyperplane exists.
The separation between the two classes should be as wide as possible. This means that the hyperplane should have the maximum margin, that is, the maximum distance to the nearest data points of each class.
What Is A Hyperplane
A hyperplane is a boundary that divides the input space. It is a decision boundary that classifies the data points into 2 distinct classes. Because SVM classifies data in multiple dimensions, the hyperplane is a straight line if there are 2 inputs, a 2D plane if there are 3 inputs, and in general an (n-1)-dimensional surface if there are n inputs.
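To make the idea of a decision boundary concrete, here is a minimal sketch in plain Python. It assumes the hyperplane is written as w·x + b = 0; the weights `w`, the offset `b`, and the sample points are hypothetical values chosen only for illustration.

```python
def decision_value(w, b, x):
    """Signed score w.x + b; it is zero exactly on the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies."""
    return 1 if decision_value(w, b, x) >= 0 else -1


# Hypothetical hyperplane for 2 inputs: the straight line x1 + x2 - 3 = 0
w, b = [1.0, 1.0], -3.0

print(classify(w, b, [3.0, 2.0]))  # → 1
print(classify(w, b, [0.5, 1.0]))  # → -1
```

With 2 inputs the boundary is a line; the same code works unchanged for any number of inputs because only the length of `w` and `x` changes.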
The SVM algorithms can also be used for regression analysis but mainly it is utilized for classification. Let’s see some more details about each of these methods.
What Are Classification And Regression Algorithms
A classification algorithm is one that analyzes the training data to predict an outcome. The outcome is a target class, for example, Day or Night, Yes or No, Long or Short. An example of a classification problem would be predicting whether a customer in a superstore buying bread would also buy butter. The target class would be “Yes” or “No”.
A regression algorithm finds the relationship between the independent variables and the outcome, and predicts a numeric value. Linear regression finds a linear relationship between the input and output. For example, with “a” as input and “b” as output, a linear function would be b = k*a + c.
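As a small illustration of the linear function b = k*a + c, the sketch below estimates k and c from data using ordinary least squares. The function name `fit_line` and the sample values are assumptions made for this example; the data was generated from b = 2*a + 1 so the fit is easy to check.

```python
def fit_line(a_values, b_values):
    """Estimate k and c in b = k*a + c by ordinary least squares."""
    n = len(a_values)
    mean_a = sum(a_values) / n
    mean_b = sum(b_values) / n
    covariance = sum((a - mean_a) * (b - mean_b)
                     for a, b in zip(a_values, b_values))
    variance = sum((a - mean_a) ** 2 for a in a_values)
    k = covariance / variance
    c = mean_b - k * mean_a
    return k, c


# Hypothetical data generated from b = 2*a + 1
a_values = [1.0, 2.0, 3.0, 4.0]
b_values = [3.0, 5.0, 7.0, 9.0]

k, c = fit_line(a_values, b_values)
print(k, c)  # → 2.0 1.0
```

Unlike the Yes/No target class of a classifier, the output here is a number that can take any value on the fitted line.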
What Are Support Vectors
Support vectors are the data points that help us to optimize the hyperplane. These points lie closest to the hyperplane and are the most difficult to classify. The position of the decision hyperplane depends on the support vectors: if they are removed, the position of the hyperplane changes.
A Support Vector Machine (SVM) uses these input data points, called support vectors, to maximize the margin, i.e. the space around the hyperplane. The inputs and outputs of an SVM are similar to those of a neural network. The main difference between the SVM and the NN is stated below.
Inputs: The SVM network can contain n inputs, say x1, x2, ..., xi, ..., xn.
Outputs: The target output t.
Weights: As in a neural network, weights w1, w2, ..., wn are associated with the inputs, and their linear combination predicts the output y.
Difference Between SVM And Neural Networks
In a neural network, all the synaptic weights are used to calculate the output y of the network, while in an SVM most of the weights are reduced to zero, and only the non-zero weights are used to calculate the optimum decision boundary.
Eliminating these weights reduces the training data set to a few important data points, the support vectors, that decide the separating hyperplane.
How Does A Support Vector Machine Work
As we know, the aim of a support vector machine is to maximize the margin between the two classes of data points. A wider margin generally gives better results when classifying new, unseen data. This is achieved by placing the hyperplane at the position where the margin is maximum.
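The idea of choosing the position with the maximum margin can be sketched in a few lines of Python. This is only an illustration, not how an SVM is actually trained: it assumes three hand-picked candidate hyperplanes (named "A", "B" and "C" here) that all separate a tiny hypothetical data set, measures the distance from each one to its nearest data point, and keeps the widest.

```python
import math


def margin(w, b, points):
    """Distance from the hyperplane w.x + b = 0 to the nearest data point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(sum(wi * xi for wi, xi in zip(w, p)) + b) / norm
               for p in points)


# Hypothetical data: two points per class
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]

# Hypothetical candidate hyperplanes (w, b); each one separates the classes
candidates = {
    "A": ([1.0, 0.0], -2.5),  # vertical line x1 = 2.5
    "B": ([1.0, 1.0], -5.5),  # diagonal line x1 + x2 = 5.5
    "C": ([0.0, 1.0], -3.5),  # horizontal line x2 = 3.5
}

for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, points), 3))

best = max(candidates, key=lambda name: margin(*candidates[name], points))
print("widest margin:", best)  # → widest margin: B
```

A real SVM does not enumerate candidates like this; it solves an optimization problem over all possible hyperplanes, but the quantity being maximized is the same distance computed by `margin`.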
Let’s see an example of linearly separated data points:
Step 1: Find out the correct hyperplane from different possibilities: To decide the best hyperplane, consider the possible planes that divide the data, and then select the one that best classifies the input data set. In the graph below there are three hyperplane possibilities. Hyperplane 3 divides the data points better.
Step 2: Select the hyperplane having the maximum margin between the nearest data points: The margin is defined as the distance between the hyperplane and the nearest data points, so the optimum hyperplane has the maximum margin. When 2 or more hyperplanes classify the data equally well, compare their margins.
The hyperplane with the maximum margin is chosen. In the figure below, hyperplanes 2.1, 2.2 and 2.3 all divide the data points, but hyperplane 2.2 has the maximum margin.
Step 3: When outliers are present: Outliers are data points that differ markedly from the rest of their class, for example a point of one class lying among the points of the other class. The SVM tolerates such outliers instead of trying to separate them, and finds the hyperplane of maximum margin for the remaining data points.
Step 4: In the case of non-linearly separable data points, the SVM uses the kernel trick. It converts a non-linearly separable problem into a separable one by introducing a new dimension. Kernels are mathematical functions that perform this data transformation so that a separating hyperplane can be found.
The figure below shows non-linearly separable data points that are transformed into a higher dimension by adding a z axis. Seen in the original plane, the hyperplane dividing the two data sets is a circle.
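The effect of introducing a new dimension can be sketched with an explicit mapping. The example below assumes the new coordinate is z = x² + y², which is one common choice for data arranged in rings; the points, the function names, and the threshold are hypothetical. In the lifted space a flat plane z = 1 separates the classes, and that plane corresponds to the circle x² + y² = 1 in the original plane.

```python
def lift(point):
    """Map a 2D point (x, y) to 3D by adding z = x^2 + y^2."""
    x, y = point
    return (x, y, x * x + y * y)


# Hypothetical data: one class near the origin, the other further out.
# No straight line in the (x, y) plane separates them.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.4)]
outer = [(2.0, 0.0), (0.0, 2.5), (-1.5, 1.5)]

THRESHOLD = 1.0  # the separating plane z = 1 (assumed for this data)


def classify_by_plane(point):
    """Classify by which side of the plane z = THRESHOLD the lifted point is on."""
    return "inner" if lift(point)[2] < THRESHOLD else "outer"


print([classify_by_plane(p) for p in inner])
print([classify_by_plane(p) for p in outer])
```

A kernel achieves the same effect without computing the lifted coordinates explicitly; this sketch performs the mapping by hand only to show why the extra dimension helps.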
How To Optimize The Position Of The Hyperplane
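The position of the hyperplane is found with an optimization algorithm that searches for the weights giving the maximum margin. This is usually formulated as a quadratic programming problem.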
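As a rough illustration of what such an optimization can look like, the sketch below trains a linear SVM with sub-gradient descent on the regularized hinge loss. This is a swapped-in, simplified technique chosen for readability, not necessarily the method any particular library uses; the function names, the data, and the settings `lam`, `lr` and `epochs` are all assumptions made for this example.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularized hinge loss (illustrative only)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            # The regularizer always shrinks w; a smaller w means a wider margin.
            w = [wi * (1.0 - lr * lam) for wi in w]
            if y * score < 1:
                # The point is misclassified or inside the margin:
                # move the hyperplane away from it.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical linearly separable data with labels -1 and +1
points = [(-2.0, -1.0), (-1.0, -2.0), (1.0, 2.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

w, b = train_linear_svm(points, labels)
print([predict(w, b, p) for p in points])  # → [-1, -1, 1, 1]
```

The condition `y * score < 1` is what ties the update to the margin: only points that are on the wrong side or too close to the hyperplane change its position, which mirrors the role of the support vectors.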
SVM parameter: Kernel
Building an optimized hyperplane in a non-linearly separable problem is done using kernels. Kernels are mathematical functions that let the complex problem be handled in a linear algebraic form.
For a linear kernel, the equation is:
$$F(x) = b + \sum_i a_i \langle x, x_i \rangle$$
where:
x is the new input data
x_i is a support vector, and \langle x, x_i \rangle is the inner product of x and x_i
b, a_i are the coefficients. These coefficients are estimated during the learning phase of the algorithm.
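The linear-kernel equation can be evaluated directly once the support vectors and coefficients are known. The sketch below assumes hand-picked values rather than learned ones: two hypothetical support vectors, coefficients whose sign carries the class, and an offset b chosen so that the support vectors score exactly -1 and +1.

```python
def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision_function(x, support_vectors, coefficients, b):
    """F(x) = b + sum over i of a_i * <x, x_i> for a linear kernel."""
    return b + sum(a * dot(x, sv)
                   for a, sv in zip(coefficients, support_vectors))


# Hypothetical values; in practice they are estimated during learning
support_vectors = [(1.0, 1.0), (3.0, 3.0)]
coefficients = [-0.25, 0.25]
b = -2.0

print(decision_function((1.0, 1.0), support_vectors, coefficients, b))  # → -1.0
print(decision_function((3.0, 3.0), support_vectors, coefficients, b))  # → 1.0
print(decision_function((4.0, 3.0), support_vectors, coefficients, b))  # → 1.5
```

The sign of F(x) gives the predicted class, so the new point (4, 3) falls on the positive side. Note that only the support vectors appear in the sum; the other training points are not needed at prediction time.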
For a complex non-linearly separable problem, the kernel trick converts the problem into a linearly separable one. It transforms the data into a space where the data points can be divided into the output classes.
Kernel functions are of many types, such as linear, polynomial, sigmoid, radial basis, and many more.
Let’s see the uses of some of the above Kernel functions:
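A kernel function calculates the inner product of 2 inputs x, y in the transformed space.
#1) Radial Basis Function: The most used kernel function. Its value depends only on the distance of the input from a center xi. In its simplest form, with xi taken as the center, the value of the function is 1 for inputs within the interval [xi - h, xi + h] and 0 otherwise; for example, with center 0 and h = 1, it is 1 for all x in [-1, 1]. In practice a smooth version that decays with distance, such as the Gaussian kernel in #4, is normally used.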
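#2) Sigmoid Function: Like the sigmoid activation used in neural networks, this kernel squashes the inner product of the inputs. It is commonly written as $k(x, y) = \tanh(\alpha \, x \cdot y + c)$.
#3) Hyperbolic Tangent Function: As used in neural networks, the function is commonly written as $k(x_i, x_j) = \tanh(\kappa \, x_i \cdot x_j + c)$.
#4) Gaussian Kernel Function: The Gaussian kernel function is commonly written as $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$.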
#5) Polynomial Function: $k(x_i, x_j) = (x_i \cdot x_j + 1)^2$
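The kernels listed above can be written as short Python functions. This is a sketch that assumes the common textbook forms: a tanh kernel for the sigmoid and hyperbolic tangent cases, exp(-||x - y||² / (2σ²)) for the Gaussian kernel, and (x·y + 1)² for the polynomial kernel. The parameter names `kappa`, `c`, `sigma` and `degree` and their default values are illustrative assumptions.

```python
import math


def dot(u, v):
    """Inner product of two vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def linear_kernel(x, y):
    return dot(x, y)


def polynomial_kernel(x, y, degree=2, c=1.0):
    # With the defaults this is (x . y + 1)^2
    return (dot(x, y) + c) ** degree


def gaussian_kernel(x, y, sigma=1.0):
    # Equals 1 when x == y and decays toward 0 as the points move apart
    squared_distance = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-squared_distance / (2.0 * sigma ** 2))


def tanh_kernel(x, y, kappa=1.0, c=0.0):
    # Sigmoid / hyperbolic tangent kernel
    return math.tanh(kappa * dot(x, y) + c)


x, y = (1.0, 2.0), (3.0, 4.0)
print(linear_kernel(x, y))      # → 11.0
print(polynomial_kernel(x, y))  # → 144.0
print(gaussian_kernel(x, x))    # → 1.0
print(gaussian_kernel(x, y))
print(tanh_kernel(x, y))
```

Each function takes two input vectors and returns a single number, so any of them can stand in for the inner product in the linear-kernel decision function.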
Applications Of SVM
The real-life applications of SVM include:
#1) Classification of articles into different categories: The SVM differentiates between written texts and puts them into different categories such as Entertainment, Health, and Fiction. The classification is based on pre-set threshold values calculated while training the SVM.
If the threshold value is crossed, the article is put into that category. If the value is not met, new categories are defined for classification.
#2) Recognition of the face: The given image is classified as a face or a non-face image by detecting features from its pixels.
#3) Health Information: SVMs are used to classify patients based on their genes, to recognize biological patterns, etc.
#4) Protein Homology Detection: In computational medical sciences, the SVMs are trained on how protein sequences are modeled. SVM is then used to detect protein homology.
In this tutorial, we learned about support vector machines. SVM algorithms are supervised learning algorithms that are used for binary classification, most directly of linearly separable data. They classify the data points with a hyperplane that has the maximum margin.
Non-linear data points can also be classified by support vector machines using the kernel trick. There are many applications of SVM in real life; among the most common are face recognition and handwriting recognition.