Weka Tutorial – How To Download, Install And Use Weka Tool

This WEKA tutorial explains what the Weka Machine Learning tool is, its features, and how to download, install, and use the Weka Machine Learning software:

In the Previous Tutorial, we learned about Support Vector Machine in ML and associated concepts like Hyperplane, Support Vectors & Applications of SVM.

Machine Learning is a field of science in which machines learn from data rather than from explicitly coded rules. It is an iterative process that accesses data, learns from it, and predicts outcomes. Executing machine learning tasks requires many tools and scripts.

WEKA is a machine learning platform consisting of many tools facilitating many machine learning activities.

WEKA Tutorial

What Is WEKA

WEKA Bird Logo

WEKAGUIChooser

Weka is an open-source tool designed and developed by the scientists/researchers at the University of Waikato, New Zealand. WEKA stands for Waikato Environment for Knowledge Analysis. It is developed by the international scientific community and distributed under the free GNU GPL license.

WEKA is developed entirely in Java. It integrates with SQL databases through Java Database Connectivity (JDBC). It provides many machine learning algorithms for data mining tasks. These algorithms can be used directly in the WEKA tool or called from other applications written in Java.

It provides a lot of tools for data preprocessing, classification, clustering, regression analysis, association rule creation, feature extraction, and data visualization. It is a powerful tool that supports the development of new algorithms in machine learning.

Why Use WEKA Machine Learning Tool

With WEKA, the machine learning algorithms are readily available to the users. The ML specialists can use these methods to extract useful information from high volumes of data. Here, the specialists can create an environment to develop new machine learning methods and implement them on real data.

WEKA is used by machine learning and applied sciences researchers for learning purposes. It is an efficient tool for carrying out many data mining tasks.

WEKA Download And Installation

#1) Download the software from here.

Check the configuration of the computer system and download the stable version of WEKA (currently 3.8) from this page.

Download the stable version of WEKA

#2) After a successful download, open the file location and double-click on the downloaded file. The Setup wizard will appear. Click on Next.

Setup wizard

#3) The License Agreement terms will open. Read it thoroughly and click on “I Agree”.

The License Agreement Terms

#4) According to your requirements, select the components to be installed. Full component installation is recommended. Click on Next.

select the components to be installed

#5) Select the destination folder and Click on Next.

destination folder

#6) Then, Installation will start.

Installation Window

#7) If Java is not installed in the system, it will install Java first.

Java installation window

#8) After the installation is complete, the following window will appear. Click on Next.

Installation Complete Window

#9) Select the Start Weka checkbox. Click on Finish.

Select the Start Weka checkbox

#10) WEKA Tool and Explorer window opens.

WEKA Tool and Explorer

#11) The WEKA manual can be downloaded from here.

Graphical User Interface Of WEKA

The GUI of WEKA gives five options: Explorer, Experimenter, Knowledge flow, Workbench, and Simple CLI. Let us understand each of these individually.

#1) Simple CLI

Simple CLI is the Weka shell, with a command line and an output area. Typing “help” shows an overview of all the commands. Simple CLI offers access to all Weka classes, such as classifiers, clusterers, and filters.

Some of the simple CLI commands are:

  • break: Stops the current thread
  • exit: Exits the CLI
  • help [<command>]: Outputs the help for the specified command
  • java weka.classifiers.trees.J48 -t c:/temp/iris.arff: To invoke a WEKA class, prefix it with “java”. This command directs WEKA to load the class and execute it with the given parameters. Here, the J48 classifier is invoked on the Iris dataset.
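Outside the Simple CLI, the same class can be started from a regular system shell, provided the Weka JAR is on the Java classpath. The following is a minimal Python sketch that only builds the argument list. The `weka.jar` path and the helper name `build_weka_command` are assumptions for illustration, not part of WEKA.

```python
import shlex

def build_weka_command(classname, train_file, weka_jar="weka.jar"):
    """Return the argument list for running a Weka class from a system shell.

    weka_jar is a hypothetical path; point it at the weka.jar of your installation.
    """
    return ["java", "-cp", weka_jar, classname, "-t", train_file]

cmd = build_weka_command("weka.classifiers.trees.J48", "c:/temp/iris.arff")
print(shlex.join(cmd))

# To actually run it (requires Java and Weka to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Here `-t` names the training file, as in the Simple CLI command above.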

simple CLI commands

#2) Explorer

The WEKA Explorer window shows several tabs, starting with Preprocess. Initially, the Preprocess tab is active, because a dataset must be loaded and preprocessed before any algorithm can be applied to it.

The tabs and other elements of the window are as follows:

  1. Preprocess: Choose and modify the loaded data.
  2. Classify: Train and test learning schemes that classify the data or perform regression on it.
  3. Cluster: Form clusters from the data.
  4. Associate: Mine association rules from the data.
  5. Select attributes: Apply attribute selection measures.
  6. Visualize: View 2D plots of the data.
  7. Status Bar: The bottommost section of the window shows the status bar. It displays a message describing what is currently happening, for example, that a file is being loaded. Right-clicking on it shows memory information and offers an option to run the garbage collector to free up memory.
  8. Log Button: WEKA keeps a timestamped log of all actions. The log is shown in a separate window when the Log button is clicked.
  9. WEKA Bird Icon: The WEKA bird in the bottom-right corner is shown next to a number (after the “x”) that indicates how many processes are running concurrently. While a process is running, the bird moves around.

#3) Experimenter

The WEKA experimenter button allows the users to create, run, and modify different schemes in one experiment on a dataset. The experimenter has 2 types of configuration: Simple and Advanced. Both configurations allow users to run experiments locally and on remote computers.

  1. The “Open” and “New” buttons open an existing experiment or start a new one.
  2. Results: Set the results destination, which can be an ARFF file, a JDBC database, or a CSV file.
  3. Experiment Type: The user can choose between cross-validation and a train/test percentage split, and between Classification and Regression, based on the dataset and classifier used.
  4. Datasets: The user can browse and select datasets from here. The relative path checkbox is selected when working on different machines. The supported dataset formats are ARFF, C4.5, CSV, libsvm, bsi, and XRFF.
  5. Iteration: The default number of iterations is 10. The “Datasets first” and “Algorithms first” options control whether the experiment iterates over the datasets or over the algorithms first, so that all algorithms can be run on all datasets.
  6. Algorithms: New algorithms are added with the “New” button. The user can choose a classifier.
  7. Save the experiment using the Save button.
  8. Run the experiment using the Run button.
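The two experiment types listed above differ in how instances are divided into training and test data. The sketch below illustrates both on plain index lists. It is a simplified, unstratified illustration and not WEKA's own implementation; the function names, the seed, and the 66% default are assumptions chosen for the example.

```python
import random

def percentage_split(n_instances, train_percent=66, seed=1):
    """Shuffle instance indices and split them once into train and test parts."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    cut = round(n_instances * train_percent / 100)
    return indices[:cut], indices[cut:]

def k_fold_splits(n_instances, k=10, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

train, test = percentage_split(150)
print("percentage split:", len(train), "train /", len(test), "test")

for fold_no, (train, test) in enumerate(k_fold_splits(150, k=10), start=1):
    print("fold", fold_no, ":", len(train), "train /", len(test), "test")
```

With a percentage split every instance is used once, either for training or for testing. With cross-validation every instance is tested exactly once and used for training in the remaining folds.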

Weka choose algorithm

#4) Knowledge Flow

WEKA KnowledgeFlow

Knowledge Flow shows a graphical representation of WEKA algorithms. The user can select components and connect them into a workflow to analyze datasets. The data can be handled in batches or incrementally. Parallel workflows can be designed, and each will run in a separate thread.

The component categories available are DataSources, DataSinks (savers), Filters, Classifiers, Clusterers, Evaluation, and Visualization.

#5) Workbench

WEKA has a Workbench module that contains all the GUIs in a single window.

WEKA Workbench

Features Of WEKA Explorer

#1) Dataset

A dataset is made up of items (instances). Each item represents an object; for example, in a marketing database, the items represent customers and products. Items are described by attributes, which can be nominal, numeric, or string, and each row (tuple) of the dataset holds the attribute values of one item. In Weka, a dataset is represented by the weka.core.Instances class.

Representation of a dataset with 5 examples (values are listed in the order outlook, temperature, humidity, windy, play):

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes

What is an Attribute?

An attribute is a data field representing the characteristic of a data object. For example, in a customer database, the attributes will be customer_id, customer_email, customer_address, etc. Attributes have different types.

These possible types are:

A) Nominal Attributes: Attributes whose values are names taken from a predefined set, such as color or weather. They are also called categorical attributes. Their values have no order and are also called enumerations.

@attribute outlook {sunny, overcast, rainy}: declaration of the nominal attribute.

B) Binary Attributes: These attributes take only the values 0 and 1. They are a special case of nominal attributes with only 2 categories, and are also called Boolean attributes.

C) Ordinal Attributes: Attributes whose values have a meaningful order or ranking are ordinal attributes. The difference between successive values is not known; only their order is meaningful. Example: size, grade, etc.

D) Numeric Attributes: Attributes representing measurable quantities are numeric attributes. These are represented by real numbers or integers. Example: temperature, humidity.

@attribute humidity real: declaration of a numeric attribute

E) String Attributes: These attributes hold free text, that is, sequences of characters, written in double quotes.

#2) ARFF Data format

WEKA works with ARFF files for data analysis. ARFF stands for Attribute-Relation File Format. A file has 3 sections: relation, attributes, and data. Each section is introduced by a keyword that starts with “@”.

ARFF files take Nominal, Numeric, String, Date, and Relational data attributes. Some of the well-known machine learning datasets are present in WEKA as ARFF.

Format for ARFF is:

@relation <relation name>
@attribute <attribute name and data type>
@data

An example of an ARFF file is:

@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
% class attribute: The class attribute represents the output.
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
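To make the three ARFF sections concrete, here is a minimal Python sketch of a reader for small ARFF files. It is not WEKA's own loader and handles only the basics: `%` comments, numeric and nominal attributes, `?` for missing values, and plain comma-separated rows without quoting or sparse syntax. The names `parse_arff` and `convert` are hypothetical. The embedded weather data lists each row's values in the same order as the attribute declarations, which ARFF requires.

```python
def convert(value, kind):
    """Convert one raw data value according to its attribute type."""
    if value == "?":
        return None                      # "?" marks a missing value
    if kind == "numeric":
        return float(value)
    if isinstance(kind, list) and value not in kind:
        raise ValueError(f"{value!r} is not one of {kind}")
    return value

def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows)."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue                     # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"wrong number of values: {line!r}")
            rows.append({name: convert(value, kind)
                         for (name, kind), value in zip(attributes, values)})
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):     # nominal: keep the list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                kind = spec.lower()      # e.g. "string"
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

WEATHER = """\
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature real
@attribute humidity real
@attribute windy {TRUE, FALSE}
% class attribute
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
"""

relation, attributes, rows = parse_arff(WEATHER)
print(relation, len(attributes), "attributes,", len(rows), "rows")
print(rows[0])
```

Because values are matched to attributes by position, a row written in a different column order (for example with the windy value in second place) is rejected by this sketch with a ValueError instead of being silently misread.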

#3) XRFF Data Format

XRFF stands for XML Attribute Relation File Format. It represents the dataset in XML and, in addition to the attributes and data, can store comments (notes), the class attribute specification, and attribute and instance weights. XRFF files use the .xrff extension, or .xrff.gz for the compressed format.
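As a rough illustration of what an XRFF document looks like, the sketch below builds one with Python's standard XML library. The element and attribute names (`dataset`, `header`, `attributes`, `labels`, `body`, `instances`, `value`, `class="yes"`, `weight`) follow the XRFF layout as I understand it from the WEKA documentation; treat them as an assumption and check them against your WEKA version. The helper name `to_xrff` is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_xrff(relation, attributes, rows, class_attr=None, weights=None):
    """Build an XRFF-style XML string.

    attributes: list of (name, kind); kind is a list of labels for nominal
                attributes, or a type name such as "numeric".
    rows:       list of value tuples, in attribute order.
    weights:    optional list with one instance weight per row.
    """
    dataset = ET.Element("dataset", name=relation)
    header = ET.SubElement(dataset, "header")
    attrs_el = ET.SubElement(header, "attributes")
    for name, kind in attributes:
        is_nominal = isinstance(kind, list)
        attr_el = ET.SubElement(attrs_el, "attribute", name=name,
                                type="nominal" if is_nominal else kind)
        if name == class_attr:
            attr_el.set("class", "yes")          # mark the class attribute
        if is_nominal:
            labels = ET.SubElement(attr_el, "labels")
            for label in kind:
                ET.SubElement(labels, "label").text = label
    body = ET.SubElement(dataset, "body")
    instances = ET.SubElement(body, "instances")
    for i, row in enumerate(rows):
        inst = ET.SubElement(instances, "instance")
        if weights is not None:
            inst.set("weight", str(weights[i]))  # per-instance weight
        for value in row:
            ET.SubElement(inst, "value").text = str(value)
    return ET.tostring(dataset, encoding="unicode")

xml_text = to_xrff(
    "weather",
    [("outlook", ["sunny", "overcast", "rainy"]),
     ("temperature", "numeric"),
     ("play", ["yes", "no"])],
    [("sunny", 85, "no"), ("overcast", 83, "yes")],
    class_attr="play",
    weights=[1.0, 0.5],
)
print(xml_text)
```

The class attribute and the instance weights shown here are the kind of information that a plain ARFF header and data section do not carry in this form.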

#4) Database Connectivity

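With WEKA, it is easy to connect to a database using a JDBC driver. A JDBC driver for the specific database is necessary to make the connection, for example:

MS SQL Server (com.microsoft.jdbc.sqlserver.SQLServerDriver for the legacy SQL Server 2000 driver; current Microsoft drivers use com.microsoft.sqlserver.jdbc.SQLServerDriver)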

Oracle (oracle.jdbc.driver.OracleDriver)

#5) Classifiers

WEKA contains classifiers to predict the output. The classification algorithms available for learning include decision trees, support vector machines, instance-based classifiers, logistic regression, and Bayesian networks. Depending on the requirements, the user can find a suitable algorithm for the data by trial and error. Classifiers assign instances to classes based on the values of their attributes.

#6) Clustering

WEKA uses the Cluster tab to find groups of similar instances in the dataset. Based on the clustering, the user can identify the attributes that are useful for analysis and ignore the others. The clustering algorithms available in WEKA include k-means, EM, Cobweb, X-means, and FarthestFirst.
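To show what a clustering algorithm such as k-means does, here is a minimal pure-Python sketch. It is a textbook version for illustration, not WEKA's implementation; the function name, the seed, and the sample points are assumptions. It requires Python 3.8 or later for `math.dist`.

```python
import math
import random

def k_means(points, k, iterations=20, seed=1):
    """Cluster points (tuples of numbers) into k groups.

    Returns (centroids, clusters), where clusters[i] lists the points
    assigned to centroids[i] in the last assignment step.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # start from k randomly chosen points
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])   # keep an empty cluster's centroid
        if new_centroids == centroids:
            break                            # converged: assignments no longer change
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, clusters = k_means(points, k=2)
for centroid, members in zip(centroids, clusters):
    print(centroid, members)
```

The two well-separated groups in the sample data end up in separate clusters, which is the kind of similarity structure the Cluster tab is meant to reveal.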

#7) Association

The best-known algorithm available in WEKA for finding association rules is Apriori. It is not the only one: WEKA also ships other associators, such as FPGrowth.
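The core idea of Apriori is to find frequent itemsets level by level, using the fact that every subset of a frequent itemset must itself be frequent. The following is a minimal Python sketch of that idea, not WEKA's implementation; the function names and the shopping-basket data are made up for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = {item for t in transactions for item in t}
    current = {frozenset([item]) for item in items}   # candidate 1-itemsets
    result = {}
    size = 1
    while current:
        # Count how many transactions contain each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        frequent = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        result.update(frequent)
        size += 1
        # Join step: merge frequent itemsets into candidates one item larger
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == size}
        # Prune step: every (size - 1)-subset of a candidate must be frequent
        current = {c for c in candidates
                   if all(frozenset(s) in frequent for s in combinations(c, size - 1))}
    return result

def confidence(supports, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return supports[both] / supports[frozenset(antecedent)]

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
supports = frequent_itemsets(baskets, min_support=0.5)
for itemset, support in sorted(supports.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
print("confidence(butter -> bread) =", confidence(supports, {"butter"}, {"bread"}))
```

A rule such as butter -> bread is then kept or discarded depending on whether its confidence reaches a chosen threshold.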

#8) Attribute Selection Measures

WEKA selects the best attributes by combining two components:

  • A search method: best-first, forward selection, random, exhaustive, genetic algorithm, and ranking.
  • An evaluation method: correlation-based, wrapper, information gain, chi-squared.
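As an example of an evaluation method, information gain measures how much knowing an attribute's value reduces the entropy of the class. The sketch below computes it for two nominal attributes of the five weather examples used earlier. It is a textbook calculation for illustration, not WEKA's evaluator, and the function names are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus its expected entropy after splitting on attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

weather = [
    {"outlook": "sunny",    "windy": "FALSE", "play": "no"},
    {"outlook": "sunny",    "windy": "TRUE",  "play": "no"},
    {"outlook": "overcast", "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "windy": "FALSE", "play": "yes"},
    {"outlook": "rainy",    "windy": "FALSE", "play": "yes"},
]

# Rank the attributes by information gain, highest first
gains = {a: information_gain(weather, a, "play") for a in ("outlook", "windy")}
for name, gain in sorted(gains.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(gain, 4))
```

On these five rows, outlook separates the classes perfectly, so it ranks above windy. A ranking search method would simply order the attributes by such a score.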

#9) Visualization

WEKA supports 2D plots of the data, 3D visualizations with rotation, and 1D views of a single attribute. It has a “Jitter” option that adds random noise to the plotted positions so that “hidden” data points, which overlap exactly (common with nominal attributes), become visible.
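The effect of jitter can be sketched in a few lines of Python: a small random offset is added to each plotted coordinate so that points with identical values no longer sit on top of each other. This is only an illustration of the idea, not WEKA's plotting code; the function name, the offset size, and the seed are assumptions.

```python
import random

def jitter(values, amount=0.1, seed=1):
    """Return the values with a random offset in [-amount, amount] added to each."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

# A nominal attribute plotted by category index: many points share a position
outlook_index = [0, 0, 1, 2, 2]
print(jitter(outlook_index))
```

Only the plotted positions change; the underlying data values are left as they are.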

Other main features of WEKA are:

  • It is an open-source tool with Graphical User Interface in the form of “Explorer”, “Experimenter” and “Knowledge Flow”.
  • It is platform-independent.
  • It contains 49 data preprocessing tools.
  • It contains 76 classification and regression algorithms and 8 clustering algorithms.
  • It has 15 attribute/subset evaluators and 10 search algorithms for feature selection.
  • It has 3 algorithms for finding association rules.
  • Using WEKA, users can develop custom code for machine learning.

Conclusion

In this WEKA tutorial, we provided an introduction to the open-source WEKA Machine Learning software and explained the step-by-step download and installation process. We have also seen the five options available in the Weka Graphical User Interface, namely Explorer, Experimenter, Knowledge Flow, Workbench, and Simple CLI.

We have also learned about the features of WEKA with examples. The features include Dataset, ARFF Data format, database connectivity, etc.
