WEKA is a data mining application developed at the University of Waikato in New Zealand. The purpose of this article is to teach you how to use the WEKA Explorer. The WEKA GUI Chooser provides the following applications:
1. Simple CLI is a simple command-line interface for running Weka commands directly.
2. Explorer is an environment for exploring data.
3. Experimenter is an environment for running experiments and performing statistical tests between learning schemes.
4. KnowledgeFlow is a JavaBeans-based interface for setting up and running machine learning experiments.
I will use the Explorer for the exercises. Click the ‘Explorer’ button to open it.
Most of the time, data isn’t perfect and needs to be pre-processed before machine learning algorithms are applied to it. Pre-processing is easy in Weka. Click the ‘Open file’ button and load your file in one of the supported formats: ARFF, CSV, C4.5, binary, LIBSVM or XRFF. You can also load data from an SQL database via its URL. However, we won’t need to pre-process anything in this post, since we’ll use the datasets that ship with Weka.
If your data is in .xls format, as in the image below, you have to convert the file. I’ll use the Iris dataset to illustrate the conversion:
- Convert your .xls file to .csv format.
- Open the .csv file in any text editor and add @RELATION relation_name as its first line.
- Then replace the CSV header row with one declaration per attribute: @ATTRIBUTE attr_name attr_type. If an attribute is numeric, declare its type as REAL; otherwise, list its possible values inside curly braces, for example {Iris-setosa,Iris-versicolor,Iris-virginica}. Sample images are below.
- Finally, add the @DATA tag on the line just above your data rows and save the file with the .arff extension. You can see the result in the image below.
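The manual conversion steps above can also be sketched in code. This is a minimal Java illustration, assuming a simple comma-separated file with a header row, no quoted fields and no missing values; the class `CsvToArff` and its methods are hypothetical and not part of Weka. A column is declared REAL when every value parses as a number; otherwise it becomes a nominal attribute whose distinct values are listed in curly braces.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CsvToArff {

    // Splits one CSV line on commas and trims each cell (no quote handling).
    static String[] splitTrimmed(String line) {
        String[] cells = line.split(",", -1);
        for (int i = 0; i < cells.length; i++) {
            cells[i] = cells[i].trim();
        }
        return cells;
    }

    // A column is treated as numeric only if every cell parses as a number.
    static boolean isNumeric(List<String[]> rows, int col) {
        for (String[] row : rows) {
            try {
                Double.parseDouble(row[col]);
            } catch (NumberFormatException e) {
                return false;
            }
        }
        return true;
    }

    // Converts CSV text whose first line is a header row into ARFF text.
    public static String convert(String csv, String relationName) {
        String[] lines = csv.trim().split("\\R");
        String[] header = splitTrimmed(lines[0]);
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (!lines[i].isBlank()) {
                rows.add(splitTrimmed(lines[i]));
            }
        }

        StringBuilder arff = new StringBuilder();
        arff.append("@RELATION ").append(relationName).append("\n\n");
        // The CSV header row is replaced by one @ATTRIBUTE declaration per column.
        for (int c = 0; c < header.length; c++) {
            arff.append("@ATTRIBUTE ").append(header[c]).append(' ');
            if (isNumeric(rows, c)) {
                arff.append("REAL");
            } else {
                // Nominal attribute: distinct values in first-seen order.
                Set<String> values = new LinkedHashSet<>();
                for (String[] row : rows) {
                    values.add(row[c]);
                }
                arff.append('{').append(String.join(",", values)).append('}');
            }
            arff.append('\n');
        }

        // @DATA goes directly above the data rows.
        arff.append("\n@DATA\n");
        for (String[] row : rows) {
            arff.append(String.join(",", row)).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        String csv = "sepallength,sepalwidth,petallength,petalwidth,class\n"
                + "5.1,3.5,1.4,0.2,Iris-setosa\n"
                + "7.0,3.2,4.7,1.4,Iris-versicolor\n"
                + "6.3,3.3,6.0,2.5,Iris-virginica\n";
        System.out.print(convert(csv, "iris"));
    }
}
```

Running `main` prints four REAL declarations, a nominal `class` attribute with the three Iris species, and the three data rows under @DATA. Real-world files with quoted values, spaces inside values or missing entries would need a proper CSV parser.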
Open File in Local File System
Click the ‘Open file’ button in the Preprocess tab and load your .arff file from your local file system. If you couldn’t convert your .csv file to .arff, don’t worry: Weka can load the .csv file directly and convert it for you.
If you have followed all the steps so far, your dataset should load successfully and you’ll see the attribute names (highlighted in red in the images above). Pre-processing in WEKA is done with filters: click the ‘Choose’ button in the Filter panel and apply any filter you want. For example, if you would like to use association rule mining, you have to discretize numeric (continuous) attributes. To do that, follow this path: Choose -> filters -> supervised -> attribute -> Discretize
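Discretization replaces a numeric attribute with a nominal one whose values are intervals. Weka’s supervised Discretize filter chooses its cut points using the class attribute (an entropy-based method), which the sketch below does not reproduce. Instead, it is a minimal Java illustration of the simpler equal-width binning used by default in the unsupervised Discretize filter: split the range [min, max] into a fixed number of equally wide intervals and map each value to its interval. The class `EqualWidthDiscretizer` is hypothetical, not Weka’s API.

```java
import java.util.Arrays;

public class EqualWidthDiscretizer {

    // Returns the bins-1 interior cut points that split [min, max] into equal-width intervals.
    static double[] cutPoints(double[] values, int bins) {
        if (bins < 1) {
            throw new IllegalArgumentException("bins must be >= 1");
        }
        double min = Arrays.stream(values).min().orElseThrow();
        double max = Arrays.stream(values).max().orElseThrow();
        double width = (max - min) / bins;
        double[] cuts = new double[bins - 1];
        for (int i = 0; i < cuts.length; i++) {
            cuts[i] = min + width * (i + 1);
        }
        return cuts;
    }

    // Maps each value to the index of the first interval whose upper cut point is >= the value,
    // so a value equal to a cut point falls into the lower interval.
    static int[] discretize(double[] values, int bins) {
        double[] cuts = cutPoints(values, bins);
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            int bin = 0;
            while (bin < cuts.length && values[i] > cuts[bin]) {
                bin++;
            }
            result[i] = bin;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {0, 1, 2, 3, 4, 5, 6};
        System.out.println(Arrays.toString(cutPoints(values, 3)));
        System.out.println(Arrays.toString(discretize(values, 3)));
    }
}
```

For the values 0 to 6 and three bins, the cut points are 2.0 and 4.0 and the bin indices are [0, 0, 0, 1, 1, 2, 2]. Each index stands for one nominal interval value, which is the kind of attribute association rule mining expects.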