A Comparative Analysis of Classification Algorithms on Weather Dataset Using Data Mining Tool

Data mining has become an emerging research field because of its broad scope. It is used to find hidden patterns in databases and other information repositories, and these patterns are needed to generate knowledge; the main task is to extract knowledge from the information. In this paper we use a data mining technique called classification to determine whether play can take place based on current weather values such as temperature. Classification is a powerful technique for assigning the instances of a dataset to different classes. We apply the classification algorithms Decision Tree (J48), REP Tree and Random Tree, and then compare their accuracy. The tool we use is WEKA (Waikato Environment for Knowledge Analysis), a collection of open-source machine learning algorithms.

Oriental Journal of Computer Science and Technology, Vol. 10, No. (4) 2017, Pg. 788-792. ISSN: 0974-6471. Journal website: www.computerscijournal.org


Introduction
Data mining is the process of extracting, or mining, knowledge from large volumes of data. Broadly, data mining can be defined as the task of extracting implicit, previously unknown and potentially useful information from data in large databases. Data mining tasks fall into two groups. Descriptive tasks discover interesting patterns or relationships that describe the data. Predictive tasks predict or classify the behavior of the model based on the available data. Data mining is a broad field whose general goal is predicting outcomes and uncovering relationships. Some common data mining techniques are classification, clustering and rule mining.
Clustering is one of the most commonly used knowledge discovery techniques. It helps uncover unknown class labels, and it has gained importance in many applications in the recent past. Most clustering algorithms are scalable to large datasets.

Weather is a random entity. Weather forecasting applies technology to predict the state of the atmosphere at a given location and time, taking into consideration factors such as humidity, temperature, wind and outlook. It works by gathering information about the current state of the atmosphere at a given location and applying scientific understanding to predict how that state will change over time. In this paper we predict whether play can happen based on current weather values such as temperature, humidity, windy and outlook [11]. We make the prediction with the classification algorithms Decision Tree (J48), REP Tree and Random Tree. We also compare these algorithms in terms of their accuracy using different measures.

Classification Algorithms

Decision Tree Induction
Decision tree induction (DTI) is a tree learning algorithm. It builds a flowchart-like structure in which:

- each internal node denotes a test on an attribute;
- each branch denotes an outcome of that test;
- each leaf node denotes a class label.
The internal nodes are represented as rectangles and the leaf nodes are represented with oval shapes.
To determine the splitting attribute, it uses attribute selection measures such as information gain, gain ratio and the Gini index.
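The following Python sketch makes the attribute selection measures concrete. It computes information gain, gain ratio and the weighted Gini index of the split for each attribute. The paper does not list its data, so the sketch assumes the classic 14-instance nominal weather data that is distributed with WEKA as weather.nominal.arff. Treat the rows as an illustrative assumption. The function names are hypothetical helpers, not WEKA APIs.

```python
from collections import Counter
from math import log2

# ASSUMPTION: the classic 14-instance nominal weather data (as shipped with WEKA
# in weather.nominal.arff); not confirmed to be the paper's exact dataset.
# Columns: outlook, temperature, humidity, windy, play (class).
WEATHER = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]
ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def partition(rows, col):
    """Group the class labels (last column) by the value found in column `col`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    return list(groups.values())

def information_gain(rows, col):
    """Entropy before the split minus the weighted entropy after it."""
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in partition(rows, col))
    return entropy([r[-1] for r in rows]) - remainder

def gain_ratio(rows, col):
    """Information gain divided by the split information of the attribute."""
    n = len(rows)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in partition(rows, col))
    return information_gain(rows, col) / split_info if split_info > 0 else 0.0

def gini_split(rows, col):
    """Weighted Gini impurity after splitting on `col` (lower is better)."""
    n = len(rows)
    return sum(len(g) / n * gini(g) for g in partition(rows, col))

if __name__ == "__main__":
    for i, name in enumerate(ATTRIBUTES):
        print(f"{name:12s} gain={information_gain(WEATHER, i):.3f} "
              f"ratio={gain_ratio(WEATHER, i):.3f} gini={gini_split(WEATHER, i):.3f}")
```

On these rows, outlook scores best on all three measures. It has an information gain of about 0.247, the highest gain ratio (narrowly ahead of humidity) and the lowest weighted Gini index, so it would be tested at the root. J48 is WEKA's implementation of C4.5, which uses gain ratio as its splitting criterion.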

REP Tree
REP Tree is a decision tree learning algorithm. It builds a decision tree using information gain (or variance, for regression) and then prunes it using reduced-error pruning with backfitting. REP Tree iteratively generates multiple trees using regression logic. It sorts the values of numeric attributes only once, and it handles missing values by splitting the corresponding instances into pieces.
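The pruning step can be illustrated with a small Python sketch of reduced-error pruning. Working bottom-up, each subtree is replaced by a leaf that predicts its majority class, as long as that does not increase the error on a separate pruning set. The tree, the pruning rows and the `Node` class are hypothetical. The sketch also omits tree growing, numeric attributes, missing values and backfitting, so it is not WEKA's REPTree implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    label: str                    # majority class of the training rows that reached this node
    attr: Optional[int] = None    # index of the tested attribute; None means this is a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)  # attribute value -> subtree

def predict(node: Node, row) -> str:
    """Follow the attribute tests from the node down to a leaf."""
    while node.attr is not None:
        child = node.children.get(row[node.attr])
        if child is None:         # unseen value: fall back to this node's majority class
            return node.label
        node = child
    return node.label

def errors(node: Node, rows) -> int:
    """Number of rows (class in the last column) that the (sub)tree misclassifies."""
    return sum(predict(node, r) != r[-1] for r in rows)

def reduced_error_prune(node: Node, prune_rows) -> Node:
    """Bottom-up reduced-error pruning against a held-out pruning set."""
    if node.attr is None:
        return node
    for value, child in list(node.children.items()):
        subset = [r for r in prune_rows if r[node.attr] == value]
        node.children[value] = reduced_error_prune(child, subset)
    leaf = Node(label=node.label)
    # Prefer the simpler leaf whenever it is at least as accurate on the pruning rows.
    if errors(leaf, prune_rows) <= errors(node, prune_rows):
        return leaf
    return node

# Hypothetical tree over (outlook, temperature, humidity, windy, play) rows;
# the temperature test under "overcast" is deliberately spurious.
tree = Node("yes", 0, {
    "sunny": Node("no", 2, {"high": Node("no"), "normal": Node("yes")}),
    "overcast": Node("yes", 1, {"hot": Node("no"), "mild": Node("yes"), "cool": Node("yes")}),
    "rainy": Node("yes", 3, {"TRUE": Node("no"), "FALSE": Node("yes")}),
})

# Hypothetical held-out pruning rows.
PRUNE_ROWS = [
    ("overcast", "hot", "normal", "TRUE", "yes"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("sunny", "mild", "high", "TRUE", "no"),
    ("sunny", "cool", "normal", "TRUE", "yes"),
    ("rainy", "mild", "normal", "TRUE", "no"),
    ("rainy", "cool", "high", "FALSE", "yes"),
]

if __name__ == "__main__":
    print("errors before pruning:", errors(tree, PRUNE_ROWS))
    pruned = reduced_error_prune(tree, PRUNE_ROWS)
    print("errors after pruning:", errors(pruned, PRUNE_ROWS))
```

In this example, the spurious temperature test misclassifies both overcast pruning rows, so it collapses to a "yes" leaf. The sunny and rainy subtrees survive because replacing them with leaves would add errors.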

Random Trees
Random Trees can handle both regression and classification problems. The method uses a group of tree predictors referred to as a forest. It takes an input feature vector, classifies it with each tree in the forest, and outputs the class label that receives the most votes.
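The voting scheme described above can be sketched in Python. Each tree is grown on a bootstrap sample and tests a randomly chosen attribute at every node, and the forest returns the label with the most votes. This is a hedged illustration of the paragraph's description, not WEKA's code. WEKA documents its RandomTree classifier as a single unpruned tree that considers K randomly chosen attributes at each node. A voting ensemble of such trees corresponds more closely to WEKA's RandomForest. The toy rows and all function names are hypothetical.

```python
import random
from collections import Counter

# Hypothetical toy rows: (outlook, humidity, windy, play); the last column is the class.
ROWS = [
    ("sunny", "high", "FALSE", "no"),
    ("sunny", "normal", "FALSE", "yes"),
    ("overcast", "high", "TRUE", "yes"),
    ("rainy", "high", "TRUE", "no"),
    ("rainy", "normal", "FALSE", "yes"),
    ("overcast", "normal", "FALSE", "yes"),
]

def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def grow(rows, attrs, rng, depth=0, max_depth=3):
    """Grow one random tree: a leaf is a label, an inner node is (attr, default, branches)."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1 or not attrs or depth == max_depth:
        return majority(labels)
    attr = rng.choice(attrs)                        # random attribute instead of the best split
    rest = [a for a in attrs if a != attr]
    branches = {}
    for value in sorted({r[attr] for r in rows}):   # sorted so RNG use is reproducible
        subset = [r for r in rows if r[attr] == value]
        branches[value] = grow(subset, rest, rng, depth + 1, max_depth)
    return (attr, majority(labels), branches)

def classify(tree, row):
    """Walk one tree; unseen attribute values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, default, branches = tree
        tree = branches.get(row[attr], default)
    return tree

def grow_forest(rows, n_trees=25, seed=0):
    """Grow n_trees random trees, each on a bootstrap sample of the rows."""
    rng = random.Random(seed)
    attrs = list(range(len(rows[0]) - 1))
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]   # bootstrap sample, same size as the data
        forest.append(grow(sample, attrs, rng))
    return forest

def vote(forest, row):
    """Classify the row with every tree and return the label with the most votes."""
    return majority([classify(tree, row) for tree in forest])

if __name__ == "__main__":
    forest = grow_forest(ROWS)
    print(vote(forest, ("overcast", "high", "FALSE")))
```

Fixing the seed makes the forest reproducible. Drawing a bootstrap sample per tree and choosing attributes at random are what make the trees differ, so that their votes carry independent information.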

Classifier Output Measures
A classifier assigns a class to each tuple in the dataset. Naturally, it has some error rate and may fail to classify some tuples correctly. We therefore measure classifier accuracy, the percentage of instances that the classifier classifies correctly.
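As a worked instance of this definition, consider the REP Tree result reported in the Conclusion: 5 misclassified instances and 64.28% accuracy. These two figures are consistent only with a 14-instance dataset. The total of 14 is inferred here from those numbers; the paper does not state it.

```latex
\text{Accuracy} = \frac{\text{correctly classified instances}}{\text{total instances}} \times 100\%,
\qquad
\frac{14 - 5}{14} \times 100\% = \frac{9}{14} \times 100\% = 64.2857\ldots\%
```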

Confusion Matrix
A confusion matrix summarizes the classifier output as the number of tuples that are correctly classified and the number that are misclassified. For a highly accurate classifier, most counts lie along the diagonal and the off-diagonal entries are close to zero.
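A confusion matrix can be built from pairs of actual and predicted labels. Accuracy is then the diagonal total divided by the grand total. The sketch below follows the layout WEKA prints, with rows as actual classes and columns as predicted classes. The labels and predictions are hypothetical and are not the paper's results.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows = actual class, columns = predicted class, in the order given by `classes`."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Correct predictions sit on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

if __name__ == "__main__":
    classes = ["yes", "no"]
    # Hypothetical actual and predicted labels for eight instances.
    actual    = ["yes", "yes", "yes", "no", "no", "yes", "no", "yes"]
    predicted = ["yes", "no",  "yes", "no", "yes", "yes", "no", "yes"]
    m = confusion_matrix(actual, predicted, classes)
    for label, row in zip(classes, m):
        print(label, row)
    print(f"accuracy = {accuracy(m):.2%}")
```

Here the matrix is [[4, 1], [1, 2]]: 6 of the 8 counts lie on the diagonal, giving 75% accuracy.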

Mean Absolute Error
Mean absolute error measures how close predictions are to the actual outcomes. It is the mean of the absolute errors, that is, the average absolute difference between the predicted value and the actual value.
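The definition can be written compactly as follows. Here n is the number of instances, p_i is the predicted value and a_i is the actual value for instance i; these symbols are introduced for illustration.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \lvert p_i - a_i \rvert
```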

Root Mean Square Error
Taking the square root of the mean squared error gives the root mean squared error (RMSE). Because the errors are squared before averaging, RMSE gives more weight to large errors than the mean absolute error does.
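The difference between the two measures shows up when errors are uneven. The Python sketch below uses hypothetical numeric predictions. For a two-class problem these could be predicted probabilities of "yes", with actual values of 1 for yes and 0 for no. We believe WEKA derives its error figures for nominal classes from such class probabilities, but treat that as an assumption. The two prediction sets have the same MAE but different RMSE values.

```python
from math import sqrt

def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum |p_i - a_i|"""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """RMSE = sqrt((1/n) * sum (p_i - a_i)^2)"""
    return sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / len(actual))

actual = [1, 1, 0, 0]                  # hypothetical: 1 = yes, 0 = no
spread_out = [0.75, 0.75, 0.25, 0.25]  # four small errors of 0.25
one_big_miss = [1.0, 1.0, 0.0, 1.0]    # three exact hits and one error of 1.0

if __name__ == "__main__":
    for name, pred in [("spread_out", spread_out), ("one_big_miss", one_big_miss)]:
        print(name, mean_absolute_error(actual, pred), root_mean_squared_error(actual, pred))
```

Both prediction sets have an MAE of 0.25. RMSE, however, is 0.25 for the spread-out errors and 0.5 for the single large miss.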

Results and Comparisons
The tool we used for the result analysis is WEKA, which contains a large number of open-source machine learning algorithms. It accepts input in formats such as ARFF (Attribute-Relation File Format) and CSV (comma-separated values). The dataset we used is the weather dataset, which was supplied to WEKA in ARFF format.
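For readers unfamiliar with ARFF, the sketch below embeds a small nominal weather file and reads it with a minimal hand-written Python parser. The header follows the attributes named in the Introduction plus a play class. The relation name, attribute order and three data rows are illustrative assumptions. The hypothetical `parse_arff` function has three limitations:

- it handles only unquoted names;
- it supports only nominal `{...}` and bare type declarations;
- it reads only dense comma-separated data.

It is not WEKA's own loader.

```python
SAMPLE_ARFF = """\
% Illustrative nominal weather data (three rows only)
@relation weather
@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}
@data
sunny,hot,high,FALSE,no
overcast,hot,high,FALSE,yes
rainy,cool,normal,TRUE,no
"""

def parse_arff(text):
    """Return (relation, [(name, values_or_type)], rows) for a simple dense ARFF file."""
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blank lines and comments
            continue
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            rows.append([v.strip() for v in line.split(",")])
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):           # nominal attribute: list its allowed values
                values = [v.strip() for v in spec.strip("{}").split(",")]
            else:                              # other declared types, e.g. numeric
                values = spec.lower()
            attributes.append((name, values))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows

if __name__ == "__main__":
    relation, attributes, rows = parse_arff(SAMPLE_ARFF)
    print(relation, [name for name, _ in attributes], len(rows))
```

The header declares each attribute and its allowed values before the `@data` section. This is what lets a tool treat play as a nominal class with the values yes and no.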
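The weather dataset contains the following attributes: outlook, temperature, humidity and windy, together with a class attribute indicating whether play can happen.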

Conclusion
This paper studies the accuracy of several classification algorithms on the weather dataset using the WEKA tool. The experimental results of the classification algorithms are listed. First, the experiment was run on the weather dataset with the J48 algorithm, which classified all instances correctly, giving an accuracy of 100%.
Next, the dataset was run with the Random Tree classifier, which also classified all instances correctly, with 100% accuracy. Finally, with the REP Tree classifier the accuracy decreased to 64.28%, because it could not classify all instances correctly: 5 instances were misclassified.