The decision tree machine learning algorithm can be used to solve both regression and classification problems.
The algorithm builds its model as a tree of conditional control statements, hence the name decision tree.
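To make the idea of "a tree of conditional control statements" concrete, here is a minimal hand-written sketch. The function name, the feature names x1 and x2, and the thresholds are all made up for illustration; a real decision tree learns its thresholds from the training data.

```python
# Hypothetical hand-written "tree": the thresholds are invented for
# illustration and are not learned from any data.
def predict_label(x1: float, x2: float) -> int:
    if x1 <= 0.5:        # root node: test on feature x1
        if x2 <= 1.0:    # internal node: test on feature x2
            return 0     # leaf: predict class 0
        return 1         # leaf: predict class 1
    return 1             # leaf: predict class 1


print(predict_label(0.2, 0.3))
print(predict_label(0.2, 2.0))
print(predict_label(0.9, 0.0))
```

Each prediction follows one path from the root to a leaf, answering one yes/no question per level.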
In this post we will implement a simple decision tree classification model using Python and scikit-learn (sklearn).
First things first, let us import the required libraries.
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics
After that, we need to load the data into a Jupyter notebook. You can find the data here.
df = pd.read_csv('Classification-Data.csv')
df.head()
Note that the data has two features, x1 and x2, and a target column called label. The next step is to split the data into training and test sets as below.
x = df.drop('label', axis=1)
y = df.label
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)
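If you do not have the CSV at hand, the split can be tried on a stand-in. The sketch below is only an illustration: the DataFrame is randomly generated (an assumption, not the post's data), and `stratify=y` is an optional extra that the original split does not use. It shows that `train_test_split` holds out 25% of the rows by default.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the CSV: 100 rows with features x1, x2
# and a balanced binary label (assumed shape, not the real data).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=100),
    "x2": rng.normal(size=100),
    "label": np.repeat([0, 1], 50),
})

x = df.drop("label", axis=1)
y = df["label"]

# test_size=0.25 is what train_test_split uses when no size is given;
# it is spelled out here for clarity. stratify=y keeps the class
# proportions roughly equal in the train and test sets.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, stratify=y, random_state=42
)

print(x_train.shape, x_test.shape)
print(y_test.value_counts().to_dict())
```

Fixing `random_state` makes the split reproducible, so you get the same rows in each set every time you rerun the notebook.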
After this, let us train the model.
DecisionTreeClfModel = DecisionTreeClassifier()
DecisionTreeClfModel.fit(x_train, y_train)
Now it is time to make some predictions.
y_pred = DecisionTreeClfModel.predict(x_test)
After the prediction is done, we can evaluate the model by calculating its accuracy on the test set as below.
accuracy = metrics.accuracy_score(y_test, y_pred)
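To see what the accuracy number means, here is a self-contained sketch of the whole workflow. It uses synthetic data from `make_classification` as a stand-in for the CSV (an assumption), and the variable names `model`, `manual_accuracy` and `cm` are chosen for this example. It checks that `accuracy_score` is simply the fraction of test rows predicted correctly, and prints a confusion matrix for a per-class view.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn import metrics

# Synthetic two-feature data standing in for Classification-Data.csv.
x, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    random_state=42,
)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)

# random_state makes the fitted tree reproducible between runs.
model = DecisionTreeClassifier(random_state=42)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

accuracy = metrics.accuracy_score(y_test, y_pred)

# Accuracy is the share of test rows where prediction equals the true label.
manual_accuracy = (y_pred == y_test).mean()

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_test, y_pred)

print(f"accuracy: {accuracy:.3f}")
print(f"manual accuracy: {manual_accuracy:.3f}")
print(cm)
```

The diagonal of the confusion matrix counts the correct predictions, so its sum divided by the total number of test rows gives the same accuracy.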
Note: You may also like to implement Logistic Regression for this problem.
The decision tree algorithm has various advantages and disadvantages.
Last but not least, you can visualize the decision tree classification model as below.
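One way to inspect a fitted tree is scikit-learn's `export_text`, which prints the learned rules as indented text. The sketch below is an illustration under assumptions: it trains on synthetic data rather than the post's CSV, and it sets `max_depth=2` only to keep the printout short.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic two-feature data standing in for Classification-Data.csv.
x, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    random_state=42,
)

# max_depth=2 is an assumption made here so the printed tree stays small.
model = DecisionTreeClassifier(max_depth=2, random_state=42)
model.fit(x, y)

# Render the learned splits as text, labelling the columns x1 and x2.
tree_rules = export_text(model, feature_names=["x1", "x2"])
print(tree_rules)
```

Each indented line is one condition on x1 or x2, and each `class:` line is a leaf. For a graphical version, `sklearn.tree.plot_tree(model)` draws the same tree with matplotlib.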
I hope you enjoyed this article and can start using some of the techniques described here in your own projects soon. Cheers!