Random Forest Classification in Python in 10 Lines

Random Forest is an ensemble algorithm built from decision trees: rather than relying on a single tree, it trains many of them and combines their predictions. Each decision tree works like a chain of conditional control statements, and because the model is a collection of such trees, each grown with some randomness, it is called a Random Forest.
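To make the "many trees" idea concrete, here is a small sketch. It uses synthetic data from make_classification (an assumption, not the data used later in this post) and checks that scikit-learn's forest holds a list of fitted decision trees whose averaged class probabilities match the forest's own output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature data, used only for illustration (assumption).
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

forest = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

# The fitted forest keeps its individual decision trees in `estimators_`.
n_trees = len(forest.estimators_)
print("number of trees:", n_trees)

# Average the class probabilities predicted by every tree ...
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
mean_proba = tree_probas.mean(axis=0)

# ... and compare with what the forest itself reports.
matches_forest = np.allclose(mean_proba, forest.predict_proba(X))
print("forest equals the average of its trees:", matches_forest)
```

Note that scikit-learn averages the trees' probabilities rather than counting hard votes, which is why the comparison is done on predict_proba.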

The Random Forest machine learning algorithm can be used to solve both regression and classification problems.
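For the regression side, scikit-learn ships a separate RandomForestRegressor with the same fit/predict interface. The snippet below is only a quick sketch on synthetic data (make_regression is an assumption for illustration); the rest of this post sticks to classification.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only for illustration (assumption).
X, y = make_regression(n_samples=200, n_features=2, noise=0.1, random_state=0)

regressor = RandomForestRegressor(n_estimators=50, random_state=0)
regressor.fit(X, y)

# For regressors, score() returns R^2; here it is measured on the training data.
r2 = regressor.score(X, y)
print("training R^2:", r2)
```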

In this post we will implement a simple Random Forest classification model using Python and scikit-learn (sklearn).

First things first, let us import the required libraries.

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics 

After that, we need to load the data into a Jupyter notebook. You can find the data here.

Figure: Random Forest Classification Data Load
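If you do not have the dataset at hand, the sketch below builds a stand-in DataFrame with the same column names (x1, x2 and label). The values come from make_classification, which is an assumption for illustration and not the data used in this post, so your accuracy will differ from the number reported further down.

```python
import pandas as pd
from sklearn.datasets import make_classification

# Stand-in data (assumption): two numeric features and a binary label.
features, labels = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42
)

df = pd.DataFrame(features, columns=["x1", "x2"])
df["label"] = labels

print(df.head())
```

With a real CSV file you would typically call pd.read_csv on its path instead of generating the data.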

Note that the data above has two features, x1 and x2, and a target column called label. The next step is to separate the features from the label and then split both into training and test sets, as below.

x = df.drop('label', axis=1)
y = df.label
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=42)
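It helps to know what train_test_split does by default: when no size is given, it holds out 25% of the rows for testing. The sketch below spells that out and also passes stratify=y so both parts keep the same class balance. The stratify argument and the synthetic stand-in data are additions for illustration, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Stand-in data (assumption) with the same column names as in the post.
features, labels = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
df = pd.DataFrame(features, columns=["x1", "x2"])
df["label"] = labels

x = df.drop("label", axis=1)
y = df["label"]

# test_size=0.25 is the default; stratify=y keeps class proportions equal.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, stratify=y, random_state=42
)

print(x_train.shape, x_test.shape)
```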

After this, let us create the model.

RandomForestClfModel = RandomForestClassifier()

Then we need to train the model on the training data.

RandomForestClfModel.fit(x_train,y_train)
Figure: Random Forest Classification Model Training
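A forest trained without a fixed seed can give slightly different results on every run. The sketch below sets random_state for reproducibility, writes out n_estimators explicitly (100 is the default in recent scikit-learn versions), and reads feature_importances_ after fitting. The data is a synthetic stand-in (assumption), and the variable name model is just for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption): columns play the role of x1 and x2.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fixing random_state makes the trained forest repeatable across runs.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(x_train, y_train)

# Importances are normalized, so they add up to 1 across the features.
importances = model.feature_importances_
for name, importance in zip(["x1", "x2"], importances):
    print(f"{name}: {importance:.3f}")
```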

Now it is time to make some predictions on the test set.

y_pred = RandomForestClfModel.predict(x_test)
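Besides hard class labels, the classifier can also report how confident it is. Here is a minimal sketch using predict_proba, again on synthetic stand-in data (assumption); it also checks that the predicted label is simply the class with the highest probability.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption), split the same way as in the post.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(x_train, y_train)

y_pred = model.predict(x_test)          # hard class labels
y_proba = model.predict_proba(x_test)   # one probability column per class

# The predicted label is the class with the largest probability.
most_likely = model.classes_[np.argmax(y_proba, axis=1)]

print(y_proba[:5])
print(y_pred[:5])
```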

Once the predictions are made, we can evaluate the model by calculating its accuracy, as below.

accuracy = metrics.accuracy_score(y_test, y_pred)
print(accuracy)

0.984
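Accuracy is a single number, so it can hide which class the mistakes fall in. As an optional extra, the sketch below prints a confusion matrix and a per-class report with sklearn.metrics. It runs on synthetic stand-in data (assumption), so the figures will not match the 0.984 above.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data (assumption), split the same way as in the post.
X, y = make_classification(
    n_samples=1000, n_features=2, n_informative=2, n_redundant=0, random_state=42
)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = RandomForestClassifier(random_state=42).fit(x_train, y_train)
y_pred = model.predict(x_test)

accuracy = metrics.accuracy_score(y_test, y_pred)

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(y_test, y_pred)

print(accuracy)
print(cm)
print(metrics.classification_report(y_test, y_pred))
```

The diagonal of the confusion matrix counts the correct predictions, so dividing its sum by the total number of test rows gives the accuracy again.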

I hope you enjoyed this article and can start using some of the techniques described here in your own projects soon. Cheers!
