Random Forest is an ensemble method: instead of relying on a single decision tree, it builds many decision trees and combines their outputs into one model. Each tree works like a set of nested conditional (if/else) statements, and because the model is made of many such trees, each built with some randomness, it is called a Random Forest.
The Random Forest algorithm can be used to solve both regression and classification problems.
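To make the ensemble idea concrete, here is a minimal sketch showing that, for regression, the forest's prediction is the average of its individual trees' predictions. The data is synthetic and the names `X_demo`, `y_demo`, and `forest` are made up for this illustration; they are not part of the dataset used in this post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration, not the post's dataset).
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(200, 1))
y_demo = np.sin(X_demo).ravel() + rng.normal(0, 0.1, size=200)

# A small forest of 20 trees; random_state fixes the randomness.
forest = RandomForestRegressor(n_estimators=20, random_state=0)
forest.fit(X_demo, y_demo)

# Each fitted tree is available in forest.estimators_.
per_tree = np.array([tree.predict(X_demo[:5]) for tree in forest.estimators_])

# Average the trees' predictions by hand and compare with the forest's output.
manual_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X_demo[:5])
print(np.allclose(manual_average, forest_prediction))
```

Averaging many trees tends to smooth out the errors of any single tree, which is the main reason to use a forest rather than one decision tree.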
In this post we will implement a simple Random Forest regression model using Python and scikit-learn (sklearn).
First things first, let us import the required libraries.
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score, mean_squared_error
Next, we need to load the data into a Jupyter notebook. You can find the data here.
df = pd.read_csv('RF-Regression-Data.csv')
df.head()

Note that the data has one feature called x and one label called y. We will use the values of x to predict y. The next step is to split the data into training and test sets, holding out 30% of the rows for testing, as below.
x = df.x.values.reshape(-1, 1)
y = df.y.values  # keep y 1-D; a column vector makes fit() emit a DataConversionWarning
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)
Then we need to define the model.
RandomForestRegModel = RandomForestRegressor()
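The constructor above relies on scikit-learn's defaults. Because the trees are built with randomness, two runs can give slightly different models; as a hedged sketch on synthetic data (the names `model_a`, `model_b`, and the chosen values are assumptions for illustration), passing `random_state` makes the result repeatable, and `n_estimators` sets the number of trees.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (an assumption for illustration, not the post's dataset).
rng = np.random.default_rng(1)
X_demo = rng.uniform(0, 10, size=(150, 1))
y_demo = 2.0 * X_demo.ravel() + rng.normal(0, 1.0, size=150)

# Two forests with the same settings and the same seed.
model_a = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)
model_b = RandomForestRegressor(n_estimators=50, random_state=42).fit(X_demo, y_demo)

# With identical seeds, both forests should produce identical predictions.
same_predictions = np.array_equal(model_a.predict(X_demo), model_b.predict(X_demo))
print(same_predictions)
```

Fixing the seed is mainly useful when you want others to reproduce the numbers you report.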
Now the model is ready to be trained.
RandomForestRegModel.fit(x_train,y_train)

Now it is time to make some predictions.
y_pred = RandomForestRegModel.predict(x_test)
After the prediction is done, we can evaluate the model using two popular metrics: R squared and RMSE (root mean squared error).
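The evaluation step can be sketched as follows. Since the CSV file is not bundled with this text, the sketch assumes a synthetic stand-in for the x and y columns and uses a hypothetical `model` variable; with the real data you would reuse `y_test` and `y_pred` from the steps above. RMSE is computed here as the square root of `mean_squared_error`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for RF-Regression-Data.csv (an assumption for illustration).
rng = np.random.default_rng(42)
x = rng.uniform(0, 10, size=(300, 1))
y = np.sin(x).ravel() + rng.normal(0, 0.1, size=300)

# Same 70/30 split as in the post.
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.30, random_state=42
)

# Train and predict, mirroring the steps above.
model = RandomForestRegressor(random_state=42)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# R squared: share of the variance in y explained by the model (1.0 is perfect).
r2 = r2_score(y_test, y_pred)
# RMSE: square root of the mean squared error, in the same units as y.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))

print(f"R squared: {r2:.3f}")
print(f"RMSE: {rmse:.3f}")
```

A higher R squared and a lower RMSE both indicate a better fit on the held-out test data.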

I hope you enjoyed this article and can start using some of the techniques described here in your own projects soon. Cheers!