Creating Synthetic Data for Linear Regression

Often you want to try Linear Regression on a particular kind of data but cannot find a suitable dataset online. In that case you can generate synthetic data for your problem.

In this post we will see how to generate a typical synthetic dataset for simple Linear Regression.

Import the required libraries first.

import numpy as np
import pandas as pd

Then generate evenly spaced x values using numpy.linspace and add random noise from numpy.random.randn to get y.

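np.random.seed(42)
m = 0.01; c = 6 # slope and intercept of the underlying straight line
# numpy.linspace(start, stop, num=50) returns num evenly spaced values
x = np.linspace(50, 2000, 100)
# Gaussian noise (standard deviation 2) makes the line approximate;
# the extra +150 shifts the effective intercept to c + 150 = 156
y = m*x + c + np.random.randn(100)*2 + 150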
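
A quick way to check that the generated data behaves as intended is to fit a straight line back to it. The snippet below is only a sketch: it rebuilds the same x and y and uses numpy.polyfit with degree 1, which is one of several ways to do a least-squares line fit. The name offset is my own label for the +150 term.

```python
import numpy as np

np.random.seed(42)
m, c, offset = 0.01, 6, 150          # offset is the +150 term in y
x = np.linspace(50, 2000, 100)
y = m * x + c + np.random.randn(100) * 2 + offset

# Fit a degree-1 polynomial (a straight line) by least squares.
# polyfit returns the coefficients from highest degree to lowest.
slope, intercept = np.polyfit(x, y, 1)
print(f"slope={slope:.4f}, intercept={intercept:.2f}")
```

The fitted slope should land close to 0.01 and the fitted intercept close to 156 (c + 150), with small deviations caused by the noise.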

Let us convert the above into a pandas DataFrame.

df = pd.DataFrame({'x':x,'y':y})
df.head()

Then save the data in CSV format.

df.to_csv('Linear-Regression.csv', index=False, header=True)
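
If you want to confirm the file was written the way you expect, you can read it straight back with pandas.read_csv. This is a small sketch with a five-row stand-in DataFrame, and it writes to a temporary folder (my choice, so it does not overwrite your real file).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in for the df built above.
df = pd.DataFrame({'x': np.linspace(50, 2000, 5), 'y': np.arange(5.0)})

# Write to a temporary folder so the example leaves nothing behind in the working directory.
path = os.path.join(tempfile.mkdtemp(), 'Linear-Regression.csv')
df.to_csv(path, index=False, header=True)

# Read the file back: the header row becomes the column names and no index column is added.
loaded = pd.read_csv(path)
print(loaded.columns.tolist())
print(loaded.shape)
```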

You can also generate data for more than one feature, as below.

# numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
np.random.seed(42) # only matters if you add random noise to y1
m1 = 0.001; m2 = 0.002; c = 6 # coefficients of x1 and x2, and the intercept
x1 = np.linspace(50, 2000, 100) - 50
x2 = np.linspace(500, 5000, 100) + 50
# no noise is added here, so y1 lies exactly on the plane m1*x1 + m2*x2 + c
y1 = m1*x1 + m2*x2 + c
dfx = pd.DataFrame({'x1':x1,'x2':x2,'y1':y1})
dfx.head()
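
One thing to watch with the two-feature data: because x1 and x2 both come from linspace, they are perfectly linearly related, so a regression cannot tell their two coefficients apart. The sketch below first shows that correlation, then builds a variant where the features are drawn independently and noise is added. The uniform ranges, the noise level of 0.5 and the use of numpy.linalg.lstsq are my own assumptions for illustration, not part of the recipe above.

```python
import numpy as np

# The grid-based features from above are perfectly collinear.
x1_grid = np.linspace(50, 2000, 100) - 50
x2_grid = np.linspace(500, 5000, 100) + 50
corr = np.corrcoef(x1_grid, x2_grid)[0, 1]
print("correlation of grid features:", corr)

# Variant: draw the features independently and add Gaussian noise.
rng = np.random.default_rng(42)
n = 100
m1, m2, c = 0.001, 0.002, 6
x1 = rng.uniform(0, 1950, n)       # same range as x1_grid
x2 = rng.uniform(550, 5050, n)     # same range as x2_grid
noise = rng.normal(0, 0.5, n)      # assumed noise level
y1 = m1 * x1 + m2 * x2 + c + noise

# Least-squares fit; the column of ones estimates the intercept.
A = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y1, rcond=None)
print("estimated m1, m2, c:", coef)
```

The correlation of the grid features comes out at essentially 1, while the fit on the independent features should recover values near 0.001, 0.002 and 6.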

I hope this will help you in creating synthetic datasets.

You can also create synthetic data for Linear Regression using scikit-learn's make_regression.
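
For reference, here is a minimal sketch of that route, assuming scikit-learn is installed. The parameter values (two features, noise of 2.0, bias of 6.0) are arbitrary picks that loosely echo the examples in this post, and the column names are my own.

```python
import pandas as pd
from sklearn.datasets import make_regression

# make_regression draws random features and builds y as a linear
# combination of them plus a bias and Gaussian noise.
X, y = make_regression(
    n_samples=100,
    n_features=2,
    noise=2.0,        # standard deviation of the Gaussian noise
    bias=6.0,         # intercept of the underlying linear model
    random_state=42,  # makes the output reproducible
)

df_mr = pd.DataFrame(X, columns=['x1', 'x2'])
df_mr['y'] = y
print(df_mr.head())
```

Unlike the linspace approach, the features here are random draws rather than an even grid, and the true coefficients are chosen by the function itself (pass coef=True if you want them returned).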

Cheers !!
