Creating Synthetic Data for Linear Regression

Often you want to apply Linear Regression to a particular kind of data, but you cannot find a suitable dataset online. In that case you can generate synthetic data for your problem.

In this post we will see how to generate typical synthetic data for a simple Linear Regression.

Import the required libraries first.

import numpy as np
import pandas as pd

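Then generate evenly spaced x values with numpy.linspace (these are not random) and compute y from a straight line plus random noise drawn with numpy.random.randn.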

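np.random.seed(42)
m = 0.01; c = 6 # slope and intercept of the underlying line
# numpy.linspace(start, stop, num=50): num evenly spaced values from start to stop
x = np.linspace(50, 2000, 100)
# Gaussian noise (standard deviation 2) scatters the points around the line;
# the extra +150 shifts the line up, so the effective intercept is c + 150 = 156
y = m*x + c + np.random.randn(100)*2 + 150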

Let us convert the above into a pandas DataFrame.

df = pd.DataFrame({'x':x,'y':y})
df.head()
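As a quick sanity check (a sketch, not part of the original recipe), a least-squares line through these points should come out close to the slope m = 0.01 and to an intercept near c + 150 = 156, because the noise has mean zero. This uses np.polyfit with degree 1; any other linear regression routine would serve the same purpose.

```python
import numpy as np

# Regenerate the same data as above (same seed, same parameters)
np.random.seed(42)
m = 0.01; c = 6
x = np.linspace(50, 2000, 100)
y = m*x + c + np.random.randn(100)*2 + 150

# Fit a degree-1 polynomial (a straight line) by least squares.
# np.polyfit returns coefficients highest power first: [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(f"slope ~ {slope:.4f}, intercept ~ {intercept:.2f}")
```

The fitted values will not match m and 156 exactly because of the noise; more noise or fewer points push the estimates further from the true parameters.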

Then save the data in CSV format.

df.to_csv('Linear-Regression.csv', index=False, header=True)
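Passing index=False stops pandas from writing the row index as an extra, unnamed first column. The sketch below writes the file and reads it back with pd.read_csv to confirm that only the x and y columns are stored; the temporary directory from Python's tempfile module is an assumption used only so the example leaves no files behind.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Rebuild the single-feature DataFrame from above
np.random.seed(42)
x = np.linspace(50, 2000, 100)
y = 0.01*x + 6 + np.random.randn(100)*2 + 150
df = pd.DataFrame({'x': x, 'y': y})

# Write to a temporary directory, then read the file back while it still exists
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Linear-Regression.csv')
    df.to_csv(path, index=False, header=True)
    loaded = pd.read_csv(path)

print(loaded.shape)          # (100, 2): no extra index column
print(list(loaded.columns))  # ['x', 'y']
```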

You can also generate data with more than one feature, as below.

# numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
np.random.seed(42)
m1 = 0.001; m2 = 0.002; c = 6 # data with approximate straight line
#m = 0.5; c = 6 # data with good straight line
x1 = np.linspace(50, 2000, 100) - 50
x2 = np.linspace(500, 5000, 100) + 50
# Gaussian noise (standard deviation 2, as in the single-feature example)
# keeps the relationship approximate rather than exact
y1 = m1*x1 + m2*x2 + c + np.random.randn(100)*2
dfx = pd.DataFrame({'x1':x1,'x2':x2,'y1':y1})
dfx.head()
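One caveat with the two-feature data above: x1 and x2 both come from np.linspace with 100 points, so x2 is an exact linear function of x1 (x2 = 550 + (4500/1950)·x1). A multiple regression on such data cannot separate m1 from m2. A minimal sketch of one alternative, assuming independently drawn features are acceptable for your problem: sample each feature uniformly with NumPy's Generator API (np.random.default_rng), add noise, and check that np.linalg.lstsq recovers roughly m1, m2 and c. The noise standard deviation of 0.1 is an arbitrary choice for this sketch.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100
m1 = 0.001; m2 = 0.002; c = 6

# Draw each feature independently so x1 and x2 are not perfectly correlated
x1 = rng.uniform(0, 1950, n)
x2 = rng.uniform(550, 5050, n)

# Linear signal plus Gaussian noise (std 0.1, an arbitrary choice here)
y1 = m1*x1 + m2*x2 + c + rng.normal(0, 0.1, n)

dfx = pd.DataFrame({'x1': x1, 'x2': x2, 'y1': y1})

# Least-squares fit: design matrix with a column of ones for the intercept
X = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(X, y1, rcond=None)
print(coef)  # roughly [m1, m2, c]
```

The feature ranges match the linspace ranges of the original example, so only the correlation between the features changes.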

I hope this helps you create synthetic datasets.

You can also create synthetic Linear Regression data using scikit-learn's make_regression function.

Cheers!!
