Often you want to try Linear Regression on a particular kind of data but cannot find a suitable dataset online. In that case you can generate synthetic data for your problem.
In this post we will see how to generate typical synthetic data for a simple Linear Regression.
Import the required libraries first.
import numpy as np
import pandas as pd
Then generate evenly spaced x values using numpy.linspace and add random Gaussian noise to y using numpy.random.randn
np.random.seed(42)
m = 0.01; c = 6 # data with approximate straight line
# numpy.linspace(start, stop, num=50)
x = np.linspace(50, 2000, 100)
# Gaussian noise with standard deviation 2; the extra +150 makes the effective intercept c + 150 = 156
y = m*x + c + np.random.randn(100)*2 + 150
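A quick way to check that the generated data behaves as intended is to fit a straight line to it and compare the result with the values used to build it. The snippet below is a minimal sketch that assumes numpy.polyfit is an acceptable fitting method for this check; the names slope and intercept are introduced here for illustration only.

```python
import numpy as np

np.random.seed(42)
m = 0.01; c = 6
x = np.linspace(50, 2000, 100)
y = m*x + c + np.random.randn(100)*2 + 150

# Fit a degree-1 polynomial; polyfit returns the coefficients as [slope, intercept]
slope, intercept = np.polyfit(x, y, 1)

# The slope should come out close to m and the intercept close to c + 150
print(f"slope={slope:.4f}, intercept={intercept:.2f}")
```

Because of the noise term, the recovered values will only be approximately equal to 0.01 and 156.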
Let us convert the above into a pandas DataFrame
df = pd.DataFrame({'x':x,'y':y})
df.head()
Then save the data in CSV format
df.to_csv('Linear-Regression.csv', index=False, header=True)  # index=False leaves the row index out of the file
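To confirm the file was written as expected, you can read it back with pandas.read_csv. This is only a sketch: it writes to a temporary directory (an assumption made here so the example does not touch your working folder) and uses a small stand-in DataFrame rather than the full dataset.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Small stand-in DataFrame with the same column names as above
df = pd.DataFrame({'x': np.linspace(50, 2000, 5), 'y': np.arange(5.0)})

# Hypothetical location: a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'Linear-Regression.csv')
df.to_csv(path, index=False, header=True)

# Read the file back and inspect its structure
df_back = pd.read_csv(path)
print(df_back.columns.tolist())
print(df_back.shape)
```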
You can also generate data for more than one feature, as below
#numpy.linspace(start, stop, num = 50, endpoint = True, retstep = False, dtype = None)
np.random.seed(42)
m1 = 0.001; m2 = 0.002; c = 6 # coefficients and intercept (no noise is added below, so y1 is an exact linear combination)
x1 = np.linspace(50, 2000, 100) - 50
x2 = np.linspace(500, 5000, 100) + 50
y1 = m1*x1 + m2*x2 + c
dfx = pd.DataFrame({'x1':x1,'x2':x2,'y1':y1})
dfx.head()
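Note that in the snippet above both x1 and x2 come from numpy.linspace, so one is an exact linear function of the other and a regression cannot tell their coefficients apart. The variant below is a sketch of one possible alternative, not the method used in this post: it assumes the two features are drawn independently from uniform distributions over the same ranges, adds Gaussian noise with an arbitrarily chosen standard deviation of 0.5, and then checks the coefficients with numpy.linalg.lstsq. The names rng, noise, A and coef are introduced here for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
m1 = 0.001; m2 = 0.002; c = 6
n = 100

# Assumption: independent uniform draws instead of two linspace ramps
x1 = rng.uniform(0, 1950, n)
x2 = rng.uniform(550, 5050, n)
noise = rng.normal(0, 0.5, n)  # assumed noise level
y1 = m1*x1 + m2*x2 + c + noise

dfx = pd.DataFrame({'x1': x1, 'x2': x2, 'y1': y1})

# Least-squares fit of y1 on [x1, x2, 1]; coef holds [m1, m2, c] estimates
A = np.column_stack([x1, x2, np.ones(n)])
coef, *_ = np.linalg.lstsq(A, y1, rcond=None)
print(coef)
```

The fitted values should land close to 0.001, 0.002 and 6, with small deviations caused by the noise.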
I hope this will help you in creating synthetic datasets.
You can also create synthetic Linear Regression data using scikit-learn's make_regression function.
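As a rough illustration, and assuming scikit-learn is installed, make_regression can produce a comparable two-feature dataset in a single call. The parameter values and the names X, coef and df_mr below are chosen only for this example.

```python
import pandas as pd
from sklearn.datasets import make_regression

# coef=True also returns the true coefficients used to generate y
X, y, coef = make_regression(
    n_samples=100,
    n_features=2,
    n_informative=2,
    noise=2.0,
    coef=True,
    random_state=42,
)

df_mr = pd.DataFrame(X, columns=['x1', 'x2'])
df_mr['y'] = y
print(df_mr.head())
print(coef)
```

Unlike the manual approach, the features here are drawn from a standard normal distribution and the coefficients are chosen by the function, so you have less control over the ranges of the data.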
Cheers !!