
# Creating Synthetic Data for Linear Regression

Many times you want to implement Linear Regression on a particular kind of data but cannot find a suitable dataset online. In that case you can generate synthetic data for your problem.

In this post we will see how to generate typical synthetic data for a simple Linear Regression.
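The single-feature data generated below follows a simple model: each target value is a point on a straight line plus random noise. Written out (the symbols $\varepsilon_i$ and $\sigma$ are our notation, not the post's), a sketch of that model is:

$$
y_i = m\,x_i + c + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \qquad i = 1, \dots, n
$$

In the code, $m = 0.01$, the noise is `np.random.randn(100)*2` (so $\sigma = 2$), and an extra constant 150 is added to $c = 6$, which gives an effective intercept of $156$.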

Import the required libraries first.

```python
import numpy as np
import pandas as pd
```

Then generate evenly spaced `x` values with `numpy.linspace`, compute `y` from a straight line, and add random noise with `numpy.random.randn`.

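```python
np.random.seed(42)  # fix the seed so the data is reproducible
m = 0.01; c = 6  # slope and intercept of the underlying line
# numpy.linspace(start, stop, num=50): here, 100 evenly spaced values from 50 to 2000
x = np.linspace(50, 2000, 100)
# Straight line plus Gaussian noise (standard deviation 2);
# the extra +150 shifts the line up, so the effective intercept is c + 150 = 156
y = m*x + c + np.random.randn(100)*2 + 150
```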
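As an optional sanity check (not part of the original post), you can fit a straight line to the generated points and confirm that the estimated slope and intercept are close to the values used to build them. This minimal sketch uses `np.polyfit` with degree 1, which returns the slope first and then the intercept, and it regenerates the data so it runs on its own.

```python
import numpy as np

# Regenerate the same data as above (same seed, same parameters).
np.random.seed(42)
m = 0.01; c = 6
x = np.linspace(50, 2000, 100)
y = m*x + c + np.random.randn(100)*2 + 150

# Fit a degree-1 polynomial (a straight line); polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)
print(f"estimated slope:     {slope:.4f} (true {m})")
print(f"estimated intercept: {intercept:.2f} (true {c + 150})")
```

Because of the noise, the estimates will be close to, but not exactly equal to, 0.01 and 156.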

Let us convert the above into a pandas DataFrame.

```python
df = pd.DataFrame({'x': x, 'y': y})
```

Then save the data in CSV format.

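```python
# index=False leaves the DataFrame's row index out of the file; header=True writes column names
df.to_csv('Linear-Regression.csv', index=False, header=True)
```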
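To check that the CSV round trip works, a minimal sketch can read the file back with `pd.read_csv` and compare it with the original DataFrame. It rebuilds `df` so it runs standalone, and it writes `Linear-Regression.csv` to the current working directory.

```python
import numpy as np
import pandas as pd

# Rebuild and save the single-feature data set as in the post.
np.random.seed(42)
x = np.linspace(50, 2000, 100)
y = 0.01*x + 6 + np.random.randn(100)*2 + 150
df = pd.DataFrame({'x': x, 'y': y})
df.to_csv('Linear-Regression.csv', index=False, header=True)

# Load it back and check that nothing was lost in the round trip.
df_loaded = pd.read_csv('Linear-Regression.csv')
print(df_loaded.shape)          # number of rows and columns
print(list(df_loaded.columns))  # column names taken from the header row
print(np.allclose(df_loaded[['x', 'y']].to_numpy(), df[['x', 'y']].to_numpy()))
```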

You can also generate data with more than one feature:

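```python
# numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None)
np.random.seed(42)
m1 = 0.001; m2 = 0.002; c = 6  # coefficients for x1 and x2, and the intercept
#m = 0.5; c = 6 # data with good straight line
x1 = np.linspace(50, 2000, 100) - 50   # 100 values from 0 to 1950
x2 = np.linspace(500, 5000, 100) + 50  # 100 values from 550 to 5050
# No noise term is added here, so y1 lies exactly on a plane.
# Both features are evenly spaced in the same order, so x2 is an exact linear
# function of x1 (perfect collinearity) and a regression cannot separate m1 from m2.
y1 = m1*x1 + m2*x2 + c
dfx = pd.DataFrame({'x1': x1, 'x2': x2, 'y1': y1})
```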
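In the multi-feature block above, `x1` and `x2` are both generated with `np.linspace` in the same order, so the two columns are perfectly correlated and a fitted model cannot tell `m1` and `m2` apart. One possible alternative, sketched here under our own assumptions rather than as the post's method, draws each feature independently with NumPy's `default_rng().uniform`, adds Gaussian noise (the standard deviation of 0.5 is an arbitrary illustrative choice), and recovers the coefficients with ordinary least squares via `np.linalg.lstsq`. The last lines compute the rank of a design matrix built from the original `linspace` features for comparison.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded generator for reproducibility
n = 100
m1, m2, c = 0.001, 0.002, 6

# Draw each feature independently over the same ranges as the linspace version,
# so the columns are not perfectly correlated.
x1 = rng.uniform(0, 1950, n)
x2 = rng.uniform(550, 5050, n)

# Gaussian noise; the standard deviation 0.5 is an assumed, illustrative value.
noise_sd = 0.5
y1 = m1*x1 + m2*x2 + c + rng.normal(0, noise_sd, n)
dfx = pd.DataFrame({'x1': x1, 'x2': x2, 'y1': y1})

# Ordinary least squares with a column of ones for the intercept.
X = np.column_stack([np.ones(n), dfx['x1'].to_numpy(), dfx['x2'].to_numpy()])
coef, _, rank, _ = np.linalg.lstsq(X, dfx['y1'].to_numpy(), rcond=None)
print("intercept, m1, m2 =", coef)
print("rank of design matrix:", rank)

# For contrast: the linspace features are collinear,
# so their design matrix has rank 2 instead of 3.
x1_lin = np.linspace(50, 2000, 100) - 50
x2_lin = np.linspace(500, 5000, 100) + 50
X_lin = np.column_stack([np.ones(100), x1_lin, x2_lin])
rank_lin = np.linalg.matrix_rank(X_lin)
print("rank with linspace features:", rank_lin)
```

A full-rank design matrix (rank 3 here) is what lets least squares assign a separate coefficient to each feature.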