Creating Synthetic Data for Logistic Regression

Often we want to try Logistic Regression on a particular kind of data but cannot find a suitable dataset online. In that case we can generate synthetic data for our problem.

In this post we will see how to generate a typical synthetic dataset for a simple Logistic Regression.

Import the required libraries first.

import pandas as pd
import sklearn.datasets

Use the make_classification function from sklearn.datasets.

# Set the number of rows (n_samples), classes (n_classes) and features (n_features)
data = sklearn.datasets.make_classification(
    n_samples=1000, n_classes=2, n_clusters_per_class=1,
    n_features=2, n_informative=2, n_redundant=0, n_repeated=0)
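The call returns a tuple of two NumPy arrays: the feature matrix and the label vector. As a minimal sketch, the same call can be unpacked directly and given a random_state so that the generated data is the same on every run. The seed value 42 and the names X and y are arbitrary choices for this illustration, not part of the original code.

```python
import numpy as np
from sklearn.datasets import make_classification

# random_state=42 is an arbitrary seed picked for this sketch; any integer works
X, y = make_classification(n_samples=1000, n_classes=2, n_clusters_per_class=1,
                           n_features=2, n_informative=2, n_redundant=0,
                           n_repeated=0, random_state=42)

print(X.shape)       # (1000, 2): one row per sample, one column per feature
print(y.shape)       # (1000,): one label per sample
print(np.unique(y))  # the class labels that occur in y
```

Without random_state, every run produces a different dataset, which can make results hard to compare between runs.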

Create the data frame from the data generated above.

x = data[0]  # feature matrix, shape (1000, 2)
y = data[1]  # class labels (0 or 1), shape (1000,)
df = pd.DataFrame(x)
df['label'] = y
df.head()

Thus we have a data frame df with two feature columns (named 0 and 1 by default) and a label column holding the two classes.
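To check that the generated data is usable, here is a rough sketch of fitting a Logistic Regression on such a data frame. It regenerates the data so that it runs on its own. The 80/20 train/test split, the seed 0 and the variable names are assumptions made for this example, not something prescribed by the steps above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Regenerate the data so the sketch is self-contained (seed 0 is arbitrary)
X, y = make_classification(n_samples=1000, n_classes=2, n_clusters_per_class=1,
                           n_features=2, n_informative=2, n_redundant=0,
                           n_repeated=0, random_state=0)
df = pd.DataFrame(X)
df['label'] = y

# Split the data frame back into features and labels as plain arrays
features = df[[0, 1]].to_numpy()
labels = df['label'].to_numpy()

# Hold out 20% of the rows to evaluate the fitted model
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=0)

model = LogisticRegression()
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)  # fraction of correct test predictions
print(f"Test accuracy: {accuracy:.3f}")
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)
```

The exact accuracy depends on the seed and on how well separated the two generated classes are.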

Cheers!!

2 thoughts on “Creating Synthetic Data for Logistic Regression”

  1. Hi, the code below is useful for renaming the auto-generated column names:
    df.rename(columns={0: 'a', 1: 'b'}, inplace=True)
