Introduction
Linear regression is a widely used statistical method for modeling the relationship between independent variables and a dependent variable. Normally, we solve regression problems using least squares estimation, where the design matrix provides enough equations to determine a unique set of coefficients. However, when the system of equations formed by the design matrix is underdetermined, meaning there are more variables (predictors) than equations (observations), least squares no longer has a unique solution. Such cases arise frequently in modern data analysis, for example in genetics, text mining, or image recognition, where the number of features is very large compared to the number of samples. In this situation, there are infinitely many possible solutions, and the challenge is to decide which one to select.

Understanding the Problem
In an underdetermined system, the design matrix $X$ has more columns than rows, which makes $X^TX$ singular. As a result, the usual formula $(X^TX)^{-1}X^Ty$ cannot be applied because the inverse does not exist. Geometrically, this means that multiple coefficient vectors $\beta$ produce the same predicted values of $y$. For example, if we try to fit two data points with three predictors, there are infinitely many combinations of coefficients that pass exactly through those two points. Without a way to narrow down the choices, the solution is unstable and may not generalize to new data.
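The two-points, three-predictors case can be checked numerically. The following is a minimal sketch using NumPy; the matrix `X`, the vector `y`, and the null-space direction `null_dir` are hypothetical toy values chosen for illustration, not data from any real problem.

```python
import numpy as np

# Hypothetical toy data: 2 observations (rows), 3 predictors (columns).
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
y = np.array([1.0, 2.0])

# rank(X^T X) equals rank(X). Here it is 2, which is less than the
# 3 columns, so the 3x3 matrix X^T X is singular.
rank = np.linalg.matrix_rank(X)

# One exact solution (lstsq returns the minimum-norm one).
beta_a, *_ = np.linalg.lstsq(X, y, rcond=None)

# A direction in the null space of X, i.e. X @ null_dir == 0.
null_dir = np.array([1.0, -2.0, 1.0])

# Adding any multiple of null_dir yields another exact solution.
beta_b = beta_a + 5.0 * null_dir

fits_a = np.allclose(X @ beta_a, y)  # both vectors reproduce y exactly
fits_b = np.allclose(X @ beta_b, y)
```

Because the multiplier of `null_dir` can be any real number, the two coefficient vectors above are just two members of an infinite family that fits the data equally well.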

Approaches to Find a Solution
Pseudoinverse
One straightforward method is to use the Moore–Penrose pseudoinverse. Instead of relying on a true inverse, the pseudoinverse picks, among all coefficient vectors that fit the data equally well, the one with the smallest Euclidean norm. This gives a unique answer and avoids extreme coefficient values, though it may not always give the most interpretable model.
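A minimal sketch of the minimum-norm property, assuming NumPy and randomly generated data (the shapes, the seed, and all variable names are illustrative assumptions). It compares the pseudoinverse solution with another exact solution built by adding a null-space component.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))   # 5 observations, 20 predictors
y = rng.normal(size=5)

# Minimum-norm solution via the Moore-Penrose pseudoinverse.
X_pinv = np.linalg.pinv(X)
beta_pinv = X_pinv @ y

# When X has full row rank, the pseudoinverse equals X^T (X X^T)^{-1}.
beta_formula = X.T @ np.linalg.solve(X @ X.T, y)

# Build another exact solution: take a random vector, remove its
# row-space part, and add what is left (a null-space component).
v = rng.normal(size=20)
null_component = v - X_pinv @ (X @ v)
beta_other = beta_pinv + null_component

norm_pinv = np.linalg.norm(beta_pinv)
norm_other = np.linalg.norm(beta_other)   # larger than norm_pinv
```

Both `beta_pinv` and `beta_other` reproduce `y` exactly, but the pseudoinverse solution has the smaller norm because it contains no null-space component.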
Regularization
Another powerful strategy is regularization, which adds constraints to the regression problem. Ridge regression, for example, introduces an $L_2$ penalty term that shrinks coefficient values, ensuring uniqueness and stability. Lasso regression uses an $L_1$ penalty that encourages sparsity, effectively performing feature selection by forcing some coefficients to zero. Elastic Net combines both Ridge and Lasso to balance stability with sparsity, making it especially useful when predictors are highly correlated.
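As an illustration of how the $L_2$ penalty restores uniqueness, here is a minimal sketch of ridge regression in closed form, $(X^TX + \lambda I)^{-1}X^Ty$, assuming NumPy and random data. The helper `ridge_fit`, the penalty values, and the data are hypothetical choices for this example. Lasso and Elastic Net have no closed form and are not shown.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution (no intercept, lam > 0)."""
    n_features = X.shape[1]
    # Adding lam * I makes the matrix invertible even when X^T X is singular.
    A = X.T @ X + lam * np.eye(n_features)
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))   # 5 observations, 20 predictors
y = rng.normal(size=5)

beta_small = ridge_fit(X, y, lam=1e-3)   # weak penalty
beta_large = ridge_fit(X, y, lam=10.0)   # strong penalty, more shrinkage

norm_small = np.linalg.norm(beta_small)
norm_large = np.linalg.norm(beta_large)
```

A larger penalty shrinks the coefficients more, while a very small penalty gives a solution close to the minimum-norm (pseudoinverse) one. In practice the penalty strength is usually chosen by cross-validation.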
Dimensionality Reduction
Alternatively, one may reduce the number of predictors before fitting the model. Principal Component Regression (PCR) transforms predictors into orthogonal components and uses only those that capture the most variance in the predictors. Partial Least Squares (PLS) instead builds components that covary strongly with the response, so the response itself guides the choice of directions. These methods not only resolve the underdetermination issue but also help reduce overfitting in small-sample, high-dimensional data.
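PCR can be sketched with a singular value decomposition: project the centered predictors onto the first `k` principal directions, regress the response on those scores, and map the result back to the original predictors. This is a minimal sketch assuming NumPy; the function `pcr_fit`, the choice `k=2`, and the random data are illustrative assumptions.

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal Component Regression keeping the first k components."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean                       # center the predictors

    # Rows of Vt are the principal directions, ordered by variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V_k = Vt[:k].T                        # shape (n_features, k)

    scores = Xc @ V_k                     # shape (n_samples, k)
    # With k < n_samples this is an ordinary, well-posed least squares fit.
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)

    beta = V_k @ gamma                    # back to the original predictors
    intercept = y_mean - x_mean @ beta
    return beta, intercept

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 30))   # 6 observations, 30 predictors
y = rng.normal(size=6)

beta, intercept = pcr_fit(X, y, k=2)
predictions = X @ beta + intercept
```

Keeping only `k` components turns a problem with 30 unknowns into one with 2, which is why the regression step has a unique solution. The number of components is a tuning choice, typically selected by cross-validation.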
[Figure: few data points (rows of X) against many predictors (columns of X). More variables than equations → infinitely many β solutions.]

Conclusion
An underdetermined regression system is not an unsolvable problem but an invitation to think carefully about modeling choices. While there may be infinitely many solutions mathematically, practical techniques like the pseudoinverse, regularization, and dimensionality reduction allow us to select meaningful and stable solutions. The method chosen depends on the context, in particular on whether we prioritize stability, interpretability, or sparsity. In fields where data is high-dimensional and sample sizes are small, these approaches help keep regression not only possible but also reliable. Ultimately, the goal is to transform the challenge of underdetermination into an opportunity to build models that generalize well and provide useful insights.