Problem
A model can look great on training data and still fail in practice. The core issue is choosing the right capacity: too simple underfits, too complex overfits noise.
I wanted a controlled experiment that makes the bias–variance tradeoff obvious: increase model complexity gradually and watch training and validation error diverge.
The goal wasn’t just fitting a curve, but learning how to justify model choice using validation curves and regularization rather than intuition.
Solution
I generated regression data from a known non-linear target function and fit models using Gaussian basis functions to control complexity.
By increasing the number of Gaussian basis functions (D), I systematically increased model flexibility and tracked training versus validation sum of squared errors (SSE) to identify the generalization sweet spot.
I studied how regularization (L1/L2) can ‘tame’ high-capacity models by discouraging overly large weights and reducing sensitivity to noise.
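To make the setup concrete, here is a minimal sketch of the data generation and Gaussian basis expansion, assuming a sinusoidal target, Gaussian noise, evenly spaced basis centers, and a fixed basis width; the actual target function, noise level, and width used in the project may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def target(x):
    # A known non-linear target (illustrative; the project's true function may differ).
    return np.sin(2 * np.pi * x)

def make_data(n, noise_std=0.2):
    # Noisy samples from the known target, so generalization can be judged objectively.
    x = rng.uniform(0.0, 1.0, size=n)
    y = target(x) + rng.normal(0.0, noise_std, size=n)
    return x, y

def gaussian_features(x, D, width=0.1):
    # Map scalar x to D Gaussian bumps with evenly spaced centers, plus a bias column.
    # Larger D means a more flexible model.
    centers = np.linspace(0.0, 1.0, D)
    phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / width) ** 2)
    return np.hstack([np.ones((x.shape[0], 1)), phi])
```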
Architecture
Data generation: created noisy samples from a known underlying function so we can objectively evaluate whether the model captures structure or memorizes noise.
Feature mapping: expanded x into a non-linear feature space using Gaussian basis functions (controlled by D).
Model fitting: trained the regression model and computed SSE on both training and validation splits for each D.
Model selection: plotted error curves across D to show underfitting (high error on both splits), optimal capacity (lowest validation error), then overfitting (training error keeps improving while validation error worsens); the sweep is sketched after this list.
Regularization experiments: compared how L1/L2 penalties change the error curves and shrink the overfitting regime.
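Continuing the sketch above (it reuses make_data and gaussian_features), the fitting and model-selection steps amount to sweeping D, fitting ordinary least squares on the training split, and recording SSE on both splits; the sample sizes and the range of D below are placeholders.

```python
def fit_least_squares(Phi, y):
    # Ordinary least squares via lstsq (more stable than inverting Phi.T @ Phi).
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def sse(Phi, y, w):
    # Sum of squared errors on a given split.
    residual = Phi @ w - y
    return float(residual @ residual)

x_train, y_train = make_data(30)
x_val, y_val = make_data(30)

curves = []  # rows of (D, training SSE, validation SSE)
for D in range(1, 21):
    Phi_tr = gaussian_features(x_train, D)
    Phi_va = gaussian_features(x_val, D)
    w = fit_least_squares(Phi_tr, y_train)
    curves.append((D, sse(Phi_tr, y_train, w), sse(Phi_va, y_val, w)))

# The selected capacity is the D with the lowest validation SSE.
best_D = min(curves, key=lambda row: row[2])[0]
print(best_D)
```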
What I optimized
Experiment clarity: designed the setup so the effect of complexity is isolated (same data, same noise process, only D changes).
Decision-making: used validation SSE as the selection criterion and explained why the minimum validation point is the best capacity for generalization.
Model robustness: explored regularization as a way to keep expressive models usable without weight blow-up or unstable fits (see the ridge sketch below).
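As an illustration of the L2 case, ridge regression has the standard closed-form solution sketched below (reusing the helpers from the earlier sketches); the penalty strength lam and the choice of D = 20 are placeholders, and the L1 fit needs an iterative solver, so it is omitted here.

```python
def fit_ridge(Phi, y, lam):
    # Closed-form ridge (L2): solve (Phi.T @ Phi + lam * I) w = Phi.T @ y.
    # The bias column (first feature) is left unpenalized.
    penalty = lam * np.eye(Phi.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(Phi.T @ Phi + penalty, Phi.T @ y)

# Compare an unregularized and an L2-regularized fit at high capacity (D = 20).
Phi_tr = gaussian_features(x_train, 20)
Phi_va = gaussian_features(x_val, 20)
w_ols = fit_least_squares(Phi_tr, y_train)
w_ridge = fit_ridge(Phi_tr, y_train, lam=1e-2)
print("OLS  :", np.linalg.norm(w_ols), sse(Phi_va, y_val, w_ols))
print("ridge:", np.linalg.norm(w_ridge), sse(Phi_va, y_val, w_ridge))
```

Shrinking the weight norm is what keeps the high-capacity fit from chasing noise, which is the stabilizing effect described above.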
Results
Produced a clean validation curve showing the full lifecycle: underfit → optimal → overfit as D increases.
Empirically demonstrated that beyond a certain capacity, training SSE continues to decrease while validation SSE stops improving and begins increasing — classic overfitting behavior.
Showed that regularization shifts the curve to a more stable regime, making higher-capacity models generalize better than unregularized ones.
What I'd do next
Add K-fold cross-validation to reduce sensitivity to a single split and produce confidence intervals on the selected D (a minimal sketch follows this list).
Compare against a closed-form ridge baseline, a lasso baseline, and Bayesian model selection to connect validation curves to probabilistic model evidence.
Extend beyond synthetic data: repeat the same methodology on a real regression dataset to show how noise and feature correlations affect the curve shape.
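If the K-fold extension were added, the selection loop could look roughly like this, reusing gaussian_features, fit_least_squares, and sse from the earlier sketches; the fold count and the D grid are placeholders.

```python
def kfold_sse(x, y, D, k=5):
    # Mean and standard deviation of validation SSE for a given D across k folds.
    idx = rng.permutation(len(x))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]
        tr_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        Phi_tr = gaussian_features(x[tr_idx], D)
        Phi_va = gaussian_features(x[val_idx], D)
        w = fit_least_squares(Phi_tr, y[tr_idx])
        scores.append(sse(Phi_va, y[val_idx], w))
    return float(np.mean(scores)), float(np.std(scores))

# Pool the data and pick the D with the lowest mean validation SSE across folds.
x_all = np.concatenate([x_train, x_val])
y_all = np.concatenate([y_train, y_val])
cv = {D: kfold_sse(x_all, y_all, D) for D in range(1, 21)}
best_D = min(cv, key=lambda D: cv[D][0])
print(best_D, cv[best_D])
```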