Problem
Real-world datasets rarely behave like clean textbook examples: features live on different scales, correlations can be misleading, and naive training can converge slowly or to unstable solutions.
I wanted to understand the full end-to-end pipeline behind common models (linear regression and logistic regression) — not just using libraries, but implementing the learning loop and seeing how optimization choices change convergence and generalization.
For classification, accuracy alone isn’t enough: a model can be ‘right’ but still output unreliable probabilities. That matters when predictions are used for downstream decisions.
Solution
I implemented linear regression and logistic regression from scratch, including the full training loop, gradient computation, and evaluation metrics.
I trained with both full-batch gradient descent (GD) and mini-batch stochastic gradient descent (SGD) to compare speed, stability, and sensitivity to hyperparameters; a minimal sketch of this training loop follows at the end of this section.
To reduce overfitting and improve reliability, I added regularization (Ridge/L2) and studied probability calibration to make predicted confidences more trustworthy.
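The sketch below shows the kind of training loop this involved, in NumPy: an MSE objective with an optional L2 term, where batch_size=None means full-batch GD and an integer batch size means mini-batch SGD. The function names, defaults, and hyperparameter values here are illustrative, not the exact ones used in the project.

    import numpy as np

    def mse_loss(X, y, w, b, l2=0.0):
        # Mean squared error plus an optional L2 (Ridge) penalty on the weights.
        residual = X @ w + b - y
        return np.mean(residual ** 2) + l2 * np.sum(w ** 2)

    def mse_gradients(X, y, w, b, l2=0.0):
        # Gradients of the MSE + L2 objective with respect to w and b.
        n = len(y)
        residual = X @ w + b - y
        grad_w = 2.0 * X.T @ residual / n + 2.0 * l2 * w
        grad_b = 2.0 * np.mean(residual)
        return grad_w, grad_b

    def fit(X, y, lr=0.01, epochs=200, batch_size=None, l2=0.0, seed=0):
        # batch_size=None -> full-batch GD; an integer -> mini-batch SGD.
        rng = np.random.default_rng(seed)
        w, b = np.zeros(X.shape[1]), 0.0
        history = []
        for _ in range(epochs):
            if batch_size is None:
                batches = [np.arange(len(y))]
            else:
                idx = rng.permutation(len(y))
                batches = np.array_split(idx, max(1, len(y) // batch_size))
            for batch in batches:
                grad_w, grad_b = mse_gradients(X[batch], y[batch], w, b, l2)
                w -= lr * grad_w
                b -= lr * grad_b
            history.append(mse_loss(X, y, w, b, l2))  # track full-data loss per epoch
        return w, b, history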
Architecture
Data pipeline: loaded the datasets, removed non-informative ID columns, checked missing values/duplicates, and ran basic statistics + correlation exploration to understand feature behavior (sketched briefly after this list).
Linear regression (Parkinson’s): implemented MSE/SSE objective, computed gradients, trained with GD/mini-batch SGD, and tracked train/validation error to detect over/underfitting.
Logistic regression (Breast Cancer): implemented the sigmoid + cross-entropy loss, trained with GD/SGD, and evaluated using classification metrics (not just raw loss); the loss and gradient are sketched after this list.
Regularization + calibration: applied an L2 penalty to control weight growth and compared how it affects generalization; examined predicted probabilities to check that confidence aligns with correctness (a simple binning check is also sketched below).
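A rough pandas sketch of the data-preparation step; the file path, the "id" column name, and the printed summaries are placeholders rather than the project's actual dataset layout:

    import pandas as pd

    df = pd.read_csv("data.csv")                   # placeholder path
    df = df.drop(columns=["id"], errors="ignore")  # drop a non-informative ID column
    print(df.isna().sum())                         # missing values per column
    print(df.duplicated().sum())                   # duplicate rows
    print(df.describe())                           # basic statistics
    corr = df.corr(numeric_only=True)              # correlation matrix for exploration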
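The logistic-regression objective and gradient follow the same pattern as the linear case; a minimal NumPy sketch (the small eps constant for numerical stability is an implementation detail assumed here, not confirmed by the write-up):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cross_entropy(X, y, w, b, l2=0.0, eps=1e-12):
        # Binary cross-entropy with an optional L2 penalty; y holds 0/1 labels.
        p = sigmoid(X @ w + b)
        nll = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        return nll + l2 * np.sum(w ** 2)

    def logistic_gradients(X, y, w, b, l2=0.0):
        # Same form as the linear-regression gradient, with sigmoid(Xw + b)
        # taking the place of the linear prediction.
        p = sigmoid(X @ w + b)
        grad_w = X.T @ (p - y) / len(y) + 2.0 * l2 * w
        grad_b = np.mean(p - y)
        return grad_w, grad_b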
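For the calibration check, one simple approach is to bin predicted probabilities and compare each bin's mean confidence against its observed positive rate; the sketch below assumes that binning approach and is not necessarily the exact analysis used:

    import numpy as np

    def calibration_table(y_true, y_prob, n_bins=10):
        # For each probability bin, compare mean predicted confidence with the
        # observed positive rate; large per-bin gaps indicate miscalibration.
        y_true = np.asarray(y_true)
        y_prob = np.asarray(y_prob, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        # Keep predictions of exactly 1.0 inside the last half-open bin.
        clipped = np.clip(y_prob, 0.0, np.nextafter(1.0, 0.0))
        rows = []
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (clipped >= lo) & (clipped < hi)
            if mask.any():
                rows.append((lo, hi, float(y_prob[mask].mean()),
                             float(y_true[mask].mean()), int(mask.sum())))
        return rows  # (bin_lo, bin_hi, mean_confidence, empirical_accuracy, count)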
What I optimized
Training stability: compared batch-size effects (update variance vs convergence smoothness) and tuned learning rates to avoid divergence or painfully slow training; an example sweep is sketched after this list.
Generalization: used validation-driven iteration with L2 regularization to reduce overfitting, especially when features were correlated or noisy.
Interpretability: relied on clear plots (loss curves, error trends, correlation heatmaps) to justify decisions instead of guessing hyperparameters blindly.
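As an illustration of that workflow, a sweep like the one below produces directly comparable loss curves. It reuses the hypothetical fit() from the earlier training-loop sketch and assumes X_train/y_train are prepared NumPy arrays; the learning rates and batch sizes are placeholders:

    import matplotlib.pyplot as plt

    settings = [
        {"lr": 0.10, "batch_size": None, "label": "full-batch GD, lr=0.10"},
        {"lr": 0.01, "batch_size": None, "label": "full-batch GD, lr=0.01"},
        {"lr": 0.01, "batch_size": 32,   "label": "mini-batch SGD, lr=0.01, bs=32"},
    ]

    for s in settings:
        _, _, history = fit(X_train, y_train, lr=s["lr"],
                            batch_size=s["batch_size"], epochs=200)
        plt.plot(history, label=s["label"])  # one loss curve per setting

    plt.xlabel("epoch")
    plt.ylabel("training MSE")
    plt.legend()
    plt.show()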
Results
Produced an end-to-end reproducible pipeline that trains linear and logistic regression models from scratch and compares GD vs mini-batch SGD behavior.
Demonstrated clear convergence differences: batch GD produced smoother learning curves, while mini-batch SGD reached good solutions faster but introduced noisier updates.
Showed that regularization improves validation performance and makes the model more robust to noisy/high-variance features; calibration analysis highlighted the gap between ‘accuracy’ and ‘trustworthy confidence.’
What I'd do next
Add stronger baselines (scikit-learn) and report side-by-side performance + runtime to quantify the tradeoff between ‘from scratch’ control and optimized library implementations (a rough comparison is sketched after this list).
Extend evaluation beyond a single split: K-fold cross-validation with confidence intervals for more statistically stable comparisons (also sketched below).
Package the training loops into a clean mini-library (fit/predict/score + plotting utilities) with unit tests so experiments become fast and reusable.
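A baseline comparison could look roughly like the following, using scikit-learn's Ridge and LogisticRegression as references; the split variables (X_train, X_val, Xc_train, ...), metric choices, and timing approach are all assumptions for illustration:

    import time
    from sklearn.linear_model import Ridge, LogisticRegression
    from sklearn.metrics import mean_squared_error, accuracy_score

    # Regression baseline on the Parkinson's-style split (placeholder variables).
    start = time.perf_counter()
    ridge = Ridge(alpha=1.0).fit(X_train, y_train)
    print("Ridge  MSE:", mean_squared_error(y_val, ridge.predict(X_val)),
          "fit time:", time.perf_counter() - start)

    # Classification baseline on the Breast Cancer-style split (placeholder variables).
    start = time.perf_counter()
    clf = LogisticRegression(max_iter=1000).fit(Xc_train, yc_train)
    print("LogReg acc:", accuracy_score(yc_val, clf.predict(Xc_val)),
          "fit time:", time.perf_counter() - start)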
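Similarly, per-fold scores could be summarized with a simple normal-approximation interval, sketched here with scikit-learn's KFold; the fit_fn/score_fn callables, fold count, and NumPy-array inputs are assumptions:

    import numpy as np
    from sklearn.model_selection import KFold

    def cross_val_summary(X, y, fit_fn, score_fn, k=5, seed=0):
        # One score per fold, summarized as mean +/- an approximate 95% interval.
        scores = []
        for train_idx, val_idx in KFold(n_splits=k, shuffle=True,
                                        random_state=seed).split(X):
            model = fit_fn(X[train_idx], y[train_idx])
            scores.append(score_fn(model, X[val_idx], y[val_idx]))
        scores = np.array(scores)
        half_width = 1.96 * scores.std(ddof=1) / np.sqrt(k)
        return scores.mean(), half_width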