Problem
Image classification performance depends heavily on architecture choices: depth, non-linearity, regularization, and how well the model captures spatial structure.
A plain MLP can classify images, but it ignores spatial locality — which is exactly what makes convolutional networks so effective.
The goal was to run structured experiments that isolate what actually improves accuracy (depth? activation? regularization? augmentation?) instead of guessing.
Solution
I built and trained multiple MLP variants (no hidden layer, 1×256, 2×256) and compared their test accuracy to quantify the effect of non-linearity and depth; the comparison harness is sketched at the end of this section.
I then held architecture constant and changed activations (ReLU vs tanh vs Leaky ReLU) to study optimization behavior and representational differences.
Finally, I introduced CNNs and compared them against MLP baselines, including experiments with normalization, dropout/L1/L2 regularization, and data augmentation.
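A minimal sketch of that comparison harness, assuming PyTorch/torchvision (the framework is not named above); the helper names, epoch count, and learning rate are illustrative placeholders, not the exact settings used. Every variant is trained with the same seed, optimizer, and loop so accuracy differences can be attributed to the architecture alone.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

def get_loaders(batch_size=128):
    # Plain ToTensor here; normalization is treated as its own experiment.
    tfm = transforms.ToTensor()
    train = datasets.FashionMNIST("data", train=True, download=True, transform=tfm)
    test = datasets.FashionMNIST("data", train=False, download=True, transform=tfm)
    return DataLoader(train, batch_size, shuffle=True), DataLoader(test, batch_size)

def test_accuracy(model, loader, device):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            pred = model(x.to(device)).argmax(dim=1)
            correct += (pred == y.to(device)).sum().item()
            total += y.size(0)
    return correct / total

def run_experiment(name, build_fn, epochs=5, lr=1e-3, seed=0):
    torch.manual_seed(seed)                      # same seed for every variant
    device = "cuda" if torch.cuda.is_available() else "cpu"
    train_loader, test_loader = get_loaders()
    model = build_fn().to(device)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()              # cross-entropy over logits (softmax applied internally)
    for _ in range(epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(x.to(device)), y.to(device))
            loss.backward()
            opt.step()
    print(f"{name}: test accuracy = {test_accuracy(model, test_loader, device):.4f}")

# Depth study: linear baseline, 1 hidden layer, 2 hidden layers (256 units each).
variants = {
    "linear":    lambda: nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)),
    "mlp-256":   lambda: nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                                       nn.Linear(256, 10)),
    "mlp-256x2": lambda: nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU(),
                                       nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 10)),
}
for name, build_fn in variants.items():
    run_experiment(name, build_fn)
```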
Architecture
Dataset: FashionMNIST (28×28 grayscale clothing images, 10 classes, 60,000 training / 10,000 test examples).
MLP baselines: (1) linear classifier (no hidden layers), (2) 1 hidden layer (256), (3) 2 hidden layers (256 + 256), all with softmax classification.
Activation study: replicated the 2-hidden-layer model with tanh and Leaky ReLU to compare training stability and final generalization.
Regularization: tested L1/L2 penalties and dropout to reduce overfitting; compared normalized vs unnormalized inputs to see how scaling affects training.
CNN model: added convolutional layers to exploit spatial structure, then compared performance and learning curves against the MLP family.
Augmentation: introduced simple augmentations to improve robustness and evaluated the accuracy vs training-time tradeoff. Sketches of these model and pipeline variants are shown below.
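The activation study and the CNN comparison only change the model factory; everything else in the harness above stays fixed. A sketch assuming PyTorch, with illustrative filter counts (the exact CNN configuration is not specified here):

```python
import torch.nn as nn

def mlp_2x256(activation=nn.ReLU):
    # Same 2 x 256 architecture; only the activation changes between the
    # ReLU / tanh / Leaky ReLU runs.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256), activation(),
        nn.Linear(256, 256), activation(),
        nn.Linear(256, 10),
    )

def small_cnn():
    # Convolutions exploit locality and weight sharing; 16/32 filters are
    # assumed sizes, not necessarily the ones used.
    return nn.Sequential(
        nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 28x28 -> 14x14
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 14x14 -> 7x7
        nn.Flatten(),
        nn.Linear(32 * 7 * 7, 128), nn.ReLU(),
        nn.Linear(128, 10),
    )

# Activation study: three runs of the identical architecture.
for act in (nn.ReLU, nn.Tanh, nn.LeakyReLU):
    run_experiment(f"mlp-256x2-{act.__name__}", lambda a=act: mlp_2x256(activation=a))

# CNN vs MLP family, same harness.
run_experiment("small-cnn", small_cnn)
```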
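Normalization, dropout, L1/L2, and augmentation are changes to the input pipeline or the loss rather than to the architecture. A sketch of how each could be wired in, again assuming PyTorch/torchvision; the normalization statistics, dropout rate, and penalty strengths are illustrative assumptions:

```python
import torch
import torch.nn as nn
from torchvision import transforms

# Unnormalized vs normalized inputs (approximate FashionMNIST mean/std).
unnormalized = transforms.ToTensor()
normalized = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.2860,), (0.3530,)),
])

# Simple augmentations for the robustness runs (illustrative choices).
augmented = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(28, padding=2),
    transforms.ToTensor(),
    transforms.Normalize((0.2860,), (0.3530,)),
])

# Dropout lives inside the model...
mlp_dropout = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

# ...L2 is applied as weight decay in the optimizer...
optimizer = torch.optim.Adam(mlp_dropout.parameters(), lr=1e-3, weight_decay=1e-4)

# ...and L1 is added to the loss explicitly.
def l1_penalty(model, lam=1e-5):
    return lam * sum(p.abs().sum() for p in model.parameters())
# loss = nn.CrossEntropyLoss()(logits, targets) + l1_penalty(mlp_dropout)
```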
What I optimized
Experimental fairness: controlled variables so each change answers one question (depth alone, activation alone, regularization alone, CNN vs MLP).
Generalization: used regularization and augmentation to reduce the gap between training accuracy and test accuracy.
Training reliability: normalization experiments to reduce sensitivity to learning rates and help gradients behave predictably.
Results
Confirmed expected behavior: adding non-linearity + depth significantly improves test accuracy over a linear baseline because it increases representational power.
Showed that activation choice impacts convergence and stability — ReLU-style activations typically train faster, while tanh can saturate and slow optimization depending on initialization/scaling.
Demonstrated why CNNs outperform MLPs on image data: convolutional inductive bias (locality + weight sharing) improves generalization and reduces parameter waste (see the parameter-count sketch below).
Documented the regularization/augmentation tradeoff: stronger regularization can stabilize test performance but may slow convergence or cap training accuracy.
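To make the weight-sharing point concrete, compare first-layer parameter counts: a dense layer connects every pixel to every hidden unit, while a convolution reuses one small kernel across the whole image. Illustrative arithmetic (256 hidden units and 16 3×3 filters are assumed sizes):

```python
# A dense first layer learns a separate weight per pixel per hidden unit;
# a conv first layer reuses one 3x3 kernel per feature map across all positions.
dense_first_layer = 28 * 28 * 256 + 256   # 200,960 parameters for 256 hidden units
conv_first_layer = 1 * 16 * 3 * 3 + 16    # 160 parameters for 16 feature maps
print(dense_first_layer, conv_first_layer)
```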
What I'd do next
Add BatchNorm + lightweight residual blocks to test modern training stability improvements without dramatically increasing compute (a sketch of such a block is shown below).
Run a small hyperparameter sweep (learning rate, weight decay, dropout) and report best settings with consistent seeds for reproducibility.
Do error analysis: inspect confusion pairs (e.g., similar clothing classes) to understand whether failures come from texture similarity or shape ambiguity (a confusion-matrix sketch is shown below).
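For the BatchNorm + residual follow-up, a sketch of what a lightweight block might look like, assuming PyTorch (channel count and placement are assumptions):

```python
import torch
import torch.nn as nn

class LightResidualBlock(nn.Module):
    """Two 3x3 convs with BatchNorm and an identity skip connection.
    Channel count is kept constant so the shortcut needs no projection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)   # residual connection

# Example: drop a block into the CNN after the first conv/pool stage.
block = LightResidualBlock(16)
out = block(torch.randn(1, 16, 14, 14))       # shape is preserved: (1, 16, 14, 14)
```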
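For the error-analysis item, a confusion matrix over the test set is enough to surface the most-confused class pairs (e.g. shirt vs coat vs pullover). A sketch assuming PyTorch and a trained model plus test loader from the harness above:

```python
import torch

@torch.no_grad()
def confusion_matrix(model, loader, num_classes=10, device="cpu"):
    # Rows are true labels, columns are predictions.
    cm = torch.zeros(num_classes, num_classes, dtype=torch.long)
    model.eval()
    for x, y in loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        for t, p in zip(y, preds):
            cm[t, p] += 1
    return cm

def top_confusions(cm, k=5):
    # Zero the diagonal and return the k largest off-diagonal (true, pred, count) triples.
    off_diag = cm.clone()
    off_diag.fill_diagonal_(0)
    values, indices = off_diag.flatten().topk(k)
    n = off_diag.size(1)
    return [(i.item() // n, i.item() % n, v.item()) for i, v in zip(indices, values)]
```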