
Deep Learning for Vision — MLP vs CNN on FashionMNIST

Course Project

Compared MLP depth/activations and CNN architectures on FashionMNIST, then tested regularization, normalization, and data augmentation to improve generalization.

Python · PyTorch · MLP · CNN · Data Augmentation · L1/L2 · Dropout

Benchmarked depth + activations (ReLU/tanh/Leaky ReLU) and their impact on accuracy

Showed why CNNs win on images and how augmentation affects performance/speed

Problem

Image classification performance depends heavily on architecture choices: depth, non-linearity, regularization, and how well the model captures spatial structure.

A plain MLP can classify images, but it ignores spatial locality; exploiting that locality is exactly what makes convolutional networks so effective.

The goal was to run structured experiments that isolate what actually improves accuracy (depth? activation? regularization? augmentation?) instead of guessing.

Solution

I built and trained multiple MLP variants (no hidden layer, 1×256, 2×256) and compared their test accuracy to quantify the effect of non-linearity and depth.

I then held architecture constant and changed activations (ReLU vs tanh vs Leaky ReLU) to study optimization behavior and representational differences.

Finally, I introduced CNNs and compared them against MLP baselines, including experiments with normalization, dropout/L1/L2 regularization, and data augmentation.
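To keep these comparisons fair, every variant was trained and evaluated the same way. The following is a minimal sketch of what such a shared training/evaluation loop looks like in PyTorch, assuming standard DataLoaders and cross-entropy loss; the helper names `train_epoch` and `evaluate` are illustrative, not the project's actual code.

```python
import torch
from torch import nn

def train_epoch(model, loader, optimizer, device="cpu"):
    """One pass over the training set; returns mean training loss."""
    model.train()
    criterion = nn.CrossEntropyLoss()
    total_loss = 0.0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
    return total_loss / len(loader.dataset)

@torch.no_grad()
def evaluate(model, loader, device="cpu"):
    """Top-1 accuracy on a held-out loader."""
    model.eval()
    correct = 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        correct += (model(images).argmax(dim=1) == labels).sum().item()
    return correct / len(loader.dataset)
```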

Architecture

Dataset: FashionMNIST (28×28 grayscale clothing images, 10 classes).

MLP baselines: (1) linear classifier (no hidden layers), (2) 1 hidden layer (256), (3) 2 hidden layers (256 + 256), all with softmax classification.
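A minimal PyTorch sketch of those three baselines. Layer sizes follow the description above; softmax is applied implicitly by the cross-entropy loss during training, so the modules output raw logits.

```python
from torch import nn

# Baseline 1: linear classifier (no hidden layers)
linear_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# Baseline 2: one hidden layer of 256 units
mlp_1x256 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

# Baseline 3: two hidden layers of 256 units each
mlp_2x256 = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 10),
)
```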

Activation study: replicated the 2-hidden-layer model with tanh and Leaky ReLU to compare training stability and final generalization.
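One way to keep the architecture fixed while swapping activations is a small factory function; this is a sketch of that idea, not the project's exact code.

```python
from torch import nn

def make_mlp(activation=nn.ReLU):
    """2x256 MLP with a swappable activation class (ReLU, Tanh, LeakyReLU, ...)."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256), activation(),
        nn.Linear(256, 256), activation(),
        nn.Linear(256, 10),
    )

relu_mlp = make_mlp(nn.ReLU)
tanh_mlp = make_mlp(nn.Tanh)
leaky_mlp = make_mlp(nn.LeakyReLU)
```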

Regularization: tested L1/L2 penalties and dropout to reduce overfitting; compared normalized vs unnormalized inputs to see how scaling affects training.
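A hedged sketch of how these pieces are typically wired up in PyTorch: dropout as layers inside the model, L2 via the optimizer's `weight_decay`, and L1 as an explicit penalty added to the loss. The specific coefficients shown are placeholders, not the project's tuned values.

```python
import torch
from torch import nn

# Dropout variant of the 2x256 MLP (p=0.5 is an example value)
dropout_mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 256), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(256, 10),
)

# L2 regularization is easiest through the optimizer's weight_decay argument
optimizer = torch.optim.SGD(dropout_mlp.parameters(), lr=0.1, weight_decay=1e-4)

# L1 regularization can be added to the loss by hand at each step
def l1_penalty(model, lam=1e-5):
    return lam * sum(p.abs().sum() for p in model.parameters())
# loss = criterion(model(x), y) + l1_penalty(dropout_mlp)
```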

CNN model: added convolutional layers to exploit spatial structure, then compared performance and learning curves against the MLP family.
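The project's exact CNN configuration is not listed here, so the block below is a representative small CNN for 28×28 grayscale inputs; the filter counts and hidden width are illustrative.

```python
from torch import nn

# Two conv/pool stages followed by a small classifier head
cnn = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                  # 28x28 -> 14x14
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),                                  # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 128), nn.ReLU(),
    nn.Linear(128, 10),
)
```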

Augmentation: introduced simple augmentations to improve robustness and evaluated the accuracy vs training-time tradeoff.
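A sketch of light augmentation with torchvision transforms, applied to the training split only; the exact transforms and padding used in the project may differ.

```python
from torchvision import transforms

# Training-time augmentation for 28x28 grayscale inputs (illustrative choices)
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(28, padding=2),
    transforms.ToTensor(),
])

# No augmentation at evaluation time, so test accuracy stays comparable
test_transform = transforms.ToTensor()
```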

What I optimized

Experimental fairness: controlled variables so each change answers one question (depth alone, activation alone, regularization alone, CNN vs MLP).

Generalization: used regularization and augmentation to reduce the gap between training accuracy and test accuracy.

Training reliability: normalization experiments to reduce sensitivity to learning rates and help gradients behave predictably.
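For reference, a normalization sketch using commonly cited FashionMNIST statistics (mean ≈ 0.2860, std ≈ 0.3530); if exact values matter, recompute them from the training split.

```python
from torchvision import transforms
from torchvision.datasets import FashionMNIST

# Approximate channel statistics of the FashionMNIST training set
normalize = transforms.Normalize((0.2860,), (0.3530,))

train_set = FashionMNIST(
    root="data", train=True, download=True,
    transform=transforms.Compose([transforms.ToTensor(), normalize]),
)
```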

Results

Confirmed expected behavior: adding non-linearity and depth significantly improves accuracy over a linear baseline because it increases representational power.

Showed that activation choice impacts convergence and stability — ReLU-style activations typically train faster, while tanh can saturate and slow optimization depending on initialization/scaling.

Demonstrated why CNNs outperform MLPs on image data: convolutional inductive bias (locality + weight sharing) improves generalization and reduces parameter waste.
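A quick, self-contained illustration of the parameter-count side of that argument: one dense layer from a flattened 28×28 image to 256 units already costs about 200k weights, while a single 3×3 convolution with 32 filters costs a few hundred and reuses them at every spatial position.

```python
from torch import nn

def count_params(module):
    return sum(p.numel() for p in module.parameters())

dense = nn.Linear(28 * 28, 256)      # 784*256 + 256 = 200,960 parameters
conv = nn.Conv2d(1, 32, kernel_size=3)  # 3*3*1*32 + 32 = 320 parameters
print(count_params(dense), count_params(conv))  # 200960 320
```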

Documented the regularization/augmentation tradeoff: stronger regularization can stabilize test performance but may slow convergence or cap training accuracy.

What I'd do next

Add BatchNorm + lightweight residual blocks to test modern training stability improvements without dramatically increasing compute.
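A sketch of what such a lightweight residual block could look like (two 3×3 convolutions with BatchNorm and an identity skip, channel count held constant); this is one possible design for the follow-up, not something already in the project.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Two 3x3 convs with BatchNorm and an identity skip connection."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # Add the block's output to its input, then apply the final non-linearity
        return torch.relu(self.body(x) + x)
```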

Run a small hyperparameter sweep (learning rate, weight decay, dropout) and report best settings with consistent seeds for reproducibility.

Do error analysis: inspect confusion pairs (e.g., similar clothing classes) to understand whether failures come from texture similarity or shape ambiguity.