Problem
Microscopy datasets look clean in theory but messy in practice: cell boundaries are imperfect, lighting varies, and the difference between normal and tumorous cells often comes down to subtle texture/shape cues rather than obvious color differences.
A naive approach (thresholding or simple heuristics) breaks quickly: it either over-counts noisy artifacts or under-counts real tumorous cells that don’t match a fixed template.
The core goal was to build a reliable classification signal that improves downstream tumor counting: if classification is unstable, counting becomes unstable too.
Solution
We built a classical computer-vision pipeline: preprocess images to stabilize appearance, extract features that encode meaningful shape/texture differences, then train a supervised model to separate tumorous from normal cells.
Instead of relying on one type of feature, we combined complementary signals: texture-oriented descriptors (HOG) to capture gradient patterns and geometric/shape descriptors to capture morphology.
We used an SVM because it performs well on structured feature vectors with limited data and can build a strong decision boundary without requiring deep networks or heavy compute; a minimal sketch of the feature + classifier setup follows.
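As a rough illustration of that combination, the sketch below concatenates HOG texture descriptors with a few shape measurements and feeds them to an SVM. Function names, patch size, and hyperparameter values are assumptions for the example, not the exact project configuration.

```python
# Illustrative feature + classifier setup (names, patch size, and
# hyperparameters are placeholders, not the exact project configuration).
import numpy as np
from skimage.feature import hog
from skimage.measure import label, regionprops
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def cell_features(patch, mask):
    """Concatenate texture (HOG) and shape descriptors for one cell crop.

    patch: 2-D grayscale crop around a segmented cell, fixed size (e.g. 64x64)
           so the HOG vector length stays constant.
    mask:  binary segmentation mask for the same crop (assumes one cell region).
    """
    # Texture/edge structure from gradients.
    texture = hog(patch, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), block_norm="L2-Hys")

    # Morphology: size/compactness/shape cues from the segmentation mask.
    props = regionprops(label(mask.astype(int)))[0]
    compactness = 4 * np.pi * props.area / (props.perimeter ** 2 + 1e-8)
    shape = np.array([props.area, props.perimeter, props.eccentricity,
                      props.solidity, compactness])

    return np.concatenate([texture, shape])

# X: one row of cell_features(...) per cell, y: 0 = normal, 1 = tumorous.
# Standardizing matters because HOG and geometric features have very different scales.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
# clf.fit(X_train, y_train); clf.predict(X_test)
```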
Architecture
Preprocessing: standardized image inputs (normalization/denoising where appropriate) to reduce variance from lighting and acquisition noise (a small preprocessing sketch follows this list).
Feature extraction: computed HOG descriptors to capture texture/edge structure, then added geometric features (size/compactness/shape cues) from segmentation outputs to encode morphology.
Model training: trained an SVM on feature vectors with careful iteration on feature composition and hyperparameters to improve separability and reduce misclassification.
Evaluation loop: measured classification impact on counting accuracy, iterated on thresholds/feature sets, and validated improvements by comparing predicted counts against ground truth (an evaluation sketch also follows the list).
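A plausible version of the preprocessing step, assuming scikit-image; the exact normalization/denoising choices were tuned per dataset, so treat the parameters as placeholders.

```python
# One plausible preprocessing configuration (the actual normalization/denoising
# choices were tuned per dataset; sigma and the contrast stretch are placeholders).
from skimage import exposure, filters, img_as_float

def preprocess(image):
    """Stabilize appearance before feature extraction."""
    img = img_as_float(image)                # consistent [0, 1] intensity range
    img = exposure.rescale_intensity(img)    # contrast stretch to damp lighting variance
    return filters.gaussian(img, sigma=1.0)  # mild smoothing of acquisition noise
```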
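And a sketch of the evaluation loop: hyperparameters are iterated with cross-validated grid search, and the classifier is judged on the metric that matters downstream, per-image tumor-count RMSE against ground truth. The grid values and the `count_rmse` helper are illustrative.

```python
# Sketch of the evaluation loop: cross-validated grid search over SVM
# hyperparameters, then per-image tumor-count RMSE against ground truth.
# Grid values and the count_rmse helper are illustrative.
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.001]}
search = GridSearchCV(clf, param_grid, cv=5, scoring="f1")
# search.fit(X_train, y_train); best = search.best_estimator_

def count_rmse(model, per_image_features, true_counts):
    """RMSE between predicted and ground-truth tumor counts per image.

    per_image_features: list of feature matrices, one row per detected cell.
    true_counts:        ground-truth tumorous-cell count for each image.
    Assumes labels are 0 = normal, 1 = tumorous, so the prediction sum is the count.
    """
    pred_counts = [int(model.predict(feats).sum()) for feats in per_image_features]
    return float(np.sqrt(np.mean((np.array(pred_counts) - np.array(true_counts)) ** 2)))
```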
What I optimized
Feature design: tuned the feature set to reduce the common failure modes (confusing noisy debris for tumorous cells, or missing tumorous cells with weak boundaries).
Robustness: iterated on preprocessing and extraction parameters so features stay stable across different image conditions, not just one “good-looking” subset.
Team workflow: used Git branches + reviews so experiments didn’t break the baseline pipeline and results stayed reproducible across teammates.
Results
Reduced tumor-count RMSE by roughly 25% through better feature engineering and model fit, showing that classification quality directly improved counting reliability.
Produced a consistent end-to-end pipeline that can be rerun across datasets without manual tuning per image, which is critical for real evaluation (not cherry-picked demos).
Validated that combining texture + geometry is stronger than either alone: HOG captured gradients/texture, while geometric features reduced confusion on similarly textured regions with different shapes.
What I'd do next
Add confidence-aware counting: propagate classifier confidence into the counting stage so uncertain predictions are flagged or handled differently (e.g., human review or conservative counting); see the sketch after this list.
Compare classical features against a lightweight CNN baseline to quantify the tradeoff between interpretability + speed vs learned representations.
Improve generalization: augment data to handle lighting variance and acquisition differences, then evaluate performance across multiple “styles” of microscopy rather than one dataset distribution.
Make the pipeline more explainable: feature importance + error clustering to understand exactly what patterns cause false positives/negatives and guide targeted improvements (a small feature-importance sketch follows).
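One way the confidence-aware counting could look, using the SVM margin as a confidence proxy; this isn't implemented yet, so the threshold value and helper name are hypothetical.

```python
# Hypothetical confidence-aware counting: use the SVM margin as a confidence
# proxy and flag near-boundary cells instead of counting them outright.
# The threshold value and helper name are placeholders.
import numpy as np

def confident_count(model, features, margin_threshold=0.5):
    """Return (confident tumor count, number of cells flagged for review).

    model:    fitted pipeline ending in an SVC (so decision_function is available).
    features: feature matrix for one image, one row per detected cell.
    """
    scores = model.decision_function(features)      # signed distance to the boundary
    tumorous = scores > margin_threshold            # confident positives only
    uncertain = np.abs(scores) <= margin_threshold  # too close to call -> human review
    return int(tumorous.sum()), int(uncertain.sum())
```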
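For the explainability step, permutation importance aggregated over the texture vs geometry blocks of the feature vector would be a natural starting point; the assumption that HOG dimensions come first, and the function name, are mine.

```python
# Starting point for explainability: permutation importance aggregated over the
# texture (HOG) and geometry blocks of the feature vector. The assumption that
# HOG dimensions come first, and the function name, are placeholders.
from sklearn.inspection import permutation_importance

def group_importance(model, X_val, y_val, n_hog):
    """Compare how much the texture block vs the geometry block drives accuracy."""
    result = permutation_importance(model, X_val, y_val, n_repeats=10, random_state=0)
    return {
        "texture (HOG)": float(result.importances_mean[:n_hog].sum()),
        "geometry": float(result.importances_mean[n_hog:].sum()),
    }
```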