
Tumorous Cell Classification

Jan 2025 – Apr 2025

Microscopy image pipeline using feature extraction + SVM classification to distinguish tumorous vs normal cells and improve tumor counting accuracy.

Python · OpenCV · scikit-image · scikit-learn · NumPy

Feature extraction pipeline (HOG + geometric descriptors) + SVM classifier

Reduced tumor-count RMSE by ~25% through improved feature design and model fit

Problem

Microscopy datasets look clean in theory but messy in practice: cell boundaries are imperfect, lighting varies, and “normal vs tumorous” often differs by subtle texture/shape cues rather than obvious color differences.

A naive approach (thresholding or simple heuristics) breaks quickly: it either over-counts noisy artifacts or under-counts real tumorous cells that don’t match a fixed template.

The core goal was to build a reliable classification signal that improves downstream tumor counting: if classification is unstable, counting becomes unstable too.

Solution

We built a classical computer-vision pipeline: preprocess images to stabilize appearance, extract features that encode meaningful shape/texture differences, then train a supervised model to separate tumorous vs normal cells.

Instead of relying on one type of feature, we combined complementary signals: texture-oriented descriptors (HOG) to capture gradient patterns and geometric/shape descriptors to capture morphology.

We used an SVM because it performs well on structured feature vectors with limited data and can learn a strong decision boundary without deep networks or heavy compute.
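As a rough illustration of that setup, here's a minimal scaled-SVM sketch; the random X/y are placeholders standing in for the real HOG + geometric feature vectors, and the kernel/C values are illustrative, not the project's exact settings:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data: (n_cells, n_features) vectors, y: 0 = normal, 1 = tumorous.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 2, size=200)

# Scaling matters for SVMs: without it, RBF distances are dominated
# by whichever features happen to have the largest numeric range.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"5-fold F1: {scores.mean():.3f} ± {scores.std():.3f}")
```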

Architecture

Preprocessing: standardized image inputs (normalization/denoising where appropriate) to reduce variance from lighting and acquisition noise.
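A minimal sketch of that kind of standardization step, assuming OpenCV; the Gaussian kernel size and CLAHE parameters are illustrative choices, not the exact values used:

```python
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    """Standardize a microscopy frame: grayscale, denoise, equalize contrast."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Mild Gaussian blur suppresses acquisition noise before feature extraction.
    denoised = cv2.GaussianBlur(gray, (5, 5), 0)
    # CLAHE flattens lighting variation without blowing out local texture.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(denoised)
```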

Feature extraction: computed HOG descriptors to capture texture/edge structure, then added geometric features (size/compactness/shape cues) from segmentation outputs to encode morphology.
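Sketched below is how the two feature families can be concatenated per cell with scikit-image; `cell_features`, the 64×64 patch size, and the specific region properties are hypothetical choices for illustration:

```python
import numpy as np
from skimage.feature import hog
from skimage.measure import label, regionprops
from skimage.transform import resize

def cell_features(patch: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Concatenate texture (HOG) and geometric descriptors for one cell patch."""
    # HOG on a fixed-size patch so every cell yields the same vector length.
    patch_64 = resize(patch, (64, 64), anti_aliasing=True)
    texture = hog(patch_64, orientations=9, pixels_per_cell=(8, 8),
                  cells_per_block=(2, 2), feature_vector=True)

    # Geometric cues from the segmentation mask (assumed to contain one
    # cell region): size, elongation, convexity, compactness.
    props = regionprops(label(mask))[0]
    compactness = 4 * np.pi * props.area / (props.perimeter ** 2 + 1e-8)
    geometry = np.array([props.area, props.eccentricity,
                         props.solidity, compactness])
    return np.concatenate([texture, geometry])
```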

Model training: trained an SVM on feature vectors with careful iteration on feature composition and hyperparameters to improve separability and reduce misclassification.
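One way that iteration loop could look with scikit-learn's grid search; the placeholder data and the grid values here are assumptions for the sketch:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder data standing in for stacked cell_features vectors + labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 64)), rng.integers(0, 2, size=200)

pipe = Pipeline([("scale", StandardScaler()), ("svm", SVC())])
param_grid = {
    "svm__kernel": ["rbf", "linear"],
    "svm__C": [0.1, 1, 10, 100],
    "svm__gamma": ["scale", 0.01, 0.001],  # ignored by the linear kernel
}
search = GridSearchCV(pipe, param_grid, cv=5, scoring="f1", n_jobs=-1)
search.fit(X_train, y_train)
print(search.best_params_, round(search.best_score_, 3))
```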

Evaluation loop: measured classification impact on counting accuracy, iterated on thresholds/feature sets, and validated improvements by comparing predicted counts against ground truth.
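The counting metric itself is simple; a sketch, where each per-image predicted count would come from summing the classifier's tumorous labels:

```python
import numpy as np

def count_rmse(pred_counts: np.ndarray, true_counts: np.ndarray) -> float:
    """RMSE between predicted and ground-truth tumor counts per image."""
    return float(np.sqrt(np.mean((pred_counts - true_counts) ** 2)))

# Per-image tumor counts: sum of cells the classifier labeled tumorous, e.g.
# pred_counts[i] = (clf.predict(features_for_image_i) == 1).sum()
print(count_rmse(np.array([12, 7, 30]), np.array([10, 8, 28])))
```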

What I optimized

Feature design: tuned the feature set to reduce the common failure modes (confusing noisy debris for tumorous cells, or missing tumorous cells with weak boundaries).

Robustness: iterated on preprocessing and extraction parameters so features stay stable across different image conditions, not just one “good-looking” subset.
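One way to sanity-check that stability, assuming float images in [0, 1] and an extractor like the `cell_features` sketch above (`feature_stability` is a hypothetical helper, not project code):

```python
import numpy as np

def feature_stability(patch, mask, extract, rng, n_trials=20):
    """Measure how much features drift under brightness/noise perturbations."""
    base = extract(patch, mask)
    drifts = []
    for _ in range(n_trials):
        gain = rng.uniform(0.8, 1.2)                       # lighting variation
        noise = rng.normal(0, 0.02, size=patch.shape)      # sensor noise
        perturbed = np.clip(patch * gain + noise, 0, 1)
        drifts.append(np.linalg.norm(extract(perturbed, mask) - base))
    # Smaller mean drift => features are more stable across conditions.
    return float(np.mean(drifts))
```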

Team workflow: used Git branches + reviews so experiments didn’t break the baseline pipeline and results stayed reproducible across teammates.

Results

Improved tumor counting accuracy by reducing RMSE by ~25% through better feature engineering and model fit, showing that classification quality directly improves counting reliability.

Produced a consistent end-to-end pipeline that can be rerun across datasets without manual tuning per image, which is critical for real evaluation (not cherry-picked demos).

Validated that combining texture + geometry is stronger than either alone: HOG captured gradients/texture, while geometric features reduced confusion on similarly textured regions with different shapes.

What I'd do next

Add confidence-aware counting: propagate classifier confidence into the counting stage so uncertain predictions are flagged or handled differently (e.g., human review or conservative counting).
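A sketch of what that could look like using the SVM's decision function, where small absolute margins get routed to review (`confident_count` and the margin threshold are hypothetical):

```python
import numpy as np
from sklearn.svm import SVC

def confident_count(clf: SVC, X: np.ndarray, margin: float = 0.5):
    """Count tumorous cells, routing low-margin predictions to human review.

    decision_function returns the signed distance to the SVM hyperplane,
    so a small |score| means the prediction sits near the boundary.
    """
    scores = clf.decision_function(X)
    confident_tumorous = int(np.sum(scores > margin))
    needs_review = int(np.sum(np.abs(scores) <= margin))
    return confident_tumorous, needs_review
```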

Compare classical features against a lightweight CNN baseline to quantify the tradeoff between interpretability + speed vs learned representations.
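The baseline wouldn't need to be large; a minimal sketch of the kind of model I mean, assuming PyTorch (not part of the current stack) and 64×64 grayscale patches:

```python
import torch
import torch.nn as nn

class TinyCellCNN(nn.Module):
    """Small CNN baseline for 64x64 grayscale cell patches."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 2),  # tumorous vs normal logits
        )

    def forward(self, x):
        return self.net(x)

model = TinyCellCNN()
logits = model(torch.randn(8, 1, 64, 64))  # batch of 8 patches
print(logits.shape)  # torch.Size([8, 2])
```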

Improve generalization: augment data to handle lighting variance and acquisition differences, then evaluate performance across multiple “styles” of microscopy rather than one dataset distribution.
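A minimal augmentation sketch along those lines; the gain, gamma, and noise ranges are illustrative assumptions:

```python
import numpy as np

def augment(patch: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Simulate lighting/acquisition variance on a float image in [0, 1]."""
    patch = patch * rng.uniform(0.7, 1.3)                    # global brightness
    patch = np.clip(patch, 0, 1) ** rng.uniform(0.8, 1.25)   # gamma shift
    patch = patch + rng.normal(0, 0.03, size=patch.shape)    # sensor noise
    return np.clip(patch, 0, 1)
```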

Make the pipeline more explainable: feature importance + error clustering to understand exactly what patterns cause false positives/negatives and guide targeted improvements.
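Permutation importance from scikit-learn would be a natural starting point; a sketch with a placeholder model and held-out split standing in for the real pipeline:

```python
import numpy as np
from sklearn.inspection import permutation_importance
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder fitted model + held-out split standing in for the real pipeline.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(300, 64)), rng.integers(0, 2, size=300)
clf = make_pipeline(StandardScaler(), SVC()).fit(X[:200], y[:200])

result = permutation_importance(clf, X[200:], y[200:],
                                n_repeats=10, scoring="f1", random_state=0)
# In the feature sketch above, geometric features sit at the tail of the vector,
# so high-importance tail indices would point at morphology over texture.
top = np.argsort(result.importances_mean)[::-1][:10]
print("most influential feature indices:", top)
```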