Coursework 2: Logistic Regression & Loss Functions

CS3317: Artificial Intelligence · Canvas · Logistic Regression · NumPy · FashionMNIST
Implement a multi-class logistic regression classifier from scratch with NumPy. Explore four different loss functions and compare their convergence speed and final accuracy on FashionMNIST.

Overview

You will build every component of a logistic regression pipeline by hand — softmax, loss functions, gradient computation, and parameter updates — using NumPy only (no PyTorch).

  • Dataset: FashionMNIST — 60K train / 10K test, 28×28 grayscale images of 10 clothing categories, flattened to 784-dim vectors
  • Model: p = softmax(XW + b) — a single linear layer + softmax
  • Loss functions: cross-entropy, hinge, exponential, squared
  • Outputs: loss/accuracy curves, confusion matrix, weight visualizations, comparison plot
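
The provided trainer.py wires these pieces together. For orientation, here is a rough sketch of what a single mini-batch update looks like with the cross-entropy loss; the names and shapes below are illustrative assumptions, not the actual starter code:

import numpy as np

# Illustrative sketch of one mini-batch update in the pipeline above (the real
# trainer.py / optimizer.py are the ground truth; details here are assumptions).
# Shapes for FashionMNIST: Xb is (batch, 784), W is (784, 10), b is (10,).
def sgd_update(W, b, Xb, yb_onehot, lr=0.1):
    logits = Xb @ W + b                                   # single linear layer
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)          # softmax
    grad_logits = (probs - yb_onehot) / Xb.shape[0]       # cross-entropy gradient
    W -= lr * Xb.T @ grad_logits                          # gradient descent step
    b -= lr * grad_logits.sum(axis=0)
    return W, b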

Learning Objectives

  • Derive and implement softmax and its gradient through the cross-entropy and squared losses
  • Understand why different loss functions produce different gradient magnitudes and convergence rates
  • Implement mini-batch stochastic gradient descent from scratch
  • Evaluate a classifier with accuracy, confusion matrix, and per-class metrics

Downloads & Canvas Submission

  • Coursework handout (PDF)
  • Starter package
    • cw2_logistic_regression/losses.py — implement softmax & 4 loss functions
    • cw2_logistic_regression/model.py — implement forward & predict
    • cw2_logistic_regression/optimizer.py — implement gradient descent
    • run.py, trainer.py, config.py — provided, do not modify
    • common/ — shared data loading, metrics, visualization
    • tests/test_cw2.py — numerical gradient checks

Submitting on Canvas: zip cw2_logistic_regression/ (with outputs/ included) together with your report.pdf.

Tasks (What You Implement)

File            What to implement
losses.py       softmax(logits)
                cross_entropy_loss(logits, labels, num_classes)
                hinge_loss(logits, labels, num_classes)
                exponential_loss(logits, labels, num_classes)
                squared_loss(logits, labels, num_classes)
model.py        LogisticRegression.forward(X)
                LogisticRegression.predict(X)
optimizer.py    GradientDescent.__init__(...)
                GradientDescent.step(model, grad_logits, X)

Do not modify run.py, config.py, or trainer.py.
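
For reference, a minimal sketch of a numerically stable softmax and a cross-entropy loss with its gradient is shown below. The function names mirror losses.py, but the exact signatures and return values (for example, whether each loss also returns grad_logits) are assumptions; follow the docstrings in the starter code.

import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_loss(logits, labels, num_classes):
    """Mean cross-entropy over the batch plus its gradient w.r.t. the logits.

    Assumes integer labels of shape (batch,); the starter code may use a
    different convention (e.g. one-hot labels or a separate gradient routine).
    """
    n = logits.shape[0]
    probs = softmax(logits)
    onehot = np.eye(num_classes)[labels]
    loss = -np.sum(onehot * np.log(probs + 1e-12)) / n
    grad_logits = (probs - onehot) / n
    return loss, grad_logits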

Setup

cd code
pip install -r requirements.txt
python setup_data.py        # downloads FashionMNIST as .npy files
cd cw2_logistic_regression

Running Instructions

Quick mode for debugging (10K samples, 10 epochs):

python run.py --quick

Train with a specific loss function:

python run.py --loss_type cross_entropy
python run.py --loss_type hinge
python run.py --loss_type exponential
python run.py --loss_type squared

Compare all four loss functions (generates the comparison plot for your report):

python run.py --compare_all

Verify your implementation with numerical gradient checks:

cd code
python -m tests.test_cw2
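
The tests compare your analytic gradients against finite-difference estimates. The idea is sketched below for illustration only; the actual checks live in tests/test_cw2.py and may use different tolerances and shapes:

import numpy as np

# Finite-difference gradient check: perturb one logit at a time and compare the
# resulting change in the loss against the analytic gradient.
def numerical_grad(loss_fn, logits, eps=1e-5):
    grad = np.zeros_like(logits)
    for i in range(logits.shape[0]):
        for j in range(logits.shape[1]):
            plus, minus = logits.copy(), logits.copy()
            plus[i, j] += eps
            minus[i, j] -= eps
            grad[i, j] = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return grad

# Example usage (assumes cross_entropy_loss returns (loss, grad_logits)):
# logits = np.random.randn(4, 10)
# labels = np.array([0, 3, 7, 1])
# loss, analytic = cross_entropy_loss(logits, labels, 10)
# numeric = numerical_grad(lambda z: cross_entropy_loss(z, labels, 10)[0], logits)
# assert np.allclose(analytic, numeric, atol=1e-6)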

What to Observe

After running --compare_all, inspect outputs/loss_comparison.png. You should see:

  • Cross-entropy reaches high training accuracy fastest (largest gradient magnitude at initialization)
  • Squared loss converges more slowly in early epochs — this is the plateau effect caused by the softmax Jacobian scaling the gradient by ≈ p_j, which is about 0.1 at a near-uniform initialization over 10 classes
  • Hinge converges at a similar rate to cross-entropy, reaching comparable final accuracy
  • Exponential is the slowest to converge and most numerically sensitive
  • Despite their different speeds, cross-entropy and squared loss reach similar final accuracy — both hit the same linear-model capacity ceiling (~84%)
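
You can verify the gradient-magnitude gap behind the first two bullets yourself: the cross-entropy gradient with respect to the logits is p − y, while the squared-loss gradient is additionally scaled by the softmax Jacobian. A quick illustrative check, using the conventions from the earlier sketches rather than the starter code:

import numpy as np

np.random.seed(0)
logits = 0.01 * np.random.randn(256, 10)      # near-uniform predictions, as at init
labels = np.random.randint(0, 10, size=256)
onehot = np.eye(10)[labels]

shifted = logits - logits.max(axis=1, keepdims=True)
p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

grad_ce = p - onehot                           # cross-entropy gradient w.r.t. logits

# Squared loss L = 0.5 * ||p - y||^2: chain rule through the softmax Jacobian,
# J[j, k] = p_j * (delta_jk - p_k), applied row by row.
residual = p - onehot
grad_sq = p * residual - p * (p * residual).sum(axis=1, keepdims=True)

print(np.abs(grad_ce).mean() / np.abs(grad_sq).mean())   # roughly 10x at initialization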

Think About the Differences

Guiding questions for your report

  1. Gradient magnitude: why does the softmax Jacobian make the squared-loss gradient ≈10× smaller than cross-entropy at initialization?
  2. Plateau effect: as training progresses and p_y → 1, how does the squared-loss gradient behave? How is cross-entropy different?
  3. Convergence ceiling: all well-implemented losses eventually converge to similar final accuracy on FashionMNIST — why?
  4. Hinge vs cross-entropy: hinge loss does not pass through softmax. How does this affect its gradient computation and convergence?
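
For question 4, note that one common multi-class hinge formulation (Weston-Watkins style) operates directly on logit margins, so its gradient never touches the softmax Jacobian. Whatever variant the handout defines takes precedence; the sketch below is just one possibility:

import numpy as np

def hinge_loss_sketch(logits, labels, num_classes, margin=1.0):
    """One possible multi-class hinge loss (Weston-Watkins style); the coursework
    handout defines the exact variant to implement, which may differ."""
    n = logits.shape[0]
    correct = logits[np.arange(n), labels][:, None]        # score of the true class
    margins = np.maximum(0.0, logits - correct + margin)    # per-class margin violations
    margins[np.arange(n), labels] = 0.0                     # true class excluded
    loss = margins.sum() / n

    # Gradient w.r.t. logits: no softmax Jacobian involved, just margin indicators.
    grad = (margins > 0).astype(logits.dtype)
    grad[np.arange(n), labels] = -grad.sum(axis=1)
    grad /= n
    return loss, grad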

Submission Checklist

  • losses.py — all 5 functions implemented and gradient checks pass
  • model.py — forward and predict implemented
  • optimizer.py — gradient descent step implemented
  • outputs/ — contains plots from python run.py --compare_all
  • report.pdf — using the provided template

Grading Rubric (100 points)

Component                                                                          Points
Softmax + 4 loss functions (losses.py) — correctness verified by gradient check       40
Model forward & predict (model.py)                                                     20
Gradient descent optimizer (optimizer.py)                                              15
Gradient checks pass (python -m tests.test_cw2)                                         5
Report — loss comparison analysis, convergence discussion, visualizations              20
Bonus: L2 regularization (capped at 100 total)                                        +10
Total                                                                                 100

Academic Integrity & Notes

  • You must use NumPy only — no PyTorch, no sklearn, no autograd.
  • Discussing high-level ideas is allowed, but your code must be your own.
  • Do not share or copy implementations.