Coursework 2: Logistic Regression & Loss Functions

CS3317: Artificial Intelligence · Canvas · Logistic Regression · NumPy · FashionMNIST
Implement a multi-class logistic regression classifier from scratch with NumPy. Explore four different loss functions and compare their convergence speed and final accuracy on FashionMNIST.

Overview

You will build every component of a logistic regression pipeline by hand — softmax, loss functions, gradient computation, and parameter updates — using NumPy only (no PyTorch).

  • Dataset: FashionMNIST — 60K train / 10K test, 28×28 grayscale images of 10 clothing categories, flattened to 784-dim vectors
  • Model: p = softmax(XW + b) — a single linear layer + softmax
  • Loss functions: cross-entropy, hinge, exponential, squared
  • Outputs: loss/accuracy curves, confusion matrix, weight visualizations, comparison plot
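
The provided trainer.py wires these pieces together. For orientation, here is a rough sketch of what a single mini-batch update looks like with the cross-entropy loss; the names and shapes below are illustrative assumptions, not the actual starter code:

import numpy as np

# Illustrative sketch of one mini-batch update in the pipeline above (the real
# trainer.py / optimizer.py are the ground truth; details here are assumptions).
# Shapes for FashionMNIST: Xb is (batch, 784), W is (784, 10), b is (10,).
def sgd_update(W, b, Xb, yb_onehot, lr=0.1):
    logits = Xb @ W + b                                   # single linear layer
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    probs = exp / exp.sum(axis=1, keepdims=True)          # softmax
    grad_logits = (probs - yb_onehot) / Xb.shape[0]       # cross-entropy gradient
    W -= lr * Xb.T @ grad_logits                          # gradient descent step
    b -= lr * grad_logits.sum(axis=0)
    return W, b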

Learning Objectives

  • Derive and implement softmax and its gradient through the cross-entropy and squared losses
  • Understand why different loss functions produce different gradient magnitudes and convergence rates
  • Implement mini-batch stochastic gradient descent from scratch
  • Evaluate a classifier with accuracy, confusion matrix, and per-class metrics

Downloads & Canvas Submission

  • Coursework handout (PDF)
  • Starter package
    • cw2_logistic_regression/losses.py — implement softmax & 4 loss functions
    • cw2_logistic_regression/model.py — implement forward & predict
    • cw2_logistic_regression/optimizer.py — implement gradient descent
    • run.py, trainer.py, config.py — provided, do not modify
    • common/ — shared data loading, metrics, visualization
    • tests/test_cw2.py — numerical gradient checks

Submitting on Canvas: zip cw2_logistic_regression/ (with outputs/ included) together with your report.pdf.

Tasks (What You Implement)

File            What to implement
losses.py       softmax(logits)
                cross_entropy_loss(logits, labels, num_classes)
                hinge_loss(logits, labels, num_classes)
                exponential_loss(logits, labels, num_classes)
                squared_loss(logits, labels, num_classes)
model.py        LogisticRegression.forward(X)
                LogisticRegression.predict(X)
optimizer.py    GradientDescent.__init__(...)
                GradientDescent.step(model, grad_logits, X)

Do not modify run.py, config.py, or trainer.py.
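
For reference, a minimal sketch of a numerically stable softmax and a cross-entropy loss with its gradient is shown below. The function names mirror losses.py, but the exact signatures and return values (for example, whether each loss also returns grad_logits) are assumptions; follow the docstrings in the starter code.

import numpy as np

def softmax(logits):
    """Row-wise softmax; subtracting the row max keeps exp() from overflowing."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy_loss(logits, labels, num_classes):
    """Mean cross-entropy over the batch plus its gradient w.r.t. the logits.

    Assumes integer labels of shape (batch,); the starter code may use a
    different convention (e.g. one-hot labels or a separate gradient routine).
    """
    n = logits.shape[0]
    probs = softmax(logits)
    onehot = np.eye(num_classes)[labels]
    loss = -np.sum(onehot * np.log(probs + 1e-12)) / n
    grad_logits = (probs - onehot) / n
    return loss, grad_logits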

Setup

cd code
pip install -r requirements.txt
python setup_data.py        # downloads FashionMNIST as .npy files
cd cw2_logistic_regression

Running Instructions

Quick mode for debugging (10K samples, 10 epochs):

python run.py --quick

Train with a specific loss function:

python run.py --loss_type cross_entropy
python run.py --loss_type hinge
python run.py --loss_type exponential
python run.py --loss_type squared

Compare all four loss functions (generates the comparison plot for your report):

python run.py --compare_all

Verify your implementation with numerical gradient checks:

cd code
python -m tests.test_cw2
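
The tests compare your analytic gradients against finite-difference estimates. The idea is sketched below for illustration only; the actual checks live in tests/test_cw2.py and may use different tolerances and shapes:

import numpy as np

# Finite-difference gradient check: perturb one logit at a time and compare the
# resulting change in the loss against the analytic gradient.
def numerical_grad(loss_fn, logits, eps=1e-5):
    grad = np.zeros_like(logits)
    for i in range(logits.shape[0]):
        for j in range(logits.shape[1]):
            plus, minus = logits.copy(), logits.copy()
            plus[i, j] += eps
            minus[i, j] -= eps
            grad[i, j] = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)
    return grad

# Example usage (assumes cross_entropy_loss returns (loss, grad_logits)):
# logits = np.random.randn(4, 10)
# labels = np.array([0, 3, 7, 1])
# loss, analytic = cross_entropy_loss(logits, labels, 10)
# numeric = numerical_grad(lambda z: cross_entropy_loss(z, labels, 10)[0], logits)
# assert np.allclose(analytic, numeric, atol=1e-6)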

What to Observe

After running --compare_all, inspect outputs/loss_comparison.png. You should see:

  • Cross-entropy reaches high training accuracy fastest (largest gradient magnitude at initialization)
  • Squared loss converges more slowly in early epochs — this is the plateau effect caused by the softmax Jacobian scaling the gradient by ≈ p_j, which is about 0.1 at a near-uniform initialization over 10 classes
  • Hinge converges at a similar rate to cross-entropy, reaching comparable final accuracy
  • Exponential is the slowest to converge and most numerically sensitive
  • Despite their different speeds, cross-entropy and squared loss reach similar final accuracy — both hit the same linear-model capacity ceiling (~84%)
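
You can verify the gradient-magnitude gap behind the first two bullets yourself: the cross-entropy gradient with respect to the logits is p − y, while the squared-loss gradient is additionally scaled by the softmax Jacobian. A quick illustrative check, using the conventions from the earlier sketches rather than the starter code:

import numpy as np

np.random.seed(0)
logits = 0.01 * np.random.randn(256, 10)      # near-uniform predictions, as at init
labels = np.random.randint(0, 10, size=256)
onehot = np.eye(10)[labels]

shifted = logits - logits.max(axis=1, keepdims=True)
p = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

grad_ce = p - onehot                           # cross-entropy gradient w.r.t. logits

# Squared loss L = 0.5 * ||p - y||^2: chain rule through the softmax Jacobian,
# J[j, k] = p_j * (delta_jk - p_k), applied row by row.
residual = p - onehot
grad_sq = p * residual - p * (p * residual).sum(axis=1, keepdims=True)

print(np.abs(grad_ce).mean() / np.abs(grad_sq).mean())   # roughly 10x at initialization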

Think About the Differences

Guiding questions for your report

  1. Gradient magnitude: why does the softmax Jacobian make the squared-loss gradient ≈10× smaller than cross-entropy at initialization?
  2. Plateau effect: as training progresses and p_y → 1, how does the squared-loss gradient behave? How is cross-entropy different?
  3. Convergence ceiling: all well-implemented losses eventually converge to similar final accuracy on FashionMNIST — why?
  4. Hinge vs cross-entropy: hinge loss does not pass through softmax. How does this affect its gradient computation and convergence?
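
For question 4, note that one common multi-class hinge formulation (Weston-Watkins style) operates directly on logit margins, so its gradient never touches the softmax Jacobian. Whatever variant the handout defines takes precedence; the sketch below is just one possibility:

import numpy as np

def hinge_loss_sketch(logits, labels, num_classes, margin=1.0):
    """One possible multi-class hinge loss (Weston-Watkins style); the coursework
    handout defines the exact variant to implement, which may differ."""
    n = logits.shape[0]
    correct = logits[np.arange(n), labels][:, None]        # score of the true class
    margins = np.maximum(0.0, logits - correct + margin)    # per-class margin violations
    margins[np.arange(n), labels] = 0.0                     # true class excluded
    loss = margins.sum() / n

    # Gradient w.r.t. logits: no softmax Jacobian involved, just margin indicators.
    grad = (margins > 0).astype(logits.dtype)
    grad[np.arange(n), labels] = -grad.sum(axis=1)
    grad /= n
    return loss, grad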

Submission Checklist

  • losses.py — all 5 functions implemented and gradient checks pass
  • model.py — forward and predict implemented
  • optimizer.py — gradient descent step implemented
  • outputs/ — contains plots from python run.py --compare_all
  • report.pdf — using the provided template

Grading Rubric (100 points)

Component                                                                          Points
Softmax + 4 loss functions (losses.py) — correctness verified by gradient check       40
Model forward & predict (model.py)                                                     20
Gradient descent optimizer (optimizer.py)                                              15
Gradient checks pass (python -m tests.test_cw2)                                         5
Report — loss comparison analysis, convergence discussion, visualizations              20
Bonus: L2 regularization (capped at 100 total)                                        +10
Total                                                                                 100

Academic Integrity & Notes

  • You must use NumPy only — no PyTorch, no sklearn, no autograd.
  • Discussing high-level ideas is allowed, but your code must be your own.
  • Do not share or copy implementations.