Coursework 4: Convolutional Neural Networks with PyTorch

CS3317: Artificial Intelligence
Design and train convolutional neural networks using PyTorch. In CW2 & CW3 you built everything by hand; here PyTorch's autograd handles backpropagation automatically — your focus is on architecture design and experimental analysis.

The Progression: CW2 → CW3 → CW4

| Coursework | Model | Framework | ~Accuracy |
|---|---|---|---|
| CW2 | Logistic Regression (784→10) | NumPy, manual gradients | ~84% |
| CW3 | MLP (784→128→64→10) | NumPy, manual backprop | ~88–90% |
| CW4 | CNN (Conv→Pool→…→FC) | PyTorch autograd | ~91–93% |

The key new idea: convolutional layers exploit spatial structure in images — they share weights across positions, making the model translation-equivariant and far more parameter-efficient than a fully connected layer over raw pixels.
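The parameter-efficiency claim is easy to verify directly. The sketch below (an illustration, not part of the starter code) counts the parameters of a single `Conv2d(1, 32, 3)` layer against a hypothetical fully connected layer producing the same number of output values from a 28×28 input:

```python
import torch.nn as nn

# Conv layer: 32 filters of size 3x3x1, each with a bias -> weights are
# shared across all spatial positions of the 28x28 image.
conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)

# Hypothetical FC layer producing the same 32x28x28 output values: one
# weight per (input pixel, output unit) pair -- nothing is shared.
fc = nn.Linear(28 * 28, 32 * 28 * 28)

conv_params = sum(p.numel() for p in conv.parameters())
fc_params = sum(p.numel() for p in fc.parameters())
print(conv_params)  # 32 * (3*3*1 + 1) = 320
print(fc_params)    # 25088 * 784 + 25088 = 19,694,080
```

The conv layer needs five orders of magnitude fewer parameters because its 3×3 filters slide over the image instead of learning a separate weight for every position.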

Learning Objectives

  • Design CNN architectures using nn.Conv2d, nn.MaxPool2d, nn.BatchNorm2d, nn.Dropout
  • Understand why convolutional layers outperform fully connected layers on image data
  • Experiment with data augmentation and observe its regularization effect
  • Compare Adam and SGD optimizers and understand the trade-offs
  • Analyse the full CW2 → CW3 → CW4 accuracy progression

Downloads & Canvas Submission

  • Coursework handout (PDF)
  • Starter package
    • cw4_cnn/model.py — implement SimpleCNN (required) and DeepCNN (bonus)
    • run.py, trainer.py, config.py, dataset.py — provided, do not modify
    • tests/test_cw4.py — shape and gradient-flow checks
    • common/ — shared data loading & visualization utilities

Submitting on Canvas: zip cw4_cnn/ (with outputs/ included) together with your report.pdf.

Tasks (What You Implement)

All implementation is in model.py:

| Class / Method | Description |
|---|---|
| SimpleCNN.__init__ | Define your CNN layers using nn.Sequential or individual attributes. Input: (B, 1, 28, 28); output: (B, 10). Suggested architecture: Conv(1→32)→ReLU→Pool → Conv(32→64)→ReLU→Pool → Flatten → Linear(3136→128)→ReLU → Linear(128→10) |
| SimpleCNN.forward | Pass input through your layers. Do not apply softmax — nn.CrossEntropyLoss includes it. |
| DeepCNN.__init__ & forward | (Bonus) A deeper architecture targeting ≥92% test accuracy. Suggestions: 3–5 conv blocks, BatchNorm2d, Dropout, global average pooling. |
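To make the suggested layout concrete, here is one way SimpleCNN could be written, following the handout's suggested layer sizes exactly. This is a sketch, not the required design; any architecture that passes the shape check is acceptable:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """One possible realization of the handout's suggested architecture."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # (B, 32, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 32, 14, 14)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # (B, 64, 14, 14)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 64, 7, 7)
            nn.Flatten(),                                 # (B, 3136)
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10),                           # raw logits, no softmax
        )

    def forward(self, x):
        return self.net(x)

out = SimpleCNN()(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```

Note that the forward pass returns raw logits; applying softmax here would double-apply it inside nn.CrossEntropyLoss.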

Useful PyTorch layers

| Layer | Typical use |
|---|---|
| nn.Conv2d(in_ch, out_ch, kernel_size, padding=1) | Feature extraction |
| nn.MaxPool2d(2) | Spatial downsampling ÷2 |
| nn.BatchNorm2d(num_features) | Stabilizes training |
| nn.Dropout(p) | Regularization |
| nn.Flatten() | Flatten spatial dims before FC |
| nn.Linear(in, out) | Fully connected classifier |
| nn.AdaptiveAvgPool2d(1) | Global average pooling (DeepCNN) |
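For the DeepCNN bonus, these layers combine naturally into repeated Conv→BatchNorm→ReLU→Pool blocks capped by global average pooling. The sketch below is one plausible arrangement, with sizes chosen for illustration only:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Conv -> BatchNorm -> ReLU -> Pool: one building block of a deeper CNN."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),   # normalizes each channel's activations
        nn.ReLU(),
        nn.MaxPool2d(2),          # halve spatial dimensions
    )

deep = nn.Sequential(
    conv_block(1, 32),            # 28x28 -> 14x14
    conv_block(32, 64),           # 14x14 -> 7x7
    conv_block(64, 128),          # 7x7  -> 3x3
    nn.AdaptiveAvgPool2d(1),      # (B, 128, 1, 1) regardless of input size
    nn.Flatten(),
    nn.Dropout(0.3),              # regularize the small classifier head
    nn.Linear(128, 10),
)

print(deep(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 10])
```

Global average pooling replaces the large Flatten→Linear head of SimpleCNN, which keeps the parameter count low even as depth grows.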

Do not modify run.py, config.py, dataset.py, or trainer.py.

Setup

cd code
pip install -r requirements.txt
python setup_data.py        # only needed if not done in CW2/CW3
cd cw4_cnn

Running Instructions

Verify architecture (shape + gradient-flow checks):

cd code
python -m tests.test_cw4

Quick mode for debugging:

python run.py --quick

Train SimpleCNN with default settings (Adam, 15 epochs):

python run.py

Train with data augmentation:

python run.py --use_augmentation

Compare optimizers (for report):

python run.py --optimizer_type adam
python run.py --optimizer_type sgd --learning_rate 0.01

Train DeepCNN (bonus):

python run.py --model_type deep_cnn --num_epochs 20

What to Observe

After training, check the outputs in outputs/. You should see:

  • CNN vs MLP: SimpleCNN (~91%) clearly outperforms the CW3 MLP (~88–90%), and the gain comes not from model size but from convolutions exploiting spatial structure: the conv filters themselves hold only a few thousand parameters
  • Data augmentation: random flips and crops reduce overfitting; the gap between training and validation accuracy narrows
  • Adam vs SGD: Adam converges faster in early epochs; well-tuned SGD with momentum can match Adam at convergence but is more sensitive to learning rate
  • Weight visualizations: the first conv layer learns edge and texture detectors — visually interpretable filters

Think About the Differences

Guiding questions for your report

  1. Parameter efficiency: a Conv2d(1, 32, 3) layer has only 32×(3×3+1)=320 parameters, but applies them to every position in a 28×28 image. How does this compare to a fully connected layer over the same input?
  2. Spatial hierarchy: after two MaxPool2d(2) layers, the spatial dimensions become 7×7. What does this imply about the receptive field of later layers?
  3. BatchNorm: why does adding BatchNorm2d after conv layers help training? What does it do to the gradient signal?
  4. Augmentation: data augmentation increases effective dataset size without collecting new data. Under what conditions does it help most?
  5. Full progression: write a short explanation for why each step — Logistic Regression → MLP → CNN — improves accuracy on FashionMNIST.
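For questions 1–2, the spatial dimensions are easy to trace mechanically. This snippet (illustrative, using the suggested SimpleCNN stages) confirms that two rounds of padded 3×3 convolution plus 2×2 max pooling take a 28×28 map to 7×7, so each unit in the later layers summarizes a progressively larger patch of the input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)

# padding=1 preserves spatial size; MaxPool2d(2) halves it
stage1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.MaxPool2d(2))
stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.MaxPool2d(2))

print(stage1(x).shape)           # torch.Size([1, 32, 14, 14])
print(stage2(stage1(x)).shape)   # torch.Size([1, 64, 7, 7])
```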

Submission Checklist

  • model.py — SimpleCNN init & forward implemented
  • Shape checks pass: python -m tests.test_cw4
  • outputs/ — contains training plots from SimpleCNN experiments
  • report.pdf — includes architecture diagram, optimizer comparison, augmentation analysis, and CW2→CW3→CW4 progression

Grading Rubric (100 points)

| Component | Points |
|---|---|
| SimpleCNN architecture (__init__) — shape check passes, reasonable design | 30 |
| SimpleCNN forward pass — correct data flow, no softmax applied | 20 |
| Data augmentation experiment — comparison with/without augmentation | 15 |
| Optimizer comparison (Adam vs SGD) — analysis of convergence speed and stability | 15 |
| Report — architecture description, results discussion, CW2→CW4 progression analysis | 20 |
| Bonus: DeepCNN achieving ≥ 92% test accuracy (capped at 100 total) | +10 |
| Total | 100 |

Academic Integrity & Notes

  • You may use any torch.nn modules — no need to implement convolutions manually.
  • Discussing high-level architecture ideas is allowed, but your code must be your own.
  • Do not copy architectures from external sources without understanding them.