| Coursework | Model | Framework | ~Accuracy |
|---|---|---|---|
| CW2 | Logistic Regression (784→10) | NumPy, manual gradients | ~84% |
| CW3 | MLP (784→128→64→10) | NumPy, manual backprop | ~88–90% |
| CW4 | CNN (Conv→Pool→…→FC) | PyTorch autograd | ~91–93% |
The key new idea: convolutional layers exploit spatial structure in images — they share weights across positions, making the model translation-equivariant and far more parameter-efficient than a fully connected layer over raw pixels.
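To make the parameter-efficiency point concrete, here is a quick count (illustrative numbers only, not part of the coursework code): a 3×3 conv producing 32 feature maps uses a few hundred weights shared across all positions, while a fully connected layer producing an output of the same size from raw pixels needs millions.

```python
import torch.nn as nn

# Conv layer: 32 filters of shape 1x3x3, shared across all spatial positions.
conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)   # 32*1*3*3 + 32 = 320 params

# Fully connected layer producing the same output size (32 * 28 * 28 values)
# from the raw 28x28 pixels: one weight per input-output pair.
fc = nn.Linear(28 * 28, 32 * 28 * 28)               # 784*25088 + 25088 = 19,694,080 params

print(sum(p.numel() for p in conv.parameters()))    # 320
print(sum(p.numel() for p in fc.parameters()))      # 19694080
```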
New PyTorch modules you will use: `nn.Conv2d`, `nn.MaxPool2d`, `nn.BatchNorm2d`, `nn.Dropout`.

- `cw4_cnn/model.py` — implement `SimpleCNN` (required) and `DeepCNN` (bonus)
- `run.py`, `trainer.py`, `config.py`, `dataset.py` — provided, do not modify
- `tests/test_cw4.py` — shape and gradient-flow checks
- `common/` — shared data loading & visualization utilities
Submitting on Canvas: zip `cw4_cnn/` (with `outputs/` included) together with your `report.pdf`.
All implementation is in `model.py`:

| Class / Method | Description |
|---|---|
| `SimpleCNN.__init__` | Define your CNN layers using `nn.Sequential` or individual attributes. Input: `(B, 1, 28, 28)`; output: `(B, 10)`. Suggested architecture: Conv(1→32)→ReLU→Pool → Conv(32→64)→ReLU→Pool → Flatten → Linear(3136→128)→ReLU → Linear(128→10). A sketch follows this table. |
| `SimpleCNN.forward` | Pass input through your layers. Do not apply softmax: `nn.CrossEntropyLoss` includes it. |
| `DeepCNN.__init__` & `forward` | (Bonus) A deeper architecture targeting ≥92% test accuracy. Suggestions: 3–5 conv blocks, `nn.BatchNorm2d`, `nn.Dropout`, global average pooling. See the sketch after the layer table below. |
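A minimal sketch of the suggested SimpleCNN layout. The exact layer arrangement is up to you; the two-stage structure, kernel sizes, and attribute names here are assumptions based on the suggested architecture above, not the official solution:

```python
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # (B, 32, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 32, 14, 14)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # (B, 64, 14, 14)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 64, 7, 7)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                                 # (B, 64*7*7) = (B, 3136)
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10),                           # raw logits, no softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```

Note that `forward` returns raw logits; `nn.CrossEntropyLoss` applies log-softmax internally, so adding your own softmax would be a bug.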
| Layer | Typical use |
|---|---|
| `nn.Conv2d(in_ch, out_ch, kernel_size, padding=1)` | Feature extraction |
| `nn.MaxPool2d(2)` | Spatial downsampling ÷2 |
| `nn.BatchNorm2d(num_features)` | Stabilizes training |
| `nn.Dropout(p)` | Regularization |
| `nn.Flatten()` | Flatten spatial dims before FC |
| `nn.Linear(in_features, out_features)` | Fully connected classifier |
| `nn.AdaptiveAvgPool2d(1)` | Global average pooling (DeepCNN) |
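For the DeepCNN bonus, one possible way to combine these modules into repeated conv blocks with global average pooling. The block layout, channel widths, and dropout rate here are illustrative choices, not requirements:

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, p_drop=0.1):
    # One conv block: Conv -> BatchNorm -> ReLU -> Pool -> Dropout.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Dropout(p_drop),
    )

class DeepCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32),    # 28x28 -> 14x14
            conv_block(32, 64),   # 14x14 -> 7x7
            conv_block(64, 128),  # 7x7  -> 3x3
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling -> (B, 128, 1, 1)
            nn.Flatten(),             # (B, 128)
            nn.Linear(128, 10),       # raw logits
        )

    def forward(self, x):
        return self.head(self.features(x))
```

Global average pooling replaces the large Flatten→Linear stage, so the head stays small no matter how many conv blocks you stack.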
Do not modify `run.py`, `config.py`, `dataset.py`, or `trainer.py`.
```bash
cd code
pip install -r requirements.txt
python setup_data.py   # only needed if not done in CW2/CW3
cd cw4_cnn
```
Verify architecture (shape + gradient-flow checks):
```bash
cd code
python -m tests.test_cw4
```
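If you want to sanity-check your model interactively before running the test suite, a minimal check along the same lines looks like this (the import path is an assumption about the repo layout, and the test suite remains the authoritative check):

```python
import torch
from cw4_cnn.model import SimpleCNN   # assumed import path, run from code/

model = SimpleCNN()
x = torch.randn(4, 1, 28, 28)             # dummy MNIST-shaped batch
out = model(x)
assert out.shape == (4, 10), out.shape    # shape check

out.sum().backward()                      # gradient-flow check:
assert all(p.grad is not None for p in model.parameters())
```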
Quick mode for debugging:
```bash
python run.py --quick
```
Train SimpleCNN with default settings (Adam, 15 epochs):
```bash
python run.py
```
Train with data augmentation:
```bash
python run.py --use_augmentation
```
Compare optimizers (for report):
```bash
python run.py --optimizer_type adam
python run.py --optimizer_type sgd --learning_rate 0.01
```
Train DeepCNN (bonus):
```bash
python run.py --model_type deep_cnn --num_epochs 20
```
After training, check `outputs/` for the generated training plots.
Before submitting, make sure:

- `model.py` — `SimpleCNN` `__init__` & `forward` implemented
- `python -m tests.test_cw4` passes
- `outputs/` — contains training plots from SimpleCNN experiments
- `report.pdf` — includes architecture diagram, optimizer comparison, augmentation analysis, and CW2→CW3→CW4 progression

| Component | Points |
|---|---|
| SimpleCNN architecture (`__init__`) — shape check passes, reasonable design | 30 |
| SimpleCNN forward pass — correct data flow, no softmax applied | 20 |
| Data augmentation experiment — comparison with/without augmentation | 15 |
| Optimizer comparison (Adam vs SGD) — analysis of convergence speed and stability | 15 |
| Report — architecture description, results discussion, CW2→CW4 progression analysis | 20 |
| Bonus: DeepCNN achieving ≥92% test accuracy (capped at 100 total) | +10 |
| **Total** | **100** |
You may use any `torch.nn` modules — there is no need to implement convolutions manually.