Coursework 4: Convolutional Neural Networks with PyTorch

CS3317: Artificial Intelligence
Design and train convolutional neural networks using PyTorch. In CW2 & CW3 you built everything by hand; here PyTorch's autograd handles backpropagation automatically — your focus is on architecture design and experimental analysis.

The Progression: CW2 → CW3 → CW4

| Coursework | Model | Framework | ~Accuracy |
|---|---|---|---|
| CW2 | Logistic Regression (784→10) | NumPy, manual gradients | ~84% |
| CW3 | MLP (784→128→64→10) | NumPy, manual backprop | ~88–90% |
| CW4 | CNN (Conv→Pool→…→FC) | PyTorch autograd | ~91–93% |

The key new idea: convolutional layers exploit spatial structure in images — they share weights across positions, making the model translation-equivariant and far more parameter-efficient than a fully connected layer over raw pixels.
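The parameter-efficiency claim is easy to verify directly. The sketch below (an illustration, not part of the starter code) counts the parameters of a single `Conv2d(1, 32, 3)` layer against a hypothetical fully connected layer producing the same number of output values from a 28×28 input:

```python
import torch.nn as nn

# Conv layer: 32 filters of size 3x3x1, each with a bias -> weights are
# shared across all spatial positions of the 28x28 image.
conv = nn.Conv2d(1, 32, kernel_size=3, padding=1)

# Hypothetical FC layer producing the same 32x28x28 output values: one
# weight per (input pixel, output unit) pair -- nothing is shared.
fc = nn.Linear(28 * 28, 32 * 28 * 28)

conv_params = sum(p.numel() for p in conv.parameters())
fc_params = sum(p.numel() for p in fc.parameters())
print(conv_params)  # 32 * (3*3*1 + 1) = 320
print(fc_params)    # 25088 * 784 + 25088 = 19,694,080
```

The conv layer needs five orders of magnitude fewer parameters because its 3×3 filters slide over the image instead of learning a separate weight for every position.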

Learning Objectives

  • Design CNN architectures using nn.Conv2d, nn.MaxPool2d, nn.BatchNorm2d, nn.Dropout
  • Understand why convolutional layers outperform fully connected layers on image data
  • Experiment with data augmentation and observe its regularization effect
  • Compare Adam and SGD optimizers and understand the trade-offs
  • Analyse the full CW2 → CW3 → CW4 accuracy progression

Downloads & Canvas Submission

  • Coursework handout (PDF)
  • Starter package
    • cw4_cnn/model.py — implement SimpleCNN (required) and DeepCNN (bonus)
    • run.py, trainer.py, config.py, dataset.py — provided, do not modify
    • tests/test_cw4.py — shape and gradient-flow checks
    • common/ — shared data loading & visualization utilities

Submitting on Canvas: zip cw4_cnn/ (with outputs/ included) together with your report.pdf.

Tasks (What You Implement)

All implementation is in model.py:

| Class / Method | Description |
|---|---|
| SimpleCNN.__init__ | Define your CNN layers using nn.Sequential or individual attributes. Input: (B, 1, 28, 28); output: (B, 10). Suggested architecture: Conv(1→32)→ReLU→Pool → Conv(32→64)→ReLU→Pool → Flatten → Linear(3136→128)→ReLU → Linear(128→10) |
| SimpleCNN.forward | Pass input through your layers. Do not apply softmax — nn.CrossEntropyLoss includes it. |
| DeepCNN.__init__ & forward | (Bonus) A deeper architecture targeting ≥92% test accuracy. Suggestions: 3–5 conv blocks, BatchNorm2d, Dropout, global average pooling. |
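To make the suggested layout concrete, here is one way SimpleCNN could be written, following the handout's suggested layer sizes exactly. This is a sketch, not the required design; any architecture that passes the shape check is acceptable:

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """One possible realization of the handout's suggested architecture."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # (B, 32, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 32, 14, 14)
            nn.Conv2d(32, 64, kernel_size=3, padding=1),  # (B, 64, 14, 14)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # (B, 64, 7, 7)
            nn.Flatten(),                                 # (B, 3136)
            nn.Linear(64 * 7 * 7, 128),
            nn.ReLU(),
            nn.Linear(128, 10),                           # raw logits, no softmax
        )

    def forward(self, x):
        return self.net(x)

out = SimpleCNN()(torch.randn(4, 1, 28, 28))
print(out.shape)  # torch.Size([4, 10])
```

Note that the forward pass returns raw logits; applying softmax here would double-apply it inside nn.CrossEntropyLoss.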

Useful PyTorch layers

| Layer | Typical use |
|---|---|
| nn.Conv2d(in_ch, out_ch, kernel_size, padding=1) | Feature extraction |
| nn.MaxPool2d(2) | Spatial downsampling ÷2 |
| nn.BatchNorm2d(num_features) | Stabilizes training |
| nn.Dropout(p) | Regularization |
| nn.Flatten() | Flatten spatial dims before FC |
| nn.Linear(in, out) | Fully connected classifier |
| nn.AdaptiveAvgPool2d(1) | Global average pooling (DeepCNN) |
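For the DeepCNN bonus, these layers combine naturally into repeated Conv→BatchNorm→ReLU→Pool blocks capped by global average pooling. The sketch below is one plausible arrangement, with sizes chosen for illustration only:

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Conv -> BatchNorm -> ReLU -> Pool: one building block of a deeper CNN."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.BatchNorm2d(out_ch),   # normalizes each channel's activations
        nn.ReLU(),
        nn.MaxPool2d(2),          # halve spatial dimensions
    )

deep = nn.Sequential(
    conv_block(1, 32),            # 28x28 -> 14x14
    conv_block(32, 64),           # 14x14 -> 7x7
    conv_block(64, 128),          # 7x7  -> 3x3
    nn.AdaptiveAvgPool2d(1),      # (B, 128, 1, 1) regardless of input size
    nn.Flatten(),
    nn.Dropout(0.3),              # regularize the small classifier head
    nn.Linear(128, 10),
)

print(deep(torch.randn(2, 1, 28, 28)).shape)  # torch.Size([2, 10])
```

Global average pooling replaces the large Flatten→Linear head of SimpleCNN, which keeps the parameter count low even as depth grows.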

Do not modify run.py, config.py, dataset.py, or trainer.py.

Setup

cd code
pip install -r requirements.txt
python setup_data.py        # only needed if not done in CW2/CW3
cd cw4_cnn

Running Instructions

Verify architecture (shape + gradient-flow checks):

cd code
python -m tests.test_cw4

Quick mode for debugging:

python run.py --quick

Train SimpleCNN with default settings (Adam, 15 epochs):

python run.py

Train with data augmentation:

python run.py --use_augmentation

Compare optimizers (for report):

python run.py --optimizer_type adam
python run.py --optimizer_type sgd --learning_rate 0.01

Train DeepCNN (bonus):

python run.py --model_type deep_cnn --num_epochs 20

What to Observe

After training, check the outputs in outputs/. You should see:

  • CNN vs MLP: SimpleCNN (~91%) clearly outperforms the CW3 MLP (~88–90%), and the gain comes not from model size but from convolutions exploiting spatial structure: the conv filters themselves hold only a few thousand parameters
  • Data augmentation: random flips and crops reduce overfitting; the gap between training and validation accuracy narrows
  • Adam vs SGD: Adam converges faster in early epochs; well-tuned SGD with momentum can match Adam at convergence but is more sensitive to learning rate
  • Weight visualizations: the first conv layer learns edge and texture detectors — visually interpretable filters

Think About the Differences

Guiding questions for your report

  1. Parameter efficiency: a Conv2d(1, 32, 3) layer has only 32×(3×3+1)=320 parameters, but applies them to every position in a 28×28 image. How does this compare to a fully connected layer over the same input?
  2. Spatial hierarchy: after two MaxPool2d(2) layers, the spatial dimensions become 7×7. What does this imply about the receptive field of later layers?
  3. BatchNorm: why does adding BatchNorm2d after conv layers help training? What does it do to the gradient signal?
  4. Augmentation: data augmentation increases effective dataset size without collecting new data. Under what conditions does it help most?
  5. Full progression: write a short explanation for why each step — Logistic Regression → MLP → CNN — improves accuracy on FashionMNIST.
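For questions 1–2, the spatial dimensions are easy to trace mechanically. This snippet (illustrative, using the suggested SimpleCNN stages) confirms that two rounds of padded 3×3 convolution plus 2×2 max pooling take a 28×28 map to 7×7, so each unit in the later layers summarizes a progressively larger patch of the input:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)

# padding=1 preserves spatial size; MaxPool2d(2) halves it
stage1 = nn.Sequential(nn.Conv2d(1, 32, 3, padding=1), nn.MaxPool2d(2))
stage2 = nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.MaxPool2d(2))

print(stage1(x).shape)           # torch.Size([1, 32, 14, 14])
print(stage2(stage1(x)).shape)   # torch.Size([1, 64, 7, 7])
```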

Submission Checklist

  • model.py — SimpleCNN init & forward implemented
  • Shape checks pass: python -m tests.test_cw4
  • outputs/ — contains training plots from SimpleCNN experiments
  • report.pdf — includes architecture diagram, optimizer comparison, augmentation analysis, and CW2→CW3→CW4 progression

Grading Rubric (100 points)

| Component | Points |
|---|---|
| SimpleCNN architecture (__init__) — shape check passes, reasonable design | 30 |
| SimpleCNN forward pass — correct data flow, no softmax applied | 20 |
| Data augmentation experiment — comparison with/without augmentation | 15 |
| Optimizer comparison (Adam vs SGD) — analysis of convergence speed and stability | 15 |
| Report — architecture description, results discussion, CW2→CW4 progression analysis | 20 |
| Bonus: DeepCNN achieving ≥ 92% test accuracy (capped at 100 total) | +10 |
| Total | 100 |

Academic Integrity & Notes

  • You may use any torch.nn modules — no need to implement convolutions manually.
  • Discussing high-level architecture ideas is allowed, but your code must be your own.
  • Do not copy architectures from external sources without understanding them.