Form your team and register on the leaderboard by May 29, 23:59
Teams are 1–3 students. Once you register on the leaderboard, the instructor creates your Canvas team accordingly — only then can you upload the final zip on Canvas. Don't wait until June — late registrations may miss the Canvas-team window.
Presentation requirements — read before the last class
All team members speak. Slides are in English; oral delivery is in Chinese or English. The talk is 8 minutes plus open Q&A and counts as 10% of the course grade (separate from the project's 100 points).
The methodological twist — read this first
At test time, about 30% of examples have one randomly selected modality (image or text) replaced with zeros; the has_image and has_text mask columns in data/test.csv identify which. The training data is clean, so consider how to bridge this train/test distribution gap.
A baseline that just concatenates features and runs an MLP will work on intact examples but degrade sharply on dropout examples, because it never saw zero rows during training. What you do about this is the project. Some sensible responses (none required, all in scope): train-time modality dropout simulation, per-modality classifiers with mask-aware late fusion, learned modality gates, or imputation models for the missing modality.
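One of the responses listed above, train-time modality dropout simulation, fits in a few lines of numpy. This is an illustrative sketch under assumptions, not a reference solution: it assumes a dropped example loses image or text with equal probability (the handout only says "randomly selected"), and, matching the test split, dropping text zeroes both the CLIP and the BERT matrix. simulate_modality_dropout is a hypothetical helper name.

```python
import numpy as np


def simulate_modality_dropout(image, text_clip, text_bert, p_drop=0.3, rng=None):
    """Return copies of the feature matrices with test-style dropout applied.

    Assumption: a dropped row loses exactly one modality, chosen 50/50
    between image and text. Dropping text zeroes BOTH text matrices, as in
    the test split. If your model also reads text_raw, blank it for the
    same rows, since test.csv does that too.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = image.shape[0]
    image, text_clip, text_bert = image.copy(), text_clip.copy(), text_bert.copy()

    dropped = rng.random(n) < p_drop               # rows that lose one modality
    drop_image = dropped & (rng.random(n) < 0.5)   # coin flip: image vs. text
    drop_text = dropped & ~drop_image

    image[drop_image] = 0.0
    text_clip[drop_text] = 0.0
    text_bert[drop_text] = 0.0

    # Mask flags mirroring test.csv: 1 = modality present, 0 = zeroed.
    has_image = (~drop_image).astype(np.int64)
    has_text = (~drop_text).astype(np.int64)
    return image, text_clip, text_bert, has_image, has_text
```

Resampling the dropout every epoch (for example, calling the helper inside the training loop with one shared np.random.Generator) lets the model see intact, image-dropped, and text-dropped versions of the same example; passing the returned has_image/has_text flags to the model as extra inputs is one way to make it mask-aware.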
1. Downloads & submission
- Coursework handout (PDF) — 6 pages with figures.
- Data + starter bundle (.tar.gz, ~80 MB):
  - data/: all CSV and NPZ files live here:
    - data/train.csv, data/val.csv, data/unlabeled.csv, data/test.csv: example_id, text_raw, labels or mask
    - data/{split}_image.npz, data/{split}_text_CLIP.npz, data/{split}_text_BERT.npz: pooled 768-d features
  - starter/: load_data.py, evaluate.py, baseline_concat_mlp.py, submission_template.csv
  - tests/sanity_checks.py: verifies the bundle and a submission CSV
  - docs/report_template.tex & docs/report_template.docx: NeurIPS-style report templates
- Leaderboard: http://taohuang.info:3317 — register your team, upload submissions, see your rank. Register by May 29, 23:59 so the instructor can create your Canvas team in time.
- Canvas: a single zip with code, report.pdf, and slides.pdf, submitted after the leaderboard closes, by Jun 14, 23:59. The leaderboard handles the test submission; Canvas handles the write-up. You can only upload the Canvas zip once the instructor has created your Canvas team from your leaderboard registration.
Keep exploring — after the deadline
The official leaderboard is closed, but if you'd like to keep working on the problem you can still evaluate new ideas — offline or against the hidden test.
- Self-evaluation kit: selfeval_kit.tar.gz, a small numpy-only kit that now includes the hidden-test answer key, so you can reproduce your exact leaderboard score offline (python score.py --truth test_labels.csv --pred your_submission.csv), with the per-mask-subset breakdown. It also scores against val and can build a hidden-test-like local proxy; see the kit's README.md. (The labels are anonymized 18-class vectors; the genre mapping and movie identities stay private.)
- Practice sandbox leaderboard: http://taohuang.info:13317 scores against the same hidden test as the official board, so you can see your real F1. It is a practice sandbox, separate from the official (closed) board; results do not affect any grade. Register with your enrolled student ID and upload a submission CSV.
2. Learning objectives
- Reason about multimodal fusion and choose between early, late, and gated fusion architectures based on the task structure.
- Design for a known train/test distribution gap; quantify your method's robustness via stratified ablations.
- Use a small labeled set, a large unlabeled pool, and (optionally) LLM APIs as engineering components — while disclosing all of them.
- Communicate methodology in a NeurIPS-style report and an 8-minute talk + open Q&A.
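To make the fusion objective concrete, here is a minimal sketch of mask-aware late fusion, one of the options named in the methodological-twist section. It assumes two hypothetical per-modality classifiers that already produce (n, 18) probabilities, and it averages only the predictions of the modalities that are present, so a zeroed input never contributes. A learned gate would replace these fixed equal weights with weights predicted from the features and the mask.

```python
import numpy as np


def mask_aware_late_fusion(p_image, p_text, has_image, has_text):
    """Average per-modality probabilities over the modalities that are present.

    p_image, p_text: (n, 18) probabilities from two separately trained
    classifiers (hypothetical: one on image features, one on text features).
    has_image, has_text: (n,) 0/1 masks, as in test.csv.
    """
    w_img = np.asarray(has_image, dtype=float)[:, None]
    w_txt = np.asarray(has_text, dtype=float)[:, None]
    total = w_img + w_txt
    # Test rows always keep at least one modality; guard anyway to avoid 0/0.
    total = np.where(total == 0, 1.0, total)
    return (w_img * p_image + w_txt * p_text) / total
```

On an intact row this is a plain average; on a dropped row it falls back to the single surviving classifier instead of mixing in a prediction made from an all-zero input.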
3. The task & data
You receive four splits of the same underlying dataset. Every example has the same
per-modality content: an image-feature vector, two text-feature vectors (CLIP- and
BERT-encoded), and the raw text the features were computed from. Labels are
multi-hot vectors of length 18; classes are anonymized integers
label_0 through label_17.
Splits
| Split | Size | Labels? | Test-time modality dropout? |
|---|---|---|---|
| train | 2,000 | yes | no (clean) |
| val | 300 | yes | no (clean) |
| unlabeled | 5,000 | no | no (clean) |
| test | 2,000 | no (graded online) | yes: 30% have one modality zeroed |
CSV schemas
| File | Columns |
|---|---|
| data/train.csv, data/val.csv | example_id, text_raw, label_0, …, label_17 |
| data/unlabeled.csv | example_id, text_raw |
| data/test.csv | example_id, text_raw, has_image, has_text |
Row i of every .npz file for a split corresponds to the
i-th data row of {split}.csv. The CSV's example_id column
is the canonical mapping.
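A minimal loading sketch that relies only on the row-alignment rule above. It assumes each .npz file stores a single array and reads whichever key comes first, because the array names inside the files are not documented here; starter/load_data.py remains the reference loader, and load_split is a hypothetical helper name.

```python
import csv

import numpy as np


def load_split(split, data_dir="data"):
    """Load one split's CSV rows and its three feature matrices, aligned by row.

    Assumption: each .npz holds a single array, so we take the first stored
    key instead of guessing its name.
    """
    with open(f"{data_dir}/{split}.csv", newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    feats = {}
    for name in ("image", "text_CLIP", "text_BERT"):
        with np.load(f"{data_dir}/{split}_{name}.npz") as npz:
            arr = npz[npz.files[0]]
        # Row i of the .npz must correspond to data row i of the CSV.
        if arr.shape[0] != len(rows):
            raise ValueError(
                f"{split}_{name}.npz has {arr.shape[0]} rows, CSV has {len(rows)}"
            )
        feats[name] = arr

    ids = [r["example_id"] for r in rows]
    return ids, rows, feats
```

Keeping ids next to the features makes it easy to write predictions back out under the correct example_id later.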
Test-time dropout details
When has_image=0, the row in data/test_image.npz is all zeros. When
has_text=0, the corresponding rows in BOTH text feature files (data/test_text_CLIP.npz
and data/test_text_BERT.npz) are all zeros and text_raw is the empty string. The mask
is provided so you can identify these examples; the zeroing is already applied.
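A short sketch for splitting test rows into the three mask subsets used throughout this handout (intact, image-dropped, text-dropped) and for confirming that the pre-applied zeroing matches the mask. mask_subsets and check_zeroing are hypothetical helpers; the arrays are assumed to be loaded row-aligned with data/test.csv.

```python
import numpy as np


def mask_subsets(has_image, has_text):
    """Label each row as 'intact', 'image-dropped' or 'text-dropped'."""
    has_image = np.asarray(has_image).astype(bool)
    has_text = np.asarray(has_text).astype(bool)
    # At most one modality is zeroed per test example.
    if np.any(~has_image & ~has_text):
        raise ValueError("found a row with both modalities dropped")
    return np.where(~has_image, "image-dropped",
                    np.where(~has_text, "text-dropped", "intact"))


def check_zeroing(image, text_clip, text_bert, has_image, has_text):
    """Return True if every masked row is all zeros in the relevant matrices."""
    has_image = np.asarray(has_image).astype(bool)
    has_text = np.asarray(has_text).astype(bool)
    ok_img = np.all(image[~has_image] == 0)
    ok_txt = np.all(text_clip[~has_text] == 0) and np.all(text_bert[~has_text] == 0)
    return bool(ok_img and ok_txt)
```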
4. Setup & first run
```bash
tar -xzf student_bundle.tar.gz
cd student_bundle/
pip install numpy torch tqdm
python -m starter.baseline_concat_mlp   # ~1 minute, writes submission.csv
python -m tests.sanity_checks --submission submission.csv
```
The baseline is the floor you must beat — it is the simplest concat-MLP and does not know about test-time modality dropout. The remaining four weeks are about beating it.
5. Compute — SJTU HPC (交我算)
A teaching HPC account on SJTU 交我算 has been requested for every student in the course. Using it is optional: the project is small enough to run on a laptop, but if you would like a GPU, this is your option.
Quick links
- Login portal: https://my.hpc.sjtu.edu.cn/
- Documentation: https://docs.hpc.sjtu.edu.cn/
Your account
The instructor has sent each student a personal username and password via Canvas 私信 (direct message). If you did not receive the message, email the instructor at t.huang@sjtu.edu.cn.
Teaching-account quotas & rules (from the HPC operations team)
To avoid wasting shared resources, each teaching account is capped at 1 concurrent
job, 1 CPU node, 1 GPU card, and a 12-hour wall-clock limit per job. Prefer the
π2.0 cluster. The login node is pilogin; do NOT run jobs on the login node, or your
account will be banned. Use the cpu queue for CPU work and the dgx2 queue (32 GB of
GPU memory per card) for GPU work. GPU resources are currently tight and jobs may
wait in the queue, so plan your job submissions accordingly.
6. Submission format & leaderboard
A CSV with exactly this header and 2,000 data rows (one per test example, any order):
```csv
example_id,label_0,label_1,...,label_17
ex_00123,0.81,0.04,...,0.03
...
```
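A minimal writer for this format, assuming your predictions are an (n, 18) numpy array row-aligned with a list of example_id strings. write_submission is a hypothetical helper; still run python -m tests.sanity_checks --submission on its output before uploading.

```python
import csv

import numpy as np

LABELS = [f"label_{k}" for k in range(18)]


def write_submission(path, example_ids, probs, n_expected=2000):
    """Write example_id plus 18 probability columns in the leaderboard format."""
    probs = np.asarray(probs, dtype=float)
    if len(example_ids) != n_expected:
        raise ValueError(f"expected {n_expected} rows, got {len(example_ids)}")
    if probs.shape != (len(example_ids), 18):
        raise ValueError(f"expected shape ({len(example_ids)}, 18), got {probs.shape}")
    if len(set(example_ids)) != len(example_ids):
        raise ValueError("duplicate example_id")

    probs = np.clip(probs, 0.0, 1.0)  # values must lie in [0, 1]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["example_id"] + LABELS)
        for ex_id, row in zip(example_ids, probs):
            writer.writerow([ex_id] + [f"{p:.4f}" for p in row])
```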
- Values may be probabilities in [0, 1] or binaries in {0, 1}; the leaderboard thresholds probabilities at 0.5 by default.
- F1-macro is the leaderboard metric. F1-micro is reported but does not affect rank.
- Each team is anonymized on the public leaderboard — only your group nickname is shown.
- 5 submissions per day per team, 50 total over the four weeks. Format-error submissions that fail to score don't count against your quota.
- Scoring is immediate. Rank by best F1-macro; ties broken by whoever reached the score first.
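To estimate the leaderboard metric locally (for example on val), the following numpy sketch binarizes probabilities at 0.5 and computes F1-macro and F1-micro. Two details are assumptions, not documented leaderboard behavior: the comparison at exactly 0.5 (>= here) and how a class with no true and no predicted positives is scored (0 here). Because F1-macro weights all 18 classes equally, rare labels move it as much as common ones.

```python
import numpy as np


def f1_scores(y_true, y_pred_prob, threshold=0.5):
    """Return (F1-macro, F1-micro) for multi-hot labels of shape (n, n_classes).

    Assumptions: probabilities are binarized with >= threshold, and a class
    with 2*TP + FP + FN == 0 contributes an F1 of 0 to the macro average.
    """
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred_prob) >= threshold
    tp = np.sum(y_true & y_pred, axis=0).astype(float)
    fp = np.sum(~y_true & y_pred, axis=0).astype(float)
    fn = np.sum(y_true & ~y_pred, axis=0).astype(float)

    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros_like(tp), where=denom > 0)
    macro = per_class.mean()  # unweighted mean over classes

    micro_denom = 2 * tp.sum() + fp.sum() + fn.sum()
    micro = 2 * tp.sum() / micro_denom if micro_denom > 0 else 0.0
    return float(macro), float(micro)
```

Since binary submissions are accepted, one option (not a requirement) is to tune your own per-class thresholds on val and submit 0/1 values instead of relying on the default 0.5.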
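7. Timeline
- May 29, 23:59: register your team on the leaderboard.
- Jun 12, 20:00: final submission uploaded to the leaderboard.
- Jun 14, 23:59: single Canvas zip with code, report.pdf, and slides.pdf.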
8. Hard constraints
- Total model parameters ≤ 100M at inference time, including any pretrained weights loaded into memory. This rules out fine-tuning a large pretrained backbone; it does not rule out using one via API (see below).
- No external datasets beyond what's provided in the bundle.
- No attempts to deanonymize the source domain. Detected attempts are an automatic fail. The features are pre-computed; you do not need to know the source.
- LLM API calls are allowed (paid or free), but must be disclosed in your report: which provider/model, where in your pipeline, approximate cost (CNY). Pure prompting alone will be marked low on Method Design.
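A back-of-the-envelope check of the 100M parameter budget. The layer sizes below describe a hypothetical concat-MLP over the three pooled 768-d features, not the starter baseline's actual architecture. For scale, BERT-base alone has roughly 110M parameters, so loading such a backbone at inference time would already exceed the limit.

```python
def mlp_param_count(layer_sizes):
    """Count weights + biases of a fully connected stack, e.g. [2304, 512, 18].

    For a PyTorch module the equivalent count is
    sum(p.numel() for p in model.parameters()); any pretrained encoder
    loaded at inference time counts toward the same budget.
    """
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


BUDGET = 100_000_000

# Hypothetical concat-MLP: 3 * 768 = 2304 inputs, one hidden layer of 512, 18 outputs.
n = mlp_param_count([3 * 768, 512, 18])
print(f"{n:,} parameters, {n / BUDGET:.2%} of the budget")
```

Running it prints 1,189,394 parameters, 1.19% of the budget, which shows how much headroom small models over the pre-computed features leave.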
9. What to put in the report
Max 8 pages, NeurIPS-style. Use docs/report_template.tex or
docs/report_template.docx. The grading rubric weighs each section explicitly
— the items below are required.
- Method Choice Rationale (one paragraph): why this method, what was rejected and why.
- Approach: pipeline figure; how you handle missing modalities; how you used the unlabeled pool (if at all); LLM/API usage disclosure.
- Empirical Analysis: per-modality ablation (image only / text only / both) and per-subset F1 on test stratified by the mask (intact / image-dropped / text-dropped).
- Resources Used: compute hours, hardware, API providers/models, approximate cost in CNY.
- AI usage: itemize every AI tool and exactly what each accomplished (see Academic Integrity below).
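A minimal sketch of the per-subset breakdown required under Empirical Analysis. score_fn is any metric with signature (y_true, y_prob) -> float, for example an F1-macro function; stratified_scores is a hypothetical helper. Test labels are only available offline through the self-evaluation kit, so during development you would typically run this on val after simulating dropout there.

```python
import numpy as np


def stratified_scores(y_true, y_prob, has_image, has_text, score_fn):
    """Apply score_fn to the full set and to each mask subset.

    Returns {subset_name: (n_rows, score)}; empty subsets are skipped.
    """
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    has_image = np.asarray(has_image).astype(bool)
    has_text = np.asarray(has_text).astype(bool)

    subsets = {
        "all": np.ones_like(has_image),
        "intact": has_image & has_text,
        "image-dropped": ~has_image,
        "text-dropped": ~has_text,
    }
    results = {}
    for name, idx in subsets.items():
        if idx.any():
            results[name] = (int(idx.sum()), score_fn(y_true[idx], y_prob[idx]))
    return results
```

Reporting the row count next to each score matters: the dropped subsets are much smaller than the intact one, so their F1-macro values are noisier.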
10. Grading rubric (100 points)
The 8-minute presentation + Q&A on the last class is graded separately as the 10% course-level presentation component — it is not part of these 100 project points.
11. Submission checklist
- Registered your team on the leaderboard (one account per team).
- sanity_checks.py passes for both the bundle and your final submission CSV.
- Final submission uploaded to the leaderboard before Jun 12, 20:00.
- Single zip uploaded to Canvas before Jun 14, 23:59, containing:
  - code/ with a runnable README and pinned random seeds
  - report.pdf: max 8 pages, all required sections including AI usage
  - slides.pdf: presentation slides
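The checklist asks for pinned random seeds. A common pattern is one helper called at the start of every script; this sketch seeds Python, NumPy, and (if installed) PyTorch. set_seed is a hypothetical name, and even with these settings some GPU operations are not guaranteed to be bit-for-bit reproducible.

```python
import os
import random

import numpy as np


def set_seed(seed: int) -> None:
    """Pin the RNGs a typical numpy/PyTorch pipeline touches."""
    random.seed(seed)
    np.random.seed(seed)
    # Only affects Python subprocesses started after this point.
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
    except ImportError:
        return  # numpy-only pipeline: nothing else to seed
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade some speed for repeatable cuDNN kernel selection on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

If you create np.random.Generator objects (for example for modality dropout), seed those explicitly as well, since they do not read the global np.random state.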
Academic integrity
- Discussing high-level ideas with other teams is allowed. Sharing code, submissions, model checkpoints, or labels is not.
- Your final submission must come from your own implementation.
- Any attempt to deanonymize the source domain or recover test labels (e.g. reverse-searching the raw text on the web for the purpose of identification) is a violation and an automatic fail for the project.
AI usage disclosure (required)
AI tools — LLM APIs in your pipeline, coding assistants (Cursor, Claude Code, GitHub Copilot, …), report-writing assistants, autonomous coding agents — are permitted. They are not a free pass on accountability.
In your report, include an "AI usage" subsection that itemizes every tool you used and exactly what each accomplished. Be specific. Examples of adequate disclosure:
- "Cursor wrote the data-loading boilerplate in load.py and the argparse plumbing in train.py; we wrote the model and training loop."
- "Claude API: one chat session to debug a numpy shape mismatch in our training loop. ~3 turns, ~CNY 0.20."
- "GPT-5 helped revise wording in the abstract and the discussion; we wrote the methodology and analysis sections."
Examples of inadequate disclosure: "I used ChatGPT", "some parts were AI-assisted". If a Q&A question reveals that you cannot explain a component you claimed as your own work, that is a violation.
The deliverable rule: the design decisions must be yours, and you must be able to defend every part of your submission in person.