# A-Level Physics CIE (9702) Answer Template Generator
Generate structured answer templates for Cambridge International A-Level Physics (9702) questions using a fine-tuned Qwen3-4B model with LoRA adapters trained on data drawn from 1652 real past papers.
## Skill contract (runtime vs optional tooling)
This section clarifies what runs during normal skill use versus what exists only for dataset rebuilds and retraining, so automated reviewers (e.g. OpenClaw) and humans can align expectations with the code.
| | Primary path (inference) | Optional (training / data pipeline) |
|---|---|---|
| Entrypoints | `skill/scripts/inference.py`; `generate_template` / `generate_template_verified` in that module | `scraper/*`, `scripts/build_sft.py`, `scripts/run_full_pipeline.py`, `scripts/train.sh`, etc. |
| Remote APIs | None for generation | DeepSeek API when `--teacher deepseek` or the full-pipeline teacher mode is used (`DEEPSEEK_API_KEY`) |
| Web / HTTP | Hugging Face (typically) to download the base model `Qwen/Qwen3-4B-MLX-4bit` on first run; no user question leaves your machine as an HTTP payload | cie.fraft.org when running the scraper; Hugging Face again for the training stack, as configured |
| Secrets | No `DEEPSEEK_API_KEY` required for inference | `DEEPSEEK_API_KEY` only if you regenerate the SFT data via DeepSeek |
Inference does not scrape past papers, does not call DeepSeek, and does not exfiltrate prompts to a third-party LLM API. Maintainer scripts may; they are separate.
Full detail: SECURITY.md in the repository root.
## Mandatory rule for the orchestrator (plain-text math)
When you produce any final answer, template, or paraphrase for the user—whether you ran skill/scripts/inference.py or answered from general knowledge—you must:
- Write formulae in plain text (e.g. v² = u² + 2as, E = hf, λ = h/p, P = IV).
- Never wrap math in $...$, $$...$$, \(...\), \[...\], or similar TeX delimiters. Raw $$ is unreadable for users in Clawhub/OpenClaw-style clients.
- If tool output still contains stray $ signs, strip or rewrite those segments into plain text before showing them to the user.
Local inference already applies the same rule via its system prompt and post-processing; the orchestrator must follow it even when not calling the script.
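
For concreteness, the stripping step might look like this minimal sketch (a hypothetical helper, not the post-processor that inference.py actually ships):

```python
import re

def strip_tex_delimiters(text: str) -> str:
    """Hypothetical sketch: unwrap inline/display dollar-sign math and
    backslash-paren / backslash-bracket delimiters, keeping the formula text."""
    # Strip display math first so $$ pairs are not half-consumed by the inline rule.
    text = re.sub(r"\$\$(.+?)\$\$", r"\1", text, flags=re.DOTALL)
    text = re.sub(r"\$(.+?)\$", r"\1", text, flags=re.DOTALL)
    text = re.sub(r"\\\((.+?)\\\)", r"\1", text, flags=re.DOTALL)
    text = re.sub(r"\\\[(.+?)\\\]", r"\1", text, flags=re.DOTALL)
    return text

print(strip_tex_delimiters("Use $v^2 = u^2 + 2as$ at the peak."))
# -> Use v^2 = u^2 + 2as at the peak.
```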
## Quick Start

Run inference on a physics question:

```bash
python skill/scripts/inference.py "Define specific heat capacity."
```
Or in Python:

```python
from skill.scripts.inference import generate_template

result = generate_template("Calculate the maximum height reached by a ball thrown upward at 20 m/s.")
print(result)
```
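
The same module also exposes generate_template_verified (listed in the skill contract table above). Assuming it mirrors generate_template's single-string signature, usage would look like the sketch below; check inference.py for the exact interface:

```python
from skill.scripts.inference import generate_template_verified

# Assumption: same single-argument signature as generate_template.
result = generate_template_verified("Define specific heat capacity.")
print(result)
```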
## Output Format
The model produces structured answer templates:
- Question type — calculation / definition / explain / describe / derive / analyse / practical
- Given — quantities and conditions from the question
- Required — what the student must find or state
- Formulae / principles — relevant equations and physics laws
- Answer frame — numbered step-by-step approach
- Check — unit/sign/direction/significant-figure verification
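
For illustration only (a hand-written example in the template format, not verbatim model output), the Quick Start kinematics question might produce:

```text
Question type: calculation
Given: u = 20 m/s upward; a = -9.81 m/s²; v = 0 at maximum height
Required: maximum height s
Formulae / principles: v² = u² + 2as
Answer frame:
  1. At the highest point the velocity is zero, so set v = 0.
  2. Rearrange: s = (v² - u²) / (2a).
  3. Substitute: s = (0 - 400) / (2 × -9.81) ≈ 20.4 m.
Check: units are m; s is positive (displacement upward); quote to 3 s.f.
```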
Display note (Clawhub / chat clients — applies to orchestrator and model): Present equations in plain text (ASCII and Unicode, e.g. v², λ, ×, fractions with /). Do not use LaTeX delimiters ($, $$, \(…\), \[…\]) in final user-facing output — many clients do not render math, so those tokens look garbled. The inference script enforces this with a system prompt and post-processing when you run it; if you answer without the script, you must still follow this rule.
## Model Details

- Base model: Qwen/Qwen3-4B-MLX-4bit
- Adapter: LoRA rank 8, 16 layers, trained for 1000 iterations
- Training data: 414 question–template pairs from 9702 Papers 2/4/5 (2001–2025); templates generated by DeepSeek with mark-scheme context
- Peak memory: 4 GB (runs on any Apple Silicon Mac with 8 GB+ of unified memory)
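
For context, loading the 4-bit base model with the adapter plausibly looks like the sketch below. It assumes the stock mlx_lm API and a hypothetical adapters/ directory; skill/scripts/inference.py is the authoritative loader:

```python
from mlx_lm import load, generate

# Sketch only: the adapter path is an assumption; see inference.py for
# the real path, system prompt, and post-processing.
model, tokenizer = load("Qwen/Qwen3-4B-MLX-4bit", adapter_path="adapters")

prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Define specific heat capacity."}],
    add_generation_prompt=True,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=512))
```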
## Retraining

To retrain or extend with more data:

```bash
python scripts/run_full_pipeline.py --teacher deepseek
```

See skill/references/training.md for the full pipeline details.
## Adversarial Robustness Evaluation

Test the model's robustness using three physics-adapted attack strategies from Xie et al. (2024):

```bash
python skill/scripts/adversarial_eval.py
python skill/scripts/adversarial_eval.py --strategies numeric --variants 5 --max-questions 10
```

Reports OA (Original Accuracy), AA (Adversarial Accuracy), and ASR (Attack Success Rate) per strategy.
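
Assuming the conventional definitions (OA and AA are plain accuracies before and after perturbation; ASR is the fraction of originally-correct answers the attack flips), the relationship is captured by this hypothetical helper; the script's own report is authoritative:

```python
def attack_metrics(orig_correct: list[bool], adv_correct: list[bool]) -> dict:
    """Hypothetical helper: OA/AA are accuracy before/after attack;
    ASR is the share of originally-correct items flipped to wrong."""
    n = len(orig_correct)
    oa = sum(orig_correct) / n
    aa = sum(adv_correct) / n
    flipped = sum(o and not a for o, a in zip(orig_correct, adv_correct))
    asr = flipped / max(1, sum(orig_correct))
    return {"OA": oa, "AA": aa, "ASR": asr}

print(attack_metrics([True, True, False, True], [True, False, False, False]))
# OA = 0.75, AA = 0.25, ASR = 2/3
```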
## References

- skill/references/training.md — Full scraping, extraction, SFT, and training pipeline
- skill/references/answer_template_format.md — Detailed output format specification
- skill/scripts/inference.py — Standalone inference script
- skill/scripts/adversarial_eval.py — Adversarial robustness evaluation (numeric perturbation, context swap, question-type adversarial)
- SECURITY.md — Network, secrets, and trust boundaries