This skill should be used when the user asks to "reproduce a paper", "implement paper methods in Python", "extract paper content to Markdown", or works on paper reproduction tasks.
Follow this workflow end-to-end unless the user explicitly asks to skip steps.
1) Intake and scope
Confirm input artifacts: TeX source path(s), PDF path, supplementary files, target repository, and expected outputs.
State assumptions explicitly when information is missing.
Keep approach adaptable to the specific paper; do not force a fixed dependency stack or rigid project template.
Check whether the working folder already contains paper source files (.tex, .bib, style files, figures).
Check whether the working folder contains user-preprocessed documents (.md, .json, images such as .png, .jpg, .svg).
2) Extract paper content to Markdown
Source priority rule (read in order, stop when sufficient; a selection sketch follows this list):
TeX sources (preferred): If usable TeX source files (.tex, .bib, style files) are present, use them as the primary source.
User-preprocessed documents (secondary): If TeX is absent or incomplete, read user-provided documents (.md, .json) and images (.png, .jpg, .svg) that may contain pre-extracted paper content.
PDF fallback (last resort): Only when both TeX and user-preprocessed documents are unavailable or insufficient, fall back to PDF extraction.
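A minimal sketch of this selection logic; the function name and extension sets are illustrative, and "sufficiency" of TeX sources still requires judgment beyond mere file presence:

```python
from pathlib import Path

def pick_primary_source(folder: str) -> tuple[str, list[Path]]:
    """Pick extraction sources per the priority rule:
    TeX first, then user-preprocessed documents, then PDF."""
    root = Path(folder)
    tex = sorted(root.rglob("*.tex"))
    if tex:  # preferred: TeX sources (plus bibliography files)
        return "tex", tex + sorted(root.rglob("*.bib"))
    docs = sorted(p for ext in ("*.md", "*.json") for p in root.rglob(ext))
    if docs:  # secondary: user-preprocessed documents
        return "preprocessed", docs
    pdfs = sorted(root.rglob("*.pdf"))
    if pdfs:  # last resort: PDF extraction
        return "pdf", pdfs
    raise FileNotFoundError(f"no usable paper sources under {root}")
```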
Include image references/placeholders when figures cannot be represented as plain text.
Produce one primary output file such as paper_fulltext.md.
3) Extraction quality checks
Validate completeness before moving to reproduction (one such check is sketched after this list):
Section and heading coverage matches the source actually used (TeX project, preprocessed documents, or PDF).
Key equations and algorithm blocks are present.
References and appendices are included if present in the source.
Report known extraction limitations and exact affected files/pages/segments.
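The headings check can be partially automated. A rough sketch, assuming TeX sources under a hypothetical paper_src/ folder and an extracted paper_fulltext.md; LaTeX markup inside titles will cause false positives, so treat mismatches as prompts to inspect, not as hard failures:

```python
import re
from pathlib import Path

def tex_section_titles(tex_dir: str) -> set[str]:
    # Collect \section / \subsection titles from every .tex file.
    pattern = re.compile(r"\\(?:sub)*section\*?\{([^}]*)\}")
    titles: set[str] = set()
    for path in Path(tex_dir).rglob("*.tex"):
        titles.update(pattern.findall(path.read_text(errors="ignore")))
    return titles

def md_heading_titles(md_path: str) -> set[str]:
    # Collect '#'-style headings from the extracted Markdown.
    lines = Path(md_path).read_text(errors="ignore").splitlines()
    return {line.lstrip("#").strip() for line in lines if line.startswith("#")}

missing = tex_section_titles("paper_src") - md_heading_titles("paper_fulltext.md")
if missing:
    print("Headings possibly missing from extraction:", sorted(missing))
```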
4) Reproduction planning (paper-specific)
Build a reproduction plan from the extracted source materials (TeX-derived notes and/or Markdown), not from memory.
Identify:
Problem definition, notation, assumptions, and objective functions.
Algorithm steps and required components.
Dataset generation/loading, training/optimization, and evaluation protocol.
Baselines and ablations required for faithful reproduction.
If details are missing or ambiguous, call out the gap and provide a conservative implementation choice with rationale.
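To keep the plan auditable, one option is a small structured record; every field name below is illustrative rather than mandated by any paper:

```python
from dataclasses import dataclass, field

@dataclass
class ReproductionPlan:
    """Reproduction plan distilled from the extracted source materials."""
    problem: str                      # problem definition and objective
    notation: dict[str, str] = field(default_factory=dict)
    algorithm_steps: list[str] = field(default_factory=list)
    datasets: list[str] = field(default_factory=list)
    evaluation_protocol: str = ""
    baselines: list[str] = field(default_factory=list)
    ablations: list[str] = field(default_factory=list)
    open_gaps: list[str] = field(default_factory=list)  # ambiguities plus the conservative choice made
```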
5) Python implementation principles
Implement with modular design and clear boundaries (a wiring sketch follows these guidelines):
Separate concerns (data, models/algorithms, training/solver loop, evaluation, utils, config).
Prefer low coupling and high cohesion.
Avoid monolithic scripts:
Split code into modules whenever responsibilities can be separated.
Prefer one clear responsibility per file.
File size guideline:
Keep a single source file under ~200 lines whenever practical.
If a file grows beyond ~200 lines, refactor into submodules unless there is a clear reason not to.
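As an illustration of these boundaries, a hypothetical entry point that only wires focused modules together; none of the modules below exist in advance, and their names are placeholders for whatever the paper actually needs:

```python
# main.py -- thin entry point; all real work lives in focused modules.
from config import load_config        # explicit hyperparameters
from data import load_dataset         # data generation/loading only
from model import build_model         # model/algorithm definition only
from train import run_training        # training/solver loop only
from evaluate import compute_metrics  # evaluation protocol only

def main() -> None:
    cfg = load_config("configs/default.json")
    train_set, test_set = load_dataset(cfg)
    model = run_training(build_model(cfg), train_set, cfg)
    print(compute_metrics(model, test_set, cfg))

if __name__ == "__main__":
    main()
```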
Keep dependencies minimal and paper-driven; choose tools based on the paper's actual needs.
Avoid over-engineering early; start from the minimum reproducible core, then extend.
Add tests/checks for critical math or pipeline steps where feasible.
Preserve reproducibility (a seeding sketch follows this list):
Deterministic seeds when applicable.
Explicit config for key hyperparameters.
Clear experiment entry points.
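A common seeding sketch, assuming NumPy is in use; the torch branch is guarded so the helper still works when PyTorch is not installed:

```python
import os
import random

import numpy as np

def set_seed(seed: int) -> None:
    """Seed the usual sources of randomness for repeatable runs."""
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # the paper may not involve PyTorch at all
```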
6) Logging and data persistence
All reproduction experiments must save execution logs and output data (a minimal setup is sketched after this list):
Logs: Save console output, training progress, and error messages to timestamped log files (e.g., logs/experiment_YYYYMMDD_HHMMSS.log).
Data: Save all generated data (metrics, model checkpoints, intermediate results) to structured files (e.g., JSON, CSV, or pickle) in a dedicated outputs/ or results/ directory.
Configuration snapshot: Save the exact configuration/hyperparameters used for each run alongside the outputs.
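A minimal setup that follows these conventions using only the standard library; the directory names and run naming scheme are one possible convention, not a requirement:

```python
import json
import logging
from datetime import datetime
from pathlib import Path

def start_run(config: dict, log_dir: str = "logs", out_dir: str = "outputs") -> Path:
    """Create a timestamped log file plus a run directory holding a config snapshot."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    Path(log_dir).mkdir(exist_ok=True)
    run_dir = Path(out_dir) / f"run_{stamp}"
    run_dir.mkdir(parents=True, exist_ok=True)
    logging.basicConfig(
        filename=Path(log_dir) / f"experiment_{stamp}.log",
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    # Snapshot the exact configuration next to the outputs for this run.
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    return run_dir
```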
Logging should capture enough detail to:
Trace the execution flow for debugging.
Compare results across multiple runs.
Identify the exact conditions under which results were produced.
7) Result verification and comparison
After running reproduction experiments, compare results against the paper's reported data (a comparison sketch follows this list):
Extract quantitative metrics (tables, figures) from the paper source.
Compute the same metrics from reproduction outputs.
Document both paper-reported values and reproduction values side by side.
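A sketch of the side-by-side comparison; the metric names, example values, and 5% tolerance below are illustrative, not taken from any paper:

```python
def compare_metrics(paper: dict[str, float], repro: dict[str, float],
                    rel_tol: float = 0.05) -> None:
    """Print paper-reported vs. reproduced values with relative deviation."""
    print(f"{'metric':<16}{'paper':>10}{'repro':>10}{'rel.dev':>10}")
    for name, ref in paper.items():
        got = repro.get(name, float("nan"))
        dev = abs(got - ref) / abs(ref) if ref else float("inf")
        flag = "  <-- investigate" if dev > rel_tol else ""
        print(f"{name:<16}{ref:>10.4f}{got:>10.4f}{dev:>10.2%}{flag}")

compare_metrics({"accuracy": 0.912, "f1": 0.874},
                {"accuracy": 0.905, "f1": 0.812})
```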
Identify discrepancies:
If reproduction results deviate significantly from paper results, investigate potential causes: