## Install

```bash
openclaw skills install code-refactor-for-reproducibility
```

Use when refactoring research code for publication, adding documentation to existing analysis scripts, creating reproducible computational workflows, or prep...

Follow this sequence when refactoring a research codebase:
## Step 1: Audit the Codebase

Read each source file and check for the following problems. Document findings before making any changes.

Checklist: missing docstrings · hardcoded absolute paths · missing random seeds · bare `except:` clauses · unpinned imports · unexplained magic numbers
Example — detecting issues manually (note: `ast.Constant` stores its value in `.value`; the older `.s` attribute is deprecated and removed in Python 3.12):

```python
import ast, pathlib

def find_hardcoded_paths(source: str) -> list[str]:
    """Return string literals that look like absolute paths."""
    tree = ast.parse(source)
    return [
        node.value
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.value.startswith("/")
    ]

source = pathlib.Path("analysis.py").read_text()
print(find_hardcoded_paths(source))
```
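The same AST walk can catch another checklist item, bare `except:` clauses, since a bare handler is an `ast.ExceptHandler` whose `type` is `None` (`find_bare_excepts` is an invented helper name, sketched in the same style as the path detector above):

```python
import ast

def find_bare_excepts(source: str) -> list[int]:
    """Return the line numbers of bare ``except:`` clauses."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]

sample = "try:\n    pass\nexcept:\n    pass\n"
print(find_bare_excepts(sample))  # → [3]
```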
## Step 2: Refactor In Place

Apply improvements in place. Always back up originals first.
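A minimal sketch of the backup step, assuming sources live in a flat directory (`backup_sources` is a hypothetical helper; adapt the glob pattern to your layout):

```python
import shutil
from pathlib import Path

def backup_sources(src_dir: Path, backup_dir: Path) -> list[Path]:
    """Copy every .py file in src_dir into backup_dir before editing."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for py_file in sorted(src_dir.glob("*.py")):
        dest = backup_dir / py_file.name
        shutil.copy2(py_file, dest)  # copy2 preserves timestamps and metadata
        copied.append(dest)
    return copied
```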
```python
# Before
def load_data(path):
    import pandas as pd
    return pd.read_csv(path)

# After
def load_data(path: str) -> "pd.DataFrame":
    """Load a CSV dataset from disk.

    Parameters
    ----------
    path : str
        Path to the CSV file (relative to project root).

    Returns
    -------
    pd.DataFrame
        Raw dataset with original column names preserved.
    """
    import pandas as pd
    return pd.read_csv(path)
```
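The audit checklist also flags bare `except:` clauses; the same before/after treatment applies to them (`parse_value` is an invented example, not from the source codebase):

```python
from typing import Optional

# Before: a bare except silently swallows every error,
# including KeyboardInterrupt and SystemExit
def parse_value(raw):
    try:
        return float(raw)
    except:
        return None

# After: catch only the exceptions the operation can actually raise
def parse_value(raw: str) -> Optional[float]:
    """Parse a numeric string, returning None for malformed input."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None
```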
Replace hardcoded paths with command-line arguments:

```python
from pathlib import Path
import argparse
import pandas as pd

def parse_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=Path, default=Path("data/raw.csv"))
    parser.add_argument("--output", type=Path, default=Path("results/"))
    return parser.parse_args()

args = parse_args()
df = pd.read_csv(args.data)
args.output.mkdir(parents=True, exist_ok=True)
```
Set random seeds for every RNG the project touches:

```python
import random
import numpy as np

SEED = 42  # document this constant at module level
random.seed(SEED)
np.random.seed(SEED)

# scikit-learn
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=SEED)

# PyTorch
import torch
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
```
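These per-library calls can be collected into one helper so every entry point seeds consistently (`set_seed` is a hypothetical name; the torch branch is guarded so the helper also works where PyTorch is not installed):

```python
import random
import numpy as np

def set_seed(seed: int = 42) -> None:
    """Seed every RNG the project uses; call once at program start."""
    random.seed(seed)
    np.random.seed(seed)
    try:  # torch is optional; skip silently when it is not installed
        import torch
        torch.manual_seed(seed)
        torch.backends.cudnn.deterministic = True
    except ImportError:
        pass

set_seed(42)
first = random.random()
set_seed(42)
assert random.random() == first  # identical draws confirm determinism
```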
Add logging and input validation:

```python
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)

def load_data(path: Path) -> "pd.DataFrame":
    """Load dataset with validation."""
    import pandas as pd
    if not path.exists():
        raise FileNotFoundError(f"Data file not found: {path}")
    logger.info("Loading data from %s", path)
    df = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Loaded dataframe is empty: {path}")
    logger.info("Loaded %d rows, %d columns", *df.shape)
    return df
```
## Step 3: Pin the Environment

See references/environment-setup.md for full Dockerfile and Conda environment templates.

Generate a requirements file from the imports actually used (note: pipreqs writes to a file via `--savepath`, not `--output`):

```bash
pip install pipreqs
pipreqs src/ --savepath requirements.txt --force
```

Verify resolution:

```bash
python -m venv .venv_test && source .venv_test/bin/activate
pip install -r requirements.txt
python -c "import pandas, numpy, sklearn"
deactivate && rm -rf .venv_test
```
Alternatively, use a Conda environment file (`environment.yml`):

```yaml
name: my-research-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - numpy=1.24.3
  - pandas=2.0.1
  - scikit-learn=1.2.2
  - matplotlib=3.7.1
  - pip:
    - some-pip-only-package==0.5.0
```

```bash
conda env create -f environment.yml
conda activate my-research-env
```
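Alongside the pinned environment file, it can help to archive a snapshot of the exact interpreter and package versions used for the final results (`ENVIRONMENT.txt` is a hypothetical filename, not part of the skill's layout):

```shell
# Record the interpreter version, then append every installed package
python --version > ENVIRONMENT.txt
pip freeze >> ENVIRONMENT.txt
```

Commit the snapshot next to the paper's results so reviewers can diff it against their own environment.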
## Step 4: Document the Project

Generate a README.md containing at minimum:

## Requirements
<!-- List Python version and key packages with versions -->

## Installation

```bash
conda env create -f environment.yml
conda activate my-research-env
```

## Usage

```bash
python main.py --data data/raw.csv --output results/
```

## Configuration
<!-- Describe tunable parameters (see config.py) -->
---
## Step 5: Validate Reproducibility
After all changes, verify that behaviour is unchanged:
```bash
# 1. Run the full pipeline and capture output checksums
python main.py --data data/raw.csv --output results/
md5sum results/*.csv > checksums_refactored.md5
diff checksums_original.md5 checksums_refactored.md5

# 2. Run unit tests
pytest tests/ -v --tb=short

# 3. Confirm determinism across two clean runs
python main.py --output results_run1/
python main.py --output results_run2/
diff -r results_run1/ results_run2/
```
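The determinism check in step 3 can also be expressed in Python, which works on platforms without `md5sum` or `diff` (`dir_checksums` and `runs_match` are invented helper names):

```python
import hashlib
from pathlib import Path

def dir_checksums(root: Path) -> dict:
    """Map each file's relative path to its MD5 digest."""
    return {
        str(p.relative_to(root)): hashlib.md5(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def runs_match(run1: Path, run2: Path) -> bool:
    """True when both output directories contain byte-identical files."""
    return dir_checksums(run1) == dir_checksums(run2)
```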
Reproducibility verification checklist:

| Practice |
|---|
| requirements.txt / environment.yml installs cleanly in a fresh environment |
| Relative paths only |
| Pin dependency versions |
| Set random seeds |
| Docstrings on all public functions |
| Validate outputs against a baseline |
| Automate environment setup |
References:

- references/guide.md — Comprehensive user guide
- references/environment-setup.md — Dockerfile and full environment templates
- references/examples/ — Working code examples
- references/api-docs/ — Complete API documentation

Skill ID: 455 | Version: 1.0 | License: MIT