Install
openclaw skills install code-refactor-for-reproducibility-1
Use when refactoring research code for publication, adding documentation to existing analysis scripts, creating reproducible computational workflows, or preparing code for sharing with collaborators. This skill transforms research code into publication-ready, reproducible workflows: it adds documentation, implements error handling, creates environment specifications, and ensures computational reproducibility for scientific publications.
scripts/main.py.
- Python: 3.10+. Repository baseline for current packaged skills.
- numpy: unspecified. Declared in requirements.txt.
- pandas: unspecified. Declared in requirements.txt.
- pytest: unspecified. Declared in requirements.txt.
- scipy: unspecified. Declared in requirements.txt.
- src: unspecified. Declared in requirements.txt.
cd "20260318/scientific-skills/Data Analytics/code-refactor-for-reproducibility"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
- CONFIG block or documented parameters if the script uses fixed settings.
- python scripts/main.py with the validated inputs.
See the Workflow section for related details.
scripts/main.py.
Use this command to verify that the packaged script entry point can be parsed before deeper execution.
python -m py_compile scripts/main.py
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
python scripts/main.py --help
Follow this sequence when refactoring a research codebase:
Read each source file and check for the following problems. Document findings before making any changes.
Checklist: missing docstrings · hardcoded absolute paths · missing random seeds · bare except: clauses · unpinned dependencies · unexplained magic numbers
Example — detecting issues manually:
import ast
import pathlib

def find_hardcoded_paths(source: str) -> list[str]:
    """Return string literals that look like absolute POSIX paths."""
    tree = ast.parse(source)
    return [
        node.value  # ast.Constant stores the literal in .value (.s is deprecated)
        for node in ast.walk(tree)
        if isinstance(node, ast.Constant)
        and isinstance(node.value, str)
        and node.value.startswith("/")
    ]

source = pathlib.Path("analysis.py").read_text()
print(find_hardcoded_paths(source))
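Two other checklist items, bare `except:` clauses and missing docstrings, can be flagged with the same `ast` approach. The following is a minimal sketch, not part of the packaged skill: the helper names `find_bare_excepts` and `find_missing_docstrings` and the `SAMPLE` snippet are hypothetical and exist only for illustration.

```python
import ast


def find_bare_excepts(source: str) -> list[int]:
    """Return line numbers of bare ``except:`` clauses."""
    tree = ast.parse(source)
    return [
        node.lineno
        for node in ast.walk(tree)
        # A bare ``except:`` has no exception type attached.
        if isinstance(node, ast.ExceptHandler) and node.type is None
    ]


def find_missing_docstrings(source: str) -> list[str]:
    """Return names of public functions and classes that lack a docstring."""
    tree = ast.parse(source)
    definitions = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
    return [
        node.name
        for node in ast.walk(tree)
        if isinstance(node, definitions)
        and not node.name.startswith("_")  # skip private helpers
        and ast.get_docstring(node) is None
    ]


# Hypothetical snippet standing in for a real analysis script.
SAMPLE = '''
def load(path):
    try:
        return open(path).read()
    except:
        return None

def summarize(text):
    """Return the number of lines in text."""
    return len(text.splitlines())
'''

print(find_bare_excepts(SAMPLE))
print(find_missing_docstrings(SAMPLE))
```

Both helpers only report findings; they do not modify the source, which matches the instruction to document problems before changing anything.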
Apply improvements in place. Always back up originals first.
# Before
def load_data(path):
    import pandas as pd
    return pd.read_csv(path)

# After
def load_data(path: str) -> "pd.DataFrame":
    """Load a CSV dataset from disk.

    Parameters
    ----------
    path : str
        Path to the CSV file (relative to project root).

    Returns
    -------
    pd.DataFrame
        Raw dataset with original column names preserved.
    """
    import pandas as pd
    return pd.read_csv(path)
from pathlib import Path
import argparse
import pandas as pd

def parse_args():
    # Paths come from the command line, with relative defaults instead of hardcoded absolute paths.
    parser = argparse.ArgumentParser()
    parser.add_argument("--data", type=Path, default=Path("data/raw.csv"))
    parser.add_argument("--output", type=Path, default=Path("results/"))
    return parser.parse_args()

args = parse_args()
df = pd.read_csv(args.data)
args.output.mkdir(parents=True, exist_ok=True)
import random
import numpy as np
SEED = 42 # document this constant at module level
random.seed(SEED)
np.random.seed(SEED)
# scikit-learn
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=SEED)
# PyTorch
import torch
torch.manual_seed(SEED)
torch.backends.cudnn.deterministic = True
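Global seeding as shown above affects every caller of the shared generators. As a complementary sketch, a seed can instead be passed to local generator objects, which is the style NumPy recommends through `np.random.default_rng`. The helper `seeded_draws` is hypothetical and only demonstrates that two runs with the same seed produce identical draws.

```python
import random

import numpy as np

SEED = 42  # same illustrative constant as in the example above


def seeded_draws(seed: int, n: int = 5) -> tuple[list[float], np.ndarray]:
    """Draw n values from a stdlib generator and a NumPy Generator, both seeded."""
    py_rng = random.Random(seed)          # independent stdlib generator
    np_rng = np.random.default_rng(seed)  # independent NumPy Generator
    return [py_rng.random() for _ in range(n)], np_rng.normal(size=n)


first_py, first_np = seeded_draws(SEED)
second_py, second_np = seeded_draws(SEED)

# Identical seeds must give identical draws. A failure here would suggest
# that hidden global state is leaking into the computation.
assert first_py == second_py
assert np.array_equal(first_np, second_np)
```

Passing generators explicitly keeps the source of randomness visible in function signatures, which can make it easier to audit which steps of a pipeline are stochastic.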
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO, format="%(asctime)s [%(levelname)s] %(message)s")
logger = logging.getLogger(__name__)

def load_data(path: Path) -> "pd.DataFrame":
    """Load dataset with validation."""
    import pandas as pd
    if not path.exists():
        raise FileNotFoundError(f"Data file not found: {path}")
    logger.info("Loading data from %s", path)
    df = pd.read_csv(path)
    if df.empty:
        raise ValueError(f"Loaded dataframe is empty: {path}")
    logger.info("Loaded %d rows, %d columns", *df.shape)
    return df
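The checklist also flags bare `except:` clauses. One way to replace them is to catch only the exceptions a step is expected to raise and re-raise with context. The sketch below assumes a JSON settings file; `load_config` and the file format are hypothetical and not part of the packaged skill.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def load_config(path: Path) -> dict:
    """Load a JSON config file, failing loudly with a specific error.

    Hypothetical helper for illustration only.
    """
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        # Log and re-raise instead of silently returning a default.
        logger.error("Config file not found: %s", path)
        raise
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        # Re-raise with the file name so the failure is traceable.
        raise ValueError(f"Invalid JSON in {path}: {exc}") from exc
    if not isinstance(config, dict):
        raise ValueError(f"Expected a JSON object in {path}")
    return config
```

Catching named exceptions keeps unrelated failures, such as a `KeyboardInterrupt` or a typo that raises `NameError`, from being swallowed.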
See references/environment-setup.md for full Dockerfile and Conda environment templates.
pip install pipreqs
pipreqs src/ --savepath requirements.txt --force
Verify resolution:
python -m venv .venv_test && source .venv_test/bin/activate
pip install -r requirements.txt
python -c "import pandas, numpy, sklearn"
deactivate && rm -rf .venv_test
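Once the environment resolves, the installed versions can be recorded so that `requirements.txt` pins them exactly. The following sketch uses the standard-library `importlib.metadata`; the helper name `pinned_requirements` is hypothetical, and the package list simply reuses names from the dependency list earlier in this document.

```python
from importlib import metadata


def pinned_requirements(packages: list[str]) -> list[str]:
    """Return 'name==version' lines for installed packages.

    Packages that are not installed are reported as comments rather than
    guessed at.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            lines.append(f"# {name}: not installed in this environment")
    return lines


# Package names taken from the dependency list; adjust as needed.
for line in pinned_requirements(["numpy", "pandas", "scipy", "pytest"]):
    print(line)
```

The printed lines can be reviewed and then copied into `requirements.txt`; running `pip freeze` is an alternative when every installed package should be pinned.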
name: my-research-env
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.9
  - numpy=1.24.3
  - pandas=2.0.1
  - scikit-learn=1.2.2
  - matplotlib=3.7.1
  - pip:
      - some-pip-only-package==0.5.0
conda env create -f environment.yml
conda activate my-research-env
Generate a README.md containing at minimum:
## Requirements
<!-- List Python version and key packages with versions -->
## Installation
```text
conda env create -f environment.yml
conda activate my-research-env
python main.py --data data/raw.csv --output results/
```
config.py)
---
## Step 5: Validate Reproducibility
After all changes, verify that behaviour is unchanged:
```text
# 1. Run the full pipeline and capture output checksums
#    (checksums_original.md5 must have been generated from the original code before refactoring)
python main.py --data data/raw.csv --output results/
md5sum results/*.csv > checksums_refactored.md5
diff checksums_original.md5 checksums_refactored.md5
# 2. Run unit tests
pytest tests/ -v --tb=short
# 3. Confirm determinism across two clean runs
python main.py --output results_run1/
python main.py --output results_run2/
diff -r results_run1/ results_run2/
```
Reproducibility verification checklist:
- requirements.txt / environment.yml installs cleanly in a fresh environment

| Practice |
|---|
| Relative paths only |
| Pin dependency versions |
| Set random seeds |
| Docstrings on all public functions |
| Validate outputs against a baseline |
| Automate environment setup |
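
The practice of validating outputs against a baseline, done in Step 5 with `md5sum` and `diff -r`, can also be sketched in Python for platforms where those tools are unavailable. This is an illustrative sketch: the helper names `checksum_tree` and `differing_files` are hypothetical, and SHA-256 is used here instead of MD5 as a substitution.

```python
import hashlib
from pathlib import Path


def checksum_tree(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(run_a: Path, run_b: Path) -> list[str]:
    """Return relative paths that are missing from one run or differ in content."""
    a, b = checksum_tree(run_a), checksum_tree(run_b)
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))
```

Calling `differing_files(Path("results_run1"), Path("results_run2"))` returns an empty list when the two runs produced byte-identical files. Outputs that embed timestamps or unordered data may differ even when the analysis itself is deterministic.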

- references/guide.md: Comprehensive user guide
- references/environment-setup.md: Dockerfile and full environment templates
- references/examples/: Working code examples
- references/api-docs/: Complete API documentation

Skill ID: 455 | Version: 1.0 | License: MIT
Every final response should make these items explicit when they are relevant:
- If scripts/main.py fails, report the failure point, summarize what can still be completed safely, and provide a manual fallback.

This skill accepts requests that match the documented purpose of code-refactor-for-reproducibility and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
code-refactor-for-reproducibility only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.