mrmrmr
LLM-powered automated Mendelian Randomization for causal discovery in biomedical research
Like a lobster shell, security has layers — review code before you run it.
License
Runtime requirements
SKILL.md
MRAgent - Automated Mendelian Randomization Analysis Agent
Description
MRAgent is an intelligent agent that automates the entire process of Mendelian randomization analysis for causal discovery in biomedical research. It:
-
Knowledge Discovery Mode (KD): Given a disease (outcome), automatically scans PubMed literature to discover potential modifiable exposure (risk factor) that have correlation but no established causal inference, then performs comprehensive Mendelian randomization analysis using OpenGWAS data to identify novel causal relationships.
-
Causal Validation Mode (CV): Directly test whether a user-specified exposure (e.g., "body mass index") has a causal effect on a specific outcome (e.g., "type 2 diabetes") using two-sample Mendelian randomization.
MRAgent handles all steps automatically:
- PubMed literature crawling
- LLM-based extraction of candidate exposure-outcome pairs
- Check for existing MR studies via PubMed search
- Optional STROBE-MR quality assessment of existing studies
- UMLS medical synonym expansion
- OpenGWAS database query for GWAS summary statistics
- LLM selection of most appropriate GWAS datasets
- Multiple MR methods: Inverse variance weighted, MR-Egger, weighted median, etc.
- Heterogeneity analysis and pleiotropy sensitivity testing
- MRlap correction for sample overlap
- Automatic generation of publication-ready PDF reports with LLM-written interpretation
Requirements
- Python packages (install via
pip install -r {baseDir}/requirements.txt) - R language (>= 4.3.4) with the following packages installed:
TwoSampleMR- core Mendelian randomizationieugwasr- OpenGWAS interfacevcfR- required for MRlap (optional)MRlap- sample overlap correction (optional)jsonlite- JSON processing (required for MRlap)
OPENAI_API_KEYenvironment variable must be set (OpenAI API key)OPENGWAS_JWTenvironment variable (optional, for OpenGWAS access token)
Usage
Knowledge Discovery Mode (recommended for novel discoveries)
Discover novel causal relationships starting from a disease:
python {baseDir}/run_mragent.py --mode KD --outcome "back pain" --num-pubmed 100 --bidirectional
Discover novel causal relationships starting from an exposure:
(Currently the CLI implements KD starting from outcome; for exposure-based discovery use the Python API directly)
Causal Validation Mode
Validate a specific hypothesis: does exposure causally affect outcome?
python {baseDir}/run_mragent.py --mode CV --exposure "body mass index" --outcome "type 2 diabetes"
Common Options
| Option | Description | Example |
|---|---|---|
--mode KD|CV | Required. KD=knowledge discovery, CV=causal validation | --mode CV |
--outcome NAME | Required. Outcome (disease) to study | --outcome "back pain" |
--exposure NAME | Required in CV mode. Exposure factor to test | --exposure "osteoporosis" |
--num-pubmed N | Number of papers to fetch from PubMed (default: 100) | --num-pubmed 50 |
--model MR|MR_MOE | MR model type. MR=standard, MR_MOE=mixture of experts | --model MR_MOE |
--bidirectional | Perform bidirectional analysis (also test outcome → exposure) | --bidirectional |
--no-synonyms | Disable synonym expansion (faster) | --no-synonyms |
--strobe-mr | Enable STROBE-MR quality assessment of existing studies | --strobe-mr |
--mrlap | Enable MRlap sample overlap correction | --mrlap |
--output-dir DIR | Output directory (default: ./output) | --output-dir /tmp/mragent-out |
--steps 1,2,3 | Only run specific steps (for debugging/intervention) | --steps 1,2 |
Environment Variables
| Variable | Required | Description |
|---|---|---|
OPENAI_API_KEY | Yes | OpenAI API key |
OPENGWAS_JWT | Optional | OpenGWAS JWT access token |
LLM_MODEL | No | LLM model name (default: gpt-4o) |
LLM_PROVIDER | No | openai or ollama (default: openai) |
OPENAI_BASE_URL | No | Custom base URL for OpenAI-compatible API |
MRAGENT_SOURCE_PATH | No | Path to original MRAgent source if not installed globally |
Output
MRAgent outputs a JSON summary to stdout with:
success: boolean indicating successoutput_directory: directory containing all resultsdiscovered_pairs: number of candidate exposure-outcome pairs foundselected_for_mr: number of pairs selected for MR analysisreports: list of paths to generated PDF files (Report.pdf, Introduction.pdf, Conclusion.pdf, etc.)
Example output:
{
"success": true,
"mode": "CV",
"outcome": "back pain",
"exposure": "osteoarthritis",
"output_directory": "./output/osteoarthritis_back_pain_gpt-4o",
"discovered_pairs": 1,
"selected_for_mr": 1,
"reports": [
"./output/osteoarthritis_back_pain_gpt-4o/osteoarthritis_back_pain/Introduction.pdf",
"./output/osteoarthritis_back_pain_gpt-4o/osteoarthritis_back_pain/osteoarthritis_back_pain/MR_ieu-a-2_ieu-a-1008/Report.pdf",
"./output/osteoarthritis_back_pain_gpt-4o/osteoarthritis_back_pain/Conclusion.pdf"
]
}
Workflow Steps
When running all steps (default):
| Step | Description |
|---|---|
| 1 | Crawl PubMed, extract candidate exposure-outcome pairs |
| 2 | Check if each pair already has MR studies in literature |
| 3 | Extract unique terms, expand with medical synonyms |
| 4 | Check which terms have available GWAS data in OpenGWAS |
| 5 | LLM selects most appropriate GWAS IDs for each term |
| 6 | Generate all combinations of exposure-outcome pairs |
| 7 | Check new combinations for existing MR studies |
| 8 | Select final set of novel pairs for analysis |
| 9 | Run MR analysis, generate plots and LLM-interpreted PDF reports |
Notes
- This is a computationally intensive process. The full analysis can take tens of minutes to hours depending on the number of pairs.
- All intermediate results are saved as CSV files in the output directory, allowing manual editing and intervention between steps.
- MRAgent requires R to be installed with the necessary packages, because the actual MR analysis is performed by R's TwoSampleMR package.
Files
32 totalComments
Loading comments…
