PRISM-GEN-DEMO
English: Retrieve, filter, sort, merge, and visualize multiple CSV result files from PRISM-Gen molecular generation/screening. Provides portable query-based...
Like a lobster shell, security has layers — review code before you run it.
License
SKILL.md
PRISM-Gen Demo
Description
PRISM-Gen Demo is a read-only data analysis skill for exploring pre-calculated molecular screening results from the PRISM-Gen broad-spectrum coronavirus Mpro inhibitor discovery pipeline.
All analysis is performed locally on CSV files bundled with the skill. No network access, no external API calls, no credential requirements.
What This Skill Does
- Retrieve data from any of 10 pipeline stage CSV files (step3a through step5b)
- Filter molecules by property thresholds (pIC50, QED, MW, LogP, gap_ev, hERG_Prob, etc.)
- Sort and rank molecules by any numeric column
- Merge data across pipeline stages using SMILES as join keys
- Score molecules with composite scoring, worst-case multi-target analysis, summary statistics, and Pareto front identification
- Plot property distributions, scatter plots, docking score heatmaps, pipeline attrition funnels, and Pareto fronts
- Summarize the full pipeline with attrition statistics and key findings
What This Skill Does NOT Do
- Does NOT run any computational chemistry calculations (no DFT, no docking, no MD)
- Does NOT access any network or external services
- Does NOT require or use any credentials, API keys, or tokens
- Does NOT modify any files outside its own output directory
- Does NOT contain any shell scripts
- Does NOT use hardcoded absolute paths
Commands
retrieve
Retrieve and display data from any pipeline stage.
python3 scripts/retrieve.py --stage step5b --columns name,pIC50,Broad_Spectrum_Score --max_rows 10
python3 scripts/retrieve.py --list_stages
python3 scripts/retrieve.py --stage step4a --list_columns
python3 scripts/retrieve.py --stage step5b --name mol_16
filter
Filter molecules by property thresholds. Multiple conditions are combined with AND.
python3 scripts/filter.py --stage step4a --where "pIC50>7.5" "QED>0.7" "hERG_Prob<0.5"
python3 scripts/filter.py --stage step5b --where "Broad_Spectrum_Score<-7.0" "Is_Final_Top==True"
sort
Sort and rank molecules by any column.
python3 scripts/sort.py --stage step5b --by Broad_Spectrum_Score --ascending --top 10
python3 scripts/sort.py --stage step4c --by R_global --top 20
merge
Merge data across pipeline stages on SMILES keys.
python3 scripts/merge.py --stages step3c,step4a --on smiles --columns pIC50,gap_ev,Lipinski_Pass,hERG_Prob
score
Compute composite scores, worst-case analysis, statistics, and Pareto fronts.
python3 scripts/score.py --mode worst_case --stage step5a --top 10
python3 scripts/score.py --mode composite --stage step4c --weights "pIC50:1.0,QED:0.5,R_ADMET:2.0"
python3 scripts/score.py --mode stats --stage step5b --columns pIC50,QED,Broad_Spectrum_Score
python3 scripts/score.py --mode pareto --stage step5b --obj1 pIC50 --obj2 QED
plot
Generate visualizations (requires matplotlib).
python3 scripts/plot.py --mode histogram --stage step4a --column pIC50 --output hist.png
python3 scripts/plot.py --mode scatter --stage step5b --x pIC50 --y Broad_Spectrum_Score --output scatter.png
python3 scripts/plot.py --mode heatmap --stage step5a --output heatmap.png
python3 scripts/plot.py --mode funnel --output funnel.png
python3 scripts/plot.py --mode pareto --stage step5b --x pIC50 --y QED --output pareto.png
summary
Generate a full pipeline summary report.
python3 scripts/summary.py
python3 scripts/summary.py --detailed
Available Pipeline Stages
| Stage Key | Description | Rows | Columns |
|---|---|---|---|
| step3a | RL-optimized molecules (generation + surrogate scoring) | 200 | 10 |
| step3a_top | Top 200 molecules by Reward | 200 | 10 |
| step3b | GFN2-xTB electronic structure results | 200 | 6 |
| step3c | xTB-refined ranking with GEM scoring | 200 | 24 |
| step4a | ADMET filtering (Lipinski, hERG, QED) | 200 | 38 |
| step4b | B3LYP/6-31G* DFT validation (PySCF) | 46 | 34 |
| step4c | Master summary (all stages merged) | 200 | 65 |
| step5a | Broad-spectrum docking (3 targets) | 36 | 13 |
| step5b | Final candidates with full annotations | 36 | 75 |
| step5b_master | Master summary with docking | 200 | 74 |
Dependencies
- Required: Python 3.7+ (core functionality uses only Python standard library: csv, json, argparse, math, os, sys)
- Optional: matplotlib (for visualization commands only; all non-plot commands work without it)
Security
- Pure Python implementation — no shell scripts, no subprocess calls, no os.system calls
- No hardcoded paths — all file paths are resolved relative to the skill root directory
- No network access — all data is bundled locally in the data/ directory
- No credentials — no environment variables, API keys, or tokens required
- Read-only on input data — CSV files in data/ are never modified
- Output isolation — generated plots are written only to user-specified paths
License
MIT-0 — Free to use, modify, and redistribute. No attribution required.
Author
@SenaZeng
Related
Files
21 totalComments
Loading comments…
