Install
openclaw skills install protein-phylogeny
Comprehensive protein family phylogenetic analysis workflow with quality control, conservation analysis, coevolution network analysis, and publication-ready visualization. Use when: (1) analyzing protein family evolution, (2) building phylogenetic trees from sequences, (3) identifying conserved/coevolved residues, (4) generating publication-quality figures and reports, (5) quality-controlling sequence datasets, or (6) performing systematic evolutionary analysis of enzyme families, protein superfamilies, or any homologous protein groups.
Complete workflow for protein family evolutionary analysis: quality control → conservation → coevolution → phylogeny → publication report.
Input: FASTA file with protein sequences (any family, any size)
Output: Publication-ready report with phylogenetic tree, conservation analysis, coevolution networks, and high-quality figures
Typical workflow:
# 1. Quality control (removes low-quality sequences)
bash scripts/01_quality_control.sh input.fasta output_dir/
# 2. Conservation analysis
bash scripts/02_conservation.sh output_dir/qc/final.fasta output_dir/
# 3. Coevolution analysis
bash scripts/03_coevolution.sh output_dir/qc/final.fasta output_dir/
# 4. Phylogenetic tree
bash scripts/04_phylogeny.sh output_dir/qc/final.fasta output_dir/
# 5. Generate figures
bash scripts/05_visualize.sh output_dir/
# 6. Create report
bash scripts/06_report.sh output_dir/ "Family Name"
Purpose: Filter raw sequences to high-quality, non-redundant dataset
Steps:
Key parameters:
Output: qc/final.fasta (high-quality aligned sequences)
Purpose: Identify functionally important conserved residues
Method: Shannon entropy
Output:
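To make the Shannon entropy method named above concrete, here is a minimal Python sketch of per-column entropy scoring on a toy alignment. It is not the code in scripts/02_conservation.sh; the function names, the choice to ignore gap characters, and the use of log base 2 are assumptions made for illustration.

```python
import math
from collections import Counter


def column_entropy(column, gap_chars="-"):
    """Shannon entropy (in bits) of one alignment column, ignoring gaps.

    Returns NaN for a column made only of gaps, so that it is not
    mistaken for a perfectly conserved position.
    """
    residues = [c for c in column if c not in gap_chars]
    if not residues:
        return float("nan")
    total = len(residues)
    counts = Counter(residues)
    return sum(-(n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column in a list of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    return [column_entropy(col) for col in zip(*sequences)]


# Toy alignment (hypothetical data): column 0 is invariant, column 2 is variable.
toy_alignment = ["MKVA", "MKIA", "MRLA", "MK-G"]
scores = alignment_entropy(toy_alignment)
for position, score in enumerate(scores, start=1):
    print(f"column {position}: {score:.3f} bits")
```

Low entropy marks a conserved column and high entropy a variable one; an invariant column scores 0 bits.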
Purpose: Identify residue pairs that evolve together
Method: Normalized Mutual Information (NMI)
Output:
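As an illustration of the NMI method named above, the following Python sketch scores a pair of alignment columns. Several normalizations of mutual information are in use and the source does not say which one the workflow applies; this sketch assumes division by the joint entropy, and it skips any sequence that has a gap in either column. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of hashable items."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def normalized_mi(col_i, col_j, gap="-"):
    """Mutual information of two columns divided by their joint entropy.

    Assumption: normalization by H(i, j). Other conventions divide by
    min(H(i), H(j)) or sqrt(H(i) * H(j)) and give different values.
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    if not pairs:
        return 0.0
    h_i = entropy([a for a, _ in pairs])
    h_j = entropy([b for _, b in pairs])
    h_ij = entropy(pairs)
    if h_ij == 0:
        return 0.0  # both columns invariant: no covariation signal
    return (h_i + h_j - h_ij) / h_ij


# Hypothetical columns: the first pair changes in lockstep, the second does not.
coupled = normalized_mi("AAGG", "LLVV")
independent = normalized_mi("AAGG", "LVLV")
print(coupled, independent)
```

Under this normalization, scores range from 0 (independent columns) to 1 (one column fully determines the other).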
Purpose: Reconstruct evolutionary relationships
Method: IQ-TREE maximum likelihood
Output:
Purpose: Generate publication-quality figures (300 DPI)
Figures:
Style: Clean, colorblind-friendly, Nature/Science standards
Purpose: Create comprehensive analysis report
Sections:
Format: Markdown → Feishu/Word/PDF
Required tools:
Installation:
bash scripts/install_dependencies.sh
Problem: Alignment unreliable, phylogeny uncertain
Solution:
Problem: Many unreliable positions
Solution:
-gt 0.8)
Problem: Tree topology unstable
Solution:
Problem: Family definition unclear
Solution:
Edit the parameters in scripts/01_quality_control.sh, for example:
CDHIT_THRESHOLD=0.85 # More stringent redundancy clustering
MIN_LENGTH=200 # Lower this to keep shorter proteins
MAX_LENGTH=600 # Raise this to keep longer proteins
GAP_THRESHOLD=0.25 # Stricter gap cutoff
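To show what the length bounds do, here is a small Python sketch of a length filter over FASTA records. It is only an illustration of the idea behind MIN_LENGTH and MAX_LENGTH, not the logic of scripts/01_quality_control.sh; the function names are hypothetical, and the demo uses tiny thresholds so that the toy sequences are easy to read.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_by_length(records, min_length=200, max_length=600):
    """Keep records whose sequence length lies within [min_length, max_length]."""
    return [(h, s) for h, s in records if min_length <= len(s) <= max_length]


# Hypothetical toy input; real protein families would use bounds such as 200-600.
toy_fasta = """>short
MKV
>ok
MKVLAAGI
VTR
>long
MKVLAAGIVTRMKVLAAGIVTR
"""
kept = filter_by_length(parse_fasta(toy_fasta), min_length=5, max_length=15)
print([header for header, _ in kept])
```

Widening the bounds keeps more sequences but can let fragments or multi-domain fusions into the alignment.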
See references/04-phylogeny.md for:
Edit scripts/05_visualize.sh for:
Issue: CD-HIT crashes with large datasets
Fix: Split input, process in batches, merge results
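The splitting step of this fix can be sketched in Python as follows. The helper names and the batch_NNNN.fasta naming scheme are assumptions for illustration; the workflow itself does not ship this code.

```python
import tempfile
from pathlib import Path


def read_fasta(path):
    """Return a list of (header, sequence) records from a FASTA file."""
    records, header, chunks = [], None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line and header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_fasta(path, out_dir, batch_size):
    """Write consecutive batches of records to out_dir and return their paths."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    records = read_fasta(path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        batch_path = out_dir / f"batch_{start // batch_size + 1:04d}.fasta"
        batch_path.write_text("".join(f">{h}\n{s}\n" for h, s in batch))
        paths.append(batch_path)
    return paths


# Demo on a throwaway directory with three hypothetical sequences.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.fasta"
    source.write_text(">a\nMKV\n>b\nMRL\n>c\nMKI\n")
    batches = split_fasta(source, Path(tmp) / "batches", batch_size=2)
    batch_names = [p.name for p in batches]
    batch_sizes = [len(read_fasta(p)) for p in batches]
print(batch_names, batch_sizes)
```

Sequences clustered in separate batches are never compared with each other, so the merged representatives may still contain redundancy across batches; a final clustering pass over the merged file is one way to handle that.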
Issue: IQ-TREE runs forever
Fix: Use -fast mode or reduce bootstrap replicates
Issue: Figures look pixelated
Fix: Increase DPI in scripts/05_visualize.sh (default 300)
Issue: Report generation fails
Fix: Check all intermediate files exist, rerun failed stages
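A quick way to find which stage needs rerunning is to check the output directory for the expected results. The Python sketch below uses the output paths listed in the results overview of this document (qc/final.fasta, conservation/, coevolution/, phylogeny/, figures/); the function name is hypothetical and the check only tests existence, not content.

```python
import tempfile
from pathlib import Path

# Paths relative to the output directory, taken from the results listing.
EXPECTED_OUTPUTS = [
    "qc/final.fasta",
    "conservation",
    "coevolution",
    "phylogeny",
    "figures",
]


def missing_outputs(output_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected paths that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).exists()]


# Demo: a throwaway directory in which only the first two stages have run.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "qc").mkdir()
    (root / "qc" / "final.fasta").write_text(">a\nMKV\n")
    (root / "conservation").mkdir()
    missing = missing_outputs(root)
print("Missing:", missing)
```

Any path reported as missing points to the stage to rerun before calling scripts/06_report.sh again.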
For detailed methodology, see:
If you use this workflow, please cite:
# Download your sequences
# (from UniProt, NCBI, or your own database)
# Run full workflow
bash scripts/run_full_workflow.sh sequences.fasta analysis_output/ "Your Family Name"
# Results in analysis_output/:
# - qc/final.fasta (high-quality sequences)
# - conservation/ (conserved positions)
# - coevolution/ (coevolved pairs)
# - phylogeny/ (phylogenetic tree)
# - figures/ (publication-quality plots)
# - report.md (complete analysis)