# LigandMPNN Ligand-Aware Sequence Design

## Overview

LigandMPNN extends ProteinMPNN to ligand-aware protein sequence design. It designs sequences for a given backbone while taking into account bound non-protein components such as small molecules, nucleotides, and metals.

### When to Use LigandMPNN
- Protein with bound ligand/small molecule
- Enzyme design with substrate/cofactor
- Protein-DNA/RNA binding design
- Metalloprotein design
- Designs where binding site residues must stay fixed

### When NOT to Use
- No ligand present → Use ProteinMPNN (faster)
- Need de novo backbone → Use BoltzGen or Pinal
- Only need stability → Use ThermoMPNN

## Decision Tree

```
What do you need to design?
│
├─ Protein binding small molecule?
│   └─ submit_ligandmpnn_prediction
│       → ligand_mpnn_use_atom_context: true
│
├─ Enzyme with cofactor?
│   └─ submit_ligandmpnn_prediction
│       → fixed_residues: "A23 A45" (catalytic)
│       → ligand_mpnn_use_atom_context: true
│
├─ Need side chain packing?
│   └─ submit_ligandmpnn_prediction
│       → pack_side_chains: true
│       → pack_with_ligand_context: true
│
├─ Score existing sequence?
│   └─ submit_ligandmpnn_scoring
│
└─ Design specific residues only?
    └─ submit_ligandmpnn_prediction
        → redesigned_residues: "A10 A11 A12"
```

## Parameters

### Essential Parameters

| Parameter | Type | Range | Default | Description |
|-----------|------|-------|---------|-------------|
| `temperature` | float | 0.01-1.0 | 0.1 | Sampling temperature |
| `batch_size` | int | 1-5 | 1 | Sequences per batch |
| `number_of_batches` | int | 1-10 | 1 | Number of batches |
| `seed` | int | 1-999999 | 111 | Random seed |
| `ligand_mpnn_use_atom_context` | bool | - | true | Use ligand atoms |
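
The ranges above can be checked on the client before a job is submitted. The sketch below is a hypothetical helper, not part of the API: `RANGES` and `DEFAULTS` simply mirror the table, and `total_sequences` assumes each batch yields `batch_size` sequences.

```python
# Hypothetical client-side check; RANGES mirrors the Essential Parameters table.
RANGES = {
    "temperature": (0.01, 1.0),
    "batch_size": (1, 5),
    "number_of_batches": (1, 10),
    "seed": (1, 999999),
}
DEFAULTS = {"temperature": 0.1, "batch_size": 1, "number_of_batches": 1, "seed": 111}


def validate_params(params):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    for name, (low, high) in RANGES.items():
        value = params.get(name, DEFAULTS[name])
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems


def total_sequences(params):
    """Sequences per job, assuming batch_size sequences per batch."""
    merged = {**DEFAULTS, **params}
    return merged["batch_size"] * merged["number_of_batches"]


print(validate_params({"temperature": 1.5, "batch_size": 2}))
print(total_sequences({"batch_size": 2, "number_of_batches": 4}))
```

Returning a list of problems rather than raising on the first one lets a caller report every out-of-range value at once.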

### Temperature Guide

| Temperature | Effect | Use Case |
|-------------|--------|----------|
| 0.01-0.05 | Very conservative | Critical binding sites |
| 0.1 | Conservative | Default, balanced |
| 0.2-0.3 | More diverse | Exploration |
| 0.5-1.0 | Very diverse | Maximum variation |
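
To see why low temperatures are conservative, the toy example below assumes sampling from a softmax over logits divided by the temperature, which is the usual scheme for MPNN-style models. The logits are made up for illustration and are not model output.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits for three candidate amino acids at one position.
toy_logits = [2.0, 1.0, 0.5]
for t in (0.05, 0.1, 0.3, 1.0):
    probs = softmax_with_temperature(toy_logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the low end nearly all probability mass sits on the top-scoring residue, while at 1.0 the alternatives are sampled regularly.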

### Design Control Parameters

| Parameter | Format | Description |
|-----------|--------|-------------|
| `ligand_mpnn_cutoff_for_score` | 4.0-12.0 Å | Distance cutoff (default 8.0) |
| `fixed_residues` | "A10 A11 B5" | Keep these unchanged |
| `redesigned_residues` | "A10 A11 B5" | Only redesign these |
| `bias_AA` | "W:3.0,C:-5.0" | Per-amino-acid bias (positive favors, negative disfavors) |
| `omit_AA` | "CP" | Exclude these amino acids (one-letter codes; here Cys and Pro) |
| `chains_to_design` | "A,B" | Design only these chains |
| `parse_these_chains_only` | "A,B,C" | Parse only these chains |
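
The string formats in this table are easy to get wrong by hand. The helpers below are hypothetical (their names are not part of the API) and only produce strings in the formats shown above.

```python
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def format_residues(residues):
    """[("A", 10), ("B", 5)] -> "A10 B5" (chain letter + residue number)."""
    return " ".join(f"{chain}{number}" for chain, number in residues)


def format_bias(bias):
    """{"W": 3.0, "C": -5.0} -> "W:3.0,C:-5.0"."""
    return ",".join(f"{aa}:{value}" for aa, value in bias.items())


def format_omit(amino_acids):
    """Join one-letter codes, rejecting anything outside the standard 20."""
    unknown = set(amino_acids) - VALID_AA
    if unknown:
        raise ValueError(f"unknown amino acid codes: {sorted(unknown)}")
    return "".join(amino_acids)


params = {
    "fixed_residues": format_residues([("A", 10), ("A", 11), ("B", 5)]),
    "bias_AA": format_bias({"W": 3.0, "C": -5.0}),
    "omit_AA": format_omit(["C", "P"]),
}
print(params)
```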

### Side Chain Packing

| Parameter | Default | Description |
|-----------|---------|-------------|
| `pack_side_chains` | false | Enable full-atom packing |
| `number_of_packs_per_design` | 4 | Packing samples |
| `ligand_mpnn_use_side_chain_context` | false | Use side chain atoms of fixed residues as extra context |
| `repack_everything` | false | Repack all residues |
| `pack_with_ligand_context` | true | Consider ligands in packing |

## Quality Metrics

### Output Metrics

| Metric | Description | Good Value |
|--------|-------------|------------|
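| `overall_confidence` | Model confidence across designed residues (higher is better) | 0.7-1.0 |
| `ligand_confidence` | Model confidence for residues near the ligand (higher is better) | 0.7-1.0 |
| `seq_rec` | Sequence recovery relative to the input sequence | Context-dependent |
| `num_ligand_res` | Number of residues near the ligand | Varies |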
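
One way to apply these thresholds is to filter designs after parsing their metrics. The sketch below assumes each design has already been loaded into a dict keyed by metric name; the values are invented for illustration.

```python
def passes_thresholds(design, min_overall=0.7, min_ligand=0.7):
    """True if both confidences reach the 'good value' range in the table."""
    return (design["overall_confidence"] >= min_overall
            and design["ligand_confidence"] >= min_ligand)


# Made-up metric values for three designs, for illustration only.
designs = [
    {"id": 1, "overall_confidence": 0.82, "ligand_confidence": 0.75},
    {"id": 2, "overall_confidence": 0.64, "ligand_confidence": 0.91},
    {"id": 3, "overall_confidence": 0.78, "ligand_confidence": 0.88},
]

# Keep passing designs, best ligand confidence first.
kept = sorted(
    (d for d in designs if passes_thresholds(d)),
    key=lambda d: d["ligand_confidence"],
    reverse=True,
)
print([d["id"] for d in kept])
```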

## Common Mistakes

### Wrong: Not using atom context
```
❌ ligand_mpnn_use_atom_context: false
   → Ignores ligand, acts like ProteinMPNN
```
```
✅ ligand_mpnn_use_atom_context: true
   → Considers ligand atoms during design
```

### Wrong: Cutoff too small
```
❌ ligand_mpnn_cutoff_for_score: 4.0
   → Misses important interactions
```
```
✅ ligand_mpnn_cutoff_for_score: 8.0
   → Default captures most binding site residues
```

### Wrong: Not fixing catalytic residues
```
❌ Redesigning entire enzyme including active site
```
```
✅ fixed_residues: "A23 A45 A67"
   → Preserve critical catalytic residues
```

### Wrong: Using both fixed and redesigned
```
❌ fixed_residues: "A10" AND redesigned_residues: "A20"
   → Mutually exclusive, behavior undefined
```
```
✅ Use ONE of:
   - fixed_residues (fix these, design rest)
   - redesigned_residues (only design these)
```
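
A small guard can catch this conflict before submission. This is a hypothetical client-side check, not something the API is documented to perform.

```python
def check_residue_selection(params):
    """Return the selection mode, or raise if both modes are set."""
    fixed = params.get("fixed_residues", "").split()
    redesigned = params.get("redesigned_residues", "").split()
    if fixed and redesigned:
        raise ValueError(
            "set either fixed_residues or redesigned_residues, not both"
        )
    if fixed:
        return "fixed"
    if redesigned:
        return "redesigned"
    return "all"  # neither set: every residue is open to design


print(check_residue_selection({"fixed_residues": "A23 A45 A67"}))
print(check_residue_selection({"redesigned_residues": "A10 A11 A12"}))
```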

## API Usage

### Basic Ligand-Aware Design
```bash
curl -X POST "https://api.openbio.tech/api/v1/tools" \
  -H "X-API-Key: $OPENBIO_API_KEY" \
  -F "tool_name=submit_ligandmpnn_prediction" \
  -F 'params={
    "input_file_path": "protein_with_ligand.pdb",
    "temperature": 0.1,
    "batch_size": 1,
    "number_of_batches": 2,
    "ligand_mpnn_use_atom_context": true
  }'
```
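
The request above sends two multipart form fields: `tool_name` as plain text and `params` as a JSON string. When building requests from a script, serializing a dict avoids hand-written JSON. The function below is a hypothetical helper that only prepares the fields; sending them is left to whichever HTTP client you use.

```python
import json


def build_form_fields(tool_name, params):
    """Return the two form fields used by the curl examples."""
    return {
        "tool_name": tool_name,
        "params": json.dumps(params),  # params travels as one JSON string
    }


fields = build_form_fields(
    "submit_ligandmpnn_prediction",
    {
        "input_file_path": "protein_with_ligand.pdb",
        "temperature": 0.1,
        "batch_size": 1,
        "number_of_batches": 2,
        "ligand_mpnn_use_atom_context": True,
    },
)
print(fields["params"])
```

Python's `True` is serialized as JSON `true`, matching the curl example.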

### Design with Fixed Binding Site
```bash
curl -X POST "https://api.openbio.tech/api/v1/tools" \
  -H "X-API-Key: $OPENBIO_API_KEY" \
  -F "tool_name=submit_ligandmpnn_prediction" \
  -F 'params={
    "input_file_path": "enzyme_substrate.pdb",
    "temperature": 0.1,
    "batch_size": 1,
    "number_of_batches": 2,
    "fixed_residues": "A23 A45 A67 A89",
    "ligand_mpnn_use_atom_context": true
  }'
```

### Design with Side Chain Packing
```bash
curl -X POST "https://api.openbio.tech/api/v1/tools" \
  -H "X-API-Key: $OPENBIO_API_KEY" \
  -F "tool_name=submit_ligandmpnn_prediction" \
  -F 'params={
    "input_file_path": "protein_ligand.pdb",
    "temperature": 0.1,
    "pack_side_chains": true,
    "number_of_packs_per_design": 4,
    "pack_with_ligand_context": true,
    "ligand_mpnn_use_atom_context": true
  }'
```

### Design with AA Restrictions
```bash
curl -X POST "https://api.openbio.tech/api/v1/tools" \
  -H "X-API-Key: $OPENBIO_API_KEY" \
  -F "tool_name=submit_ligandmpnn_prediction" \
  -F 'params={
    "input_file_path": "protein.pdb",
    "temperature": 0.1,
    "omit_AA": "CP",
    "bias_AA": "S:2.0,T:2.0",
    "ligand_mpnn_use_atom_context": true
  }'
```

### Sequence Scoring
```bash
curl -X POST "https://api.openbio.tech/api/v1/tools" \
  -H "X-API-Key: $OPENBIO_API_KEY" \
  -F "tool_name=submit_ligandmpnn_scoring" \
  -F 'params={
    "input_file_path": "designed_protein.pdb",
    "ligand_mpnn_use_atom_context": true,
    "ligand_mpnn_cutoff_for_score": 8.0
  }'
```

## Expected Runtime

| Protein Size | With Packing | Without Packing |
|--------------|--------------|-----------------|
| Small (<100 aa) | 3-5 min | 2-3 min |
| Medium (100-300 aa) | 5-10 min | 3-5 min |
| Large (>300 aa) | 10-20 min | 5-10 min |

## Output Files

### Generated Files
- **FASTA files** (`seqs/*.fa`): Designed sequences with metrics
- **PDB files** (`backbones/*.pdb`): Backbone structures with new sequences
- **Summary JSON**: Processing statistics

## Troubleshooting

| Issue | Cause | Fix |
|-------|-------|-----|
| No ligand residues detected | Missing HETATM | Check PDB has HETATM records |
| Low ligand_confidence | Bad ligand coordinates | Verify ligand has valid coords |
| Processing timeout | Large protein | Reduce batches, disable packing |
| Empty output | Bad PDB format | Check backbone completeness |

## Sample Output

### Successful Job Response
```json
{
  "success": true,
  "job_id": "ligandmpnn_abc123",
  "message": "Job submitted successfully",
  "estimated_runtime": "3-5 minutes"
}
```

### Output FASTA Header
```
>enzyme_substrate_0001, score=1.45, global_score=1.38
MKTAYIAKQRQISFVKSHFSRQLE...
>enzyme_substrate_0002, score=1.52, global_score=1.41
MKTAYIAKQRQISFVKSQFSRQLD...
```
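
Headers in this format can be parsed into per-design metrics. The sketch below assumes the comma-separated `key=value` layout shown above; the sequences in the demo are shortened placeholders, not real output.

```python
def parse_fasta(text):
    """Return a list of (name, metrics, sequence) tuples."""
    records = []
    name, metrics, seq_lines = None, {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, metrics, "".join(seq_lines)))
            fields = [f.strip() for f in line[1:].split(",")]
            name, metrics, seq_lines = fields[0], {}, []
            for field in fields[1:]:
                key, _, value = field.partition("=")
                try:
                    metrics[key] = float(value)
                except ValueError:
                    metrics[key] = value  # keep non-numeric values as text
        else:
            seq_lines.append(line)
    if name is not None:
        records.append((name, metrics, "".join(seq_lines)))
    return records


# Headers copied from the sample; sequences shortened for illustration.
sample = """>enzyme_substrate_0001, score=1.45, global_score=1.38
MKTAYIAK
>enzyme_substrate_0002, score=1.52, global_score=1.41
MKTAYIAK
"""
records = parse_fasta(sample)
best = min(records, key=lambda record: record[1]["score"])
print(best[0], best[1])
```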

### What Good Output Looks Like
- **Score**: 1.0-2.0 (lower = more confident)
- **Ligand detected**: "Found ligand: LIG (12 atoms)"
- **Active site residues**: Preserved or optimized

## Typical Performance

| Campaign Size | Time | Notes |
|---------------|------|-------|
| 10 backbones × 8 seq | 10-15 min | Quick test |
| 100 backbones × 8 seq | 45-90 min | Standard |
| 500 backbones × 16 seq | 3-6 hours | Large campaign |

**Throughput**: The campaigns above work out to roughly 5-45 sequences/minute, with larger campaigns at the high end.

## Verify Success

```bash
# Check job status (set JOB_ID to the job_id returned at submission)
curl -s "https://api.openbio.tech/api/v1/jobs/${JOB_ID}/status" \
  -H "X-API-Key: $OPENBIO_API_KEY" | jq '.status'

# Verify in the job logs that the ligand was detected, then download
# the results and count the designed sequences
grep -c "^>" output.fa
```
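
For scripted runs, the status check can be wrapped in a polling loop. The sketch below takes the status request as a function argument so it can be tried without network access. The status strings `"completed"` and `"failed"` are assumptions and are not documented API values, so adjust them to what the endpoint actually returns.

```python
import time


def wait_for_job(fetch_status, poll_seconds=30, timeout_seconds=1800,
                 sleep=time.sleep):
    """Poll fetch_status() until the job finishes, fails, or times out.

    The status strings below are assumptions, not documented API values.
    """
    waited = 0
    while True:
        status = fetch_status()
        if status == "completed":
            return status
        if status == "failed":
            raise RuntimeError("job failed")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {status!r} after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds


# Demo with a stand-in for the real status request.
fake_statuses = iter(["pending", "running", "completed"])
result = wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(result)
```

In real use, `fetch_status` would perform the GET request shown above and return the `status` field.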

## Best Practices

1. **Enable atom context**: Always use `ligand_mpnn_use_atom_context: true`
2. **Fix critical residues**: Preserve catalytic sites with `fixed_residues`
3. **Use appropriate cutoff**: Default 8.0 Å works for most binding sites
4. **Start conservative**: Use temperature 0.1 initially
5. **Validate with structure prediction**: Run Boltz on designed sequences
6. **Increase batches for diversity**: Raise `number_of_batches` rather than `batch_size`

### Failure Recovery

```
Ligand not recognized?
├── Check HETATM records in PDB
│   └── grep "^HETATM" protein.pdb | head
├── Verify ligand has proper residue name
│   └── Standard 3-letter code (e.g., ATP, NAD, LIG)
└── Ensure ligand has coordinates
    └── Check for 0,0,0 placeholder coordinates

Low ligand_confidence?
├── Verify ligand coordinates are correct
│   └── Visual inspection in PyMOL/ChimeraX
├── Increase ligand_mpnn_cutoff_for_score
│   └── ligand_mpnn_cutoff_for_score: 10.0-12.0
└── Try with side chain context
    └── ligand_mpnn_use_side_chain_context: true
```
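
The first checklist can be automated with a short script. The sketch below relies on the fixed-column PDB layout (residue name in columns 18-20, coordinates in columns 31-54) and flags atoms sitting exactly at the origin. The demo lines are synthetic, and `hetatm_line` is a hypothetical helper used only to build them.

```python
from collections import Counter


def summarize_hetatm(pdb_text):
    """Count HETATM atoms per residue name and list atoms at (0, 0, 0)."""
    counts = Counter()
    at_origin = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        res_name = line[17:20].strip()
        x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
        counts[res_name] += 1
        if x == y == z == 0.0:
            at_origin.append(line[12:16].strip())
    return counts, at_origin


def hetatm_line(serial, atom, res, chain, seq, x, y, z):
    """Build a synthetic HETATM record with PDB column alignment."""
    return (f"HETATM{serial:5d} {atom:<4s} {res:>3s} {chain}{seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


demo = "\n".join([
    "ATOM      1  CA  ALA A   1      11.000  12.000  13.000",
    hetatm_line(2, "C1", "LIG", "A", 201, 10.5, 11.25, 9.0),
    hetatm_line(3, "C2", "LIG", "A", 201, 0.0, 0.0, 0.0),
])
counts, at_origin = summarize_hetatm(demo)
print(dict(counts), at_origin)
```

An empty `counts` means no HETATM records were found, and any entry in `at_origin` points to likely placeholder coordinates.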

## LigandMPNN vs ProteinMPNN

| Feature | LigandMPNN | ProteinMPNN |
|---------|------------|-------------|
| Ligand awareness | Yes | No |
| Fixed residues | Yes | Limited |
| Side chain packing | Yes | No |
| Sequence scoring | Yes | Yes (upstream `--score_only` mode) |
| Speed | Slightly slower | Faster |

---

**Next**: Validate designed sequences with `Boltz` → Check stability with `ThermoMPNN`.