# URL Threat Scanning

Scan URLs for phishing, malware, and domain reputation threats using multiple intelligence sources.

## Usage

```bash
# Basic scan (local heuristics always run)
python3 scripts/check_malware.py "https://suspicious-site.com"

# Verbose output with full signal details
python3 scripts/check_malware.py "https://example.com/login" --verbose
```

## Response

```json
{
  "result": "suspicious",
  "confidence": 0.55,
  "url": "https://suspicious-site.com",
  "threats": ["VirusTotal: 3/87 engines flagged"],
  "heuristic_risk_score": 0.45,
  "apis_used": ["local_heuristics", "virustotal"]
}
```
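
As a minimal sketch of how a caller might consume this report, the snippet below parses the JSON and builds a one-line summary. It assumes the report arrives as a JSON string with the fields shown above; `summarize_scan` and `SAMPLE_REPORT` are illustrative names, not part of the scanner.

```python
import json

# Sample report copied from the response shown above.
SAMPLE_REPORT = """
{
  "result": "suspicious",
  "confidence": 0.55,
  "url": "https://suspicious-site.com",
  "threats": ["VirusTotal: 3/87 engines flagged"],
  "heuristic_risk_score": 0.45,
  "apis_used": ["local_heuristics", "virustotal"]
}
"""


def summarize_scan(raw_json):
    """Turn a scan report (JSON string) into a one-line summary."""
    report = json.loads(raw_json)
    line = f"{report.get('url', '?')}: {report.get('result', 'unknown')}"

    confidence = report.get("confidence")
    if confidence is not None:
        line += f" (confidence {confidence:.2f})"

    threats = report.get("threats", [])
    if threats:
        line += " | " + "; ".join(threats)
    return line


print(summarize_scan(SAMPLE_REPORT))
```

To feed it live output, the script's stdout could be captured with `subprocess.run([...], capture_output=True, text=True)`, assuming the script prints only the JSON report.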

## Threat Intelligence Sources

| Source | API Key | Free Tier | What It Checks |
|--------|---------|-----------|---------------|
| **Local Heuristics** | None | Unlimited | Suspicious TLDs, typosquatting, IP-as-host, phishing patterns |
| **VirusTotal** | `VIRUSTOTAL_API_KEY` | 4 req/min, 500 req/day | 70+ antivirus engines, domain reputation |
| **URLScan.io** | `URLSCAN_API_KEY` | 1000 scans/day | Browser-based rendering, phishing detection for 1500+ brands |
| **Google Safe Browsing** | `GOOGLE_SAFE_BROWSING_KEY` | 10k req/day | Malware, social engineering, unwanted software |

## Local Heuristic Signals

The scanner runs these checks without any API key:

| Signal | Severity | Example |
|--------|----------|---------|
| Suspicious TLD | Medium | `.tk`, `.xyz`, `.click`, `.buzz` |
| No HTTPS | Medium | `http://bank-login.com` |
| IP as hostname | High | `http://192.168.1.1/login` |
| Typosquatting | High | `paypa1.com` resembling `paypal.com` |
| Excessive subdomains | Medium | `secure.login.account.verify.evil.com` |
| Punycode / IDN | High | `xn--80ak6aa92e.com` (homograph attack) |
| URL-encoded domain | High | `%70aypal.com` |
| Phishing keywords | Medium | `login`, `verify`, `secure` in non-standard domains |
| URL shortener | Medium | `bit.ly`, `t.co` (hides destination) |
| Very long URL | Low | 200+ character URLs used for obfuscation |
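
Several of these signals can be approximated with the Python standard library alone. The following is a rough sketch, not the scanner's actual implementation: the signal names, the subdomain threshold, and the TLD and shortener lists (taken from the examples in the table) are all assumptions. Typosquatting and keyword checks are left out for brevity.

```python
import ipaddress
from urllib.parse import urlsplit

# Example values from the table above; a real list would be longer.
SUSPICIOUS_TLDS = {"tk", "xyz", "click", "buzz"}
SHORTENERS = {"bit.ly", "t.co"}
MAX_LABELS = 4          # assumed threshold for "excessive subdomains"
LONG_URL_LENGTH = 200   # from the "Very long URL" row


def heuristic_signals(url):
    """Return the names of the heuristic signals that fire for a URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    labels = host.split(".")
    signals = []

    if parts.scheme != "https":
        signals.append("no_https")

    # ip_address() raises ValueError for anything that is not an IP literal.
    try:
        ipaddress.ip_address(host)
        is_ip = True
    except ValueError:
        is_ip = False

    if is_ip:
        signals.append("ip_as_hostname")
    else:
        if labels[-1] in SUSPICIOUS_TLDS:
            signals.append("suspicious_tld")
        if len(labels) > MAX_LABELS:
            signals.append("excessive_subdomains")
        if any(label.startswith("xn--") for label in labels):
            signals.append("punycode")

    if "%" in host:
        signals.append("url_encoded_domain")
    if host in SHORTENERS:
        signals.append("url_shortener")
    if len(url) >= LONG_URL_LENGTH:
        signals.append("very_long_url")
    return signals


for example in ("http://192.168.1.1/login", "https://xn--80ak6aa92e.com"):
    print(example, heuristic_signals(example))
```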

## Verdicts

| Result | Meaning |
|--------|---------|
| `malicious` | Multiple engines confirm threat |
| `suspicious` | Some signals flagged — investigate |
| `caution` | Minor heuristic signals — likely safe but unusual |
| `clean` | No threats detected |
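
When the verdict feeds an automated decision, it can help to treat the four results as an ordered scale. The sketch below is one possible policy, not something the scanner prescribes: the numeric ranks, the default threshold, and the `should_block` helper are assumptions. Unknown verdicts are treated as the most severe so that the check fails closed.

```python
# Verdicts from the table above, ordered from least to most severe.
SEVERITY = {"clean": 0, "caution": 1, "suspicious": 2, "malicious": 3}


def should_block(result, threshold="suspicious"):
    """Return True when a verdict is at or above the blocking threshold."""
    # An unrecognized verdict is ranked as "malicious" (fail closed).
    rank = SEVERITY.get(result, SEVERITY["malicious"])
    return rank >= SEVERITY[threshold]


for verdict in ("clean", "caution", "suspicious", "malicious"):
    print(verdict, should_block(verdict))
```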

## API Key Setup

No key is required: local heuristics always run. Export any of the following to enable the corresponding source.

```bash
export VIRUSTOTAL_API_KEY=your_key     # https://virustotal.com
export URLSCAN_API_KEY=your_key        # https://urlscan.io
export GOOGLE_SAFE_BROWSING_KEY=your_key  # https://console.cloud.google.com
```

### Getting API Keys

**VirusTotal** (recommended — most comprehensive):
1. Create a free account at [virustotal.com](https://virustotal.com)
2. Go to your profile → API key
3. Free tier: 4 requests/minute, 500 requests/day

**URLScan.io** (optional — visual analysis):
1. Create a free account at [urlscan.io](https://urlscan.io)
2. Navigate to Settings → API
3. Free tier: 1000 scans/day

**Google Safe Browsing** (optional — Google's threat lists):
1. Go to [Google Cloud Console](https://console.cloud.google.com)
2. Enable the Safe Browsing API
3. Create credentials → API key
4. Free tier: 10,000 lookups/day
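
Batch scans can exhaust the free tiers quickly, especially VirusTotal's 4 requests per minute. A small client-side throttle is one way to stay under such a limit. The sketch below is an assumption about how a caller might pace requests, not part of the scanner; `MinuteThrottle` is a hypothetical helper with an injectable clock and sleep function so it can be tested without waiting.

```python
import time
from collections import deque


class MinuteThrottle:
    """Allow at most `max_calls` calls per `period` seconds (sliding window)."""

    def __init__(self, max_calls=4, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)


throttle = MinuteThrottle(max_calls=4, period=60.0)
throttle.wait()  # call before each rate-limited request; returns at once here
```

Calling `throttle.wait()` before each VirusTotal request keeps a loop over many URLs within the per-minute limit; the daily limits would still need separate tracking.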