Contact Enrichment
Extract contact information from a website using a 3-tier approach: direct HTML scraping, WHOIS lookup, then Hunter.io API domain search for verified business emails.
MIT-0 · Free to use, modify, and redistribute. No attribution required.
⭐ 0 · 187 · 0 current installs · 0 all-time installs
Security Scan
OpenClaw
Benign · high confidence
Purpose & Capability
The name/description (contact enrichment via scraping, WHOIS, Hunter.io) matches the instructions and the declared Python packages (requests, beautifulsoup4, lxml, python-whois). No unrelated environment variables, binaries, or config paths are requested.
Instruction Scope
The SKILL.md stays within the stated purpose: it scrapes common contact pages, runs WHOIS lookups, and optionally calls Hunter.io. It does not instruct accessing unrelated system files or secrets. Note: it does not mention obeying robots.txt, rate-limiting, or domain allowlists, and it will fetch arbitrary URLs supplied to it — which raises operational/privacy considerations (see guidance).
Install Mechanism
Instruction-only skill with no install spec and no archive downloads. Declared Python packages are reasonable for the described tasks. There is no high-risk download or executable install step.
Credentials
No required environment variables or secrets are requested. Hunter.io API key is listed as optional (appropriate and proportional). The declared packages and optional API key are consistent with functionality.
Persistence & Privilege
The always flag is false, and the skill does not request persistent or elevated platform privileges. It does not modify other skills or system-wide settings.
Assessment
This skill appears coherent with its description, but consider the following before installing or running it:
- Privacy & legality: The skill collects email addresses and phone numbers (personal data). Make sure you have the right to harvest or store this data and that your use complies with laws and the target site's terms of service.
- Network scope & SSRF risk: The skill will fetch arbitrary URLs. Avoid feeding it attacker-controlled or internal-only URLs (localhost, 169.254.x.x, private IP ranges) to prevent server-side request forgery or leaking internal resources.
- Robots, throttling, and politeness: The instructions do not mention obeying robots.txt or rate limits. If you will use this at scale, add rate-limiting, retries/backoff, and respect site crawl policies.
- Dependencies: The skill lists Python packages (requests, beautifulsoup4, lxml, python-whois). Ensure these are installed from trusted sources (PyPI) in an isolated environment to reduce supply-chain risk.
- Hunter.io key: If you provide HUNTER_API_KEY, treat it as a secret and restrict its scope. The key is optional; the skill works with free tiers or without it if WHOIS/scraping succeed.
If you need stricter controls, require an allowlist of domains, add internal-address blocking, and add explicit instructions to respect robots.txt and rate limits before running this skill.
Like a lobster shell, security has layers — review code before you run it.
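The internal-address blocking suggested above can be implemented as a small pre-flight check run before any URL is fetched. A minimal sketch (the helper names is_public_host and is_safe_url are illustrative, not part of the skill):

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_public_host(host: str) -> bool:
    """Return False if the host resolves to a loopback, private,
    link-local, or reserved address (SSRF guard)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable: refuse to fetch
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True

def is_safe_url(url: str) -> bool:
    """Allow only http(s) URLs whose host resolves to a public address."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    return is_public_host(parsed.hostname)
```

Calling this before every fetch blocks the obvious SSRF targets (127.0.0.1, 169.254.x.x, RFC 1918 ranges); a stricter deployment would combine it with a domain allowlist.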
Current version: v1.0.0
Download zip (latest)
License
MIT-0
Free to use, modify, and redistribute. No attribution required.
SKILL.md
Contact Enrichment Skill
3-tier contact extraction. Runs cheapest method first, escalates only when needed.
Extraction Hierarchy
Tier 1: Direct HTML Scrape (free, always try first)
→ Check: homepage, /contact, /about, /about-us, /contact-us
→ Extract: all emails, all phone numbers, business name
↓ (if no email found)
Tier 2: WHOIS Lookup (free)
→ Extract: registrant org, registrant email
↓ (if still no email)
Tier 3: Hunter.io API (paid, 25 free/month)
→ Domain search → verified emails + first/last name + job title
Output Format
{
  "business_name": "Green Valley Landscaping",
  "emails": ["info@greenvalley.com", "john@greenvalley.com"],
  "primary_email": "info@greenvalley.com",
  "phones": ["(503) 555-0123"],
  "primary_phone": "(503) 555-0123",
  "owner_name": "John Smith",        # from Hunter.io or About page
  "whois_org": "Green Valley LLC",
  "whois_email": "registrant@email.com",
  "source": "scrape",                # "scrape" | "whois" | "hunter"
  "confidence": "high"               # "high" | "medium" | "low"
}
Tier 1: Direct HTML Scrape
import re, requests
from bs4 import BeautifulSoup

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
}
EMAIL_REGEX = re.compile(
    r'\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b'
)
# Country-code group is non-capturing so findall() returns full matches
PHONE_REGEX = re.compile(
    r'(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
)
# Emails that are never real contacts
EMAIL_BLACKLIST = {
    "example.com", "sentry.io", "schema.org", "w3.org",
    "wix.com", "wordpress.com", "squarespace.com", "shopify.com",
    "google.com", "jquery.com", "bootstrap.com", "cloudflare.com"
}
def is_valid_email(email: str) -> bool:
    domain = email.split("@")[-1].lower()
    return domain not in EMAIL_BLACKLIST and len(email) < 100
def scrape_page(url: str) -> tuple[str, str]:
    """Fetch a page. Returns (html, final_url)."""
    try:
        r = requests.get(url, headers=HEADERS, timeout=8, allow_redirects=True)
        if r.status_code < 400:
            return r.text, r.url
    except Exception:
        pass
    return "", url
def extract_from_html(html: str) -> dict:
    soup = BeautifulSoup(html, "lxml")
    text = soup.get_text(separator=" ")
    # Extract emails (deduplicated)
    emails = list({e for e in EMAIL_REGEX.findall(text) if is_valid_email(e)})
    # Extract phones; use finditer so the full match is kept even when
    # the pattern contains a capturing group
    phones = list({m.group(0).strip() for m in PHONE_REGEX.finditer(text)})
    # Clean up: drop matches with fewer than 10 digits
    phones = [p for p in phones if len(re.sub(r'\D', '', p)) >= 10]
    # Extract business name from the <title> tag
    title = soup.find("title")
    business_name = ""
    if title:
        t = title.text.strip()
        # Take the first meaningful segment ("Acme Inc | Home" -> "Acme Inc")
        for sep in ["|", "–", "-", "·", "—", ":"]:
            if sep in t:
                t = t.split(sep)[0].strip()
                break
        business_name = t
    return {"emails": emails, "phones": phones, "business_name": business_name}
def scrape_contact_pages(base_url: str, existing_html: str = "") -> dict:
    """
    Scrape multiple pages for contact info.
    existing_html: pass HTML already fetched by website-auditor to avoid re-fetching.
    """
    from urllib.parse import urljoin

    all_emails = set()
    all_phones = set()
    business_name = ""
    # Try homepage first (using already-fetched HTML if available)
    if existing_html:
        data = extract_from_html(existing_html)
        all_emails.update(data["emails"])
        all_phones.update(data["phones"])
        business_name = data["business_name"]
    # Try contact/about pages
    contact_paths = ["/contact", "/about", "/contact-us", "/about-us",
                     "/get-in-touch", "/reach-us", "/find-us"]
    for path in contact_paths:
        if all_emails:
            break  # Found emails — no need to keep crawling
        url = urljoin(base_url, path)
        html, _ = scrape_page(url)
        if html:
            data = extract_from_html(html)
            all_emails.update(data["emails"])
            all_phones.update(data["phones"])
            if not business_name:
                business_name = data["business_name"]
    emails = list(all_emails)
    phones = list(all_phones)
    return {
        "emails": emails,
        "primary_email": emails[0] if emails else None,
        "phones": phones,
        "primary_phone": phones[0] if phones else None,
        "business_name": business_name,
        "source": "scrape",
        "confidence": "high" if emails else "low"
    }
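As an offline sanity check, the two patterns behave as follows. This self-contained sketch repeats the regexes (with a non-capturing country-code group so findall returns full match strings) and the blacklist filter on invented sample text:

```python
import re

EMAIL_REGEX = re.compile(r'\b[a-zA-Z0-9._%+\-]+@[a-zA-Z0-9.\-]+\.[a-zA-Z]{2,}\b')
PHONE_REGEX = re.compile(r'(?:\+?1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}')
EMAIL_BLACKLIST = {"sentry.io"}

text = "Call (503) 555-0123 or email info@greenvalley.com; errors go to tracker@sentry.io"

emails = EMAIL_REGEX.findall(text)   # matches in order of occurrence
phones = PHONE_REGEX.findall(text)
# Blacklist filter keeps only plausible contact domains
real = [e for e in emails if e.split("@")[-1].lower() not in EMAIL_BLACKLIST]

print(emails)  # ['info@greenvalley.com', 'tracker@sentry.io']
print(real)    # ['info@greenvalley.com']
print(phones)  # ['(503) 555-0123']
```

This also illustrates why the blacklist matters: platform and tooling domains like sentry.io show up in page source but are never real contacts.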
Tier 2: WHOIS Lookup
import whois  # pip install python-whois

def whois_lookup(domain: str) -> dict:
    """
    Look up WHOIS registration data.
    Returns registrant org + email if publicly available.
    """
    try:
        w = whois.whois(domain)
        # Extract emails (some registrars redact these)
        emails = w.emails if isinstance(w.emails, list) else ([w.emails] if w.emails else [])
        emails = [e for e in emails
                  if e and is_valid_email(e)
                  and "whoisprotect" not in e.lower()
                  and "privacy" not in e.lower()
                  and "proxy" not in e.lower()]
        org = w.get("org") or w.get("registrant_organization") or ""
        creation = w.get("creation_date")
        if isinstance(creation, list):
            creation = creation[0]
        return {
            "whois_org": org,
            "whois_email": emails[0] if emails else None,
            "whois_emails": emails,
            "domain_created": str(creation)[:10] if creation else None,
            "registrar": w.get("registrar")
        }
    except Exception as e:
        return {"whois_org": None, "whois_email": None, "whois_error": str(e)}
Tier 3: Hunter.io API
import requests, os

def hunter_domain_search(domain: str, limit: int = 5) -> dict:
    """
    Search Hunter.io for emails associated with a domain.
    Returns verified business emails + contact names.
    Requires HUNTER_API_KEY env var.
    Free: 25 searches/month | Starter: $34/mo for 500
    """
    api_key = os.environ.get("HUNTER_API_KEY")
    if not api_key:
        return {"hunter_emails": [], "hunter_source": "no_key"}
    params = {
        "domain": domain,
        "api_key": api_key,
        "limit": limit,
        "type": "personal"  # personal | generic
    }
    try:
        r = requests.get("https://api.hunter.io/v2/domain-search", params=params, timeout=10)
        data = r.json().get("data", {})
        contacts = []
        for email_obj in data.get("emails", []):
            contacts.append({
                "email": email_obj.get("value"),
                "first_name": email_obj.get("first_name"),
                "last_name": email_obj.get("last_name"),
                "position": email_obj.get("position"),
                "confidence": email_obj.get("confidence"),
                "linkedin": email_obj.get("linkedin")
            })
        primary = contacts[0] if contacts else {}
        # "or ''" guards against None name fields leaking into the string
        owner_name = f"{primary.get('first_name') or ''} {primary.get('last_name') or ''}".strip()
        return {
            "hunter_emails": [c["email"] for c in contacts if c["email"]],
            "primary_email": primary.get("email"),
            "owner_name": owner_name,
            "owner_title": primary.get("position"),
            "owner_linkedin": primary.get("linkedin"),
            "hunter_contacts": contacts,
            "organization": data.get("organization"),
            "source": "hunter",
            "confidence": "high" if contacts else "low"
        }
    except Exception as e:
        return {"hunter_emails": [], "hunter_error": str(e)}
def hunter_email_verify(email: str) -> dict:
    """Verify if an email is deliverable."""
    api_key = os.environ.get("HUNTER_API_KEY")
    if not api_key:
        return {"valid": None}
    params = {"email": email, "api_key": api_key}
    r = requests.get("https://api.hunter.io/v2/email-verifier", params=params, timeout=10)
    data = r.json().get("data", {})
    return {"valid": data.get("status") == "valid", "score": data.get("score")}
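The response-parsing half of hunter_domain_search can be checked without an API key by feeding it a hand-built payload shaped like Hunter's domain-search response. All sample values below are invented for illustration:

```python
# Hypothetical payload mimicking the {"data": {"emails": [...]}} shape
sample = {
    "data": {
        "organization": "Green Valley Landscaping",
        "emails": [
            {"value": "john@greenvalley.com", "first_name": "John",
             "last_name": "Smith", "position": "Owner",
             "confidence": 92, "linkedin": None},
        ],
    }
}

data = sample.get("data", {})
contacts = [
    {"email": e.get("value"),
     "first_name": e.get("first_name"),
     "last_name": e.get("last_name"),
     "position": e.get("position")}
    for e in data.get("emails", [])
]
primary = contacts[0] if contacts else {}
# "or ''" so a missing/None name field never renders as the string "None"
owner_name = f"{primary.get('first_name') or ''} {primary.get('last_name') or ''}".strip()
print(owner_name)  # John Smith
```

Separating parsing from the HTTP call like this also makes the Hunter tier unit-testable, which matters when each live call consumes paid quota.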
Full Enrichment Runner
import tldextract  # pip install tldextract (not in the skill's declared package list)

def enrich_contact(url: str, existing_html: str = "") -> dict:
    """
    Run all 3 tiers. Returns the best contact data found.
    Pass existing_html from website-auditor to avoid re-fetching.
    """
    ext = tldextract.extract(url)
    domain = f"{ext.domain}.{ext.suffix}"
    result = {"url": url, "domain": domain}
    # Tier 1: Scrape
    scrape_data = scrape_contact_pages(url, existing_html)
    result.update(scrape_data)
    if not result.get("primary_email"):
        # Tier 2: WHOIS
        whois_data = whois_lookup(domain)
        result.update(whois_data)
        if whois_data.get("whois_email"):
            result["primary_email"] = whois_data["whois_email"]
            result["emails"] = [whois_data["whois_email"]] + result.get("emails", [])
            result["source"] = "whois"
            result["confidence"] = "medium"
    if not result.get("primary_email"):
        # Tier 3: Hunter.io
        hunter_data = hunter_domain_search(domain)
        result.update(hunter_data)
        if hunter_data.get("primary_email"):
            result["emails"] = hunter_data["hunter_emails"]
            result["source"] = "hunter"
            result["confidence"] = "high"
    return result
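tldextract is the right tool for deriving the registered domain because it handles multi-part suffixes such as co.uk, but it is an extra dependency beyond the skill's declared packages. When it is unavailable, a stdlib approximation could look like the sketch below (the helper name is hypothetical; note it is wrong for multi-part suffixes, adequate only for .com-style domains):

```python
from urllib.parse import urlparse

def naive_registered_domain(url: str) -> str:
    """Last two labels of the hostname; breaks on suffixes like co.uk."""
    host = urlparse(url).hostname or ""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host

print(naive_registered_domain("https://www.greenvalley.com/contact"))  # greenvalley.com
```

For "https://shop.example.co.uk" this returns "co.uk", which is why tldextract (backed by the Public Suffix List) is preferable whenever it can be installed.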
Privacy & Ethics Notes
- Only scrape publicly visible contact info
- Respect robots.txt for large-scale operations
- Do not scrape pages behind login walls
- WHOIS data is public by design — legal to use
- Hunter.io only indexes publicly available emails
- Always include an opt-out mechanism in outreach
- Comply with CAN-SPAM and GDPR when emailing
