Install
openclaw skills install @luigi08001/crm-data-cleaner
Deduplicate, normalize, and enrich CRM contacts and companies. Use when a user needs to clean CRM data, find duplicate contacts, standardize phone numbers or emails, merge duplicate records, audit data quality, or enrich contacts with external sources like Clearbit or Apollo. Works with HubSpot, Salesforce, Pipedrive, or any CRM with CSV export. Instruction-only skill: no scripts or code execution. All operations are performed via CRM platform APIs or CSV export/import workflows.
Clean, accurate CRM data is the foundation of effective sales and marketing operations. Poor data quality costs businesses an average of $3.1 million annually through wasted time, missed opportunities, and ineffective campaigns. This skill provides frameworks, tools, and automation strategies for keeping contact and company data clean across all major CRM platforms.
This guide covers the three pillars of CRM data hygiene: Deduplication (removing duplicate records), Normalization (standardizing data formats), and Enrichment (filling missing information with reliable external sources).
Duplicate Records (30-40% of databases)
Inconsistent Formatting
Missing Information
Outdated Information
Sales Productivity Loss
Marketing Campaign Inefficiency
Customer Experience Issues
Completeness Score
Accuracy Score
Consistency Score
Uniqueness Score
Exact Duplicates
Near Duplicates
Company Duplicates
Household/Account Duplicates
Email-Based Matching (Most Reliable)
Match Criteria:
- Exact email match = 100% duplicate probability
- Domain + similar names = 85% probability
- Multiple emails for same person = merge candidates
Phone-Based Matching (Secondary)
Match Criteria:
- Exact phone match + similar name = 90% probability
- Same phone, different names = investigate
- Multiple formats of same number = normalize first
Name + Company Matching (Fuzzy)
Match Criteria:
- Exact name + exact company = 95% probability
- Similar name + exact company = 80% probability
- Exact name + similar company = 70% probability
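The three sets of match criteria above can be combined into one scoring function. The following is a minimal sketch, not a definitive implementation: the record field names (`email`, `name`, `company`, `phone`), the 0.85 similarity cutoff, and the use of `difflib` for "similar" are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def same(a: str, b: str) -> bool:
    """Case-insensitive exact match that ignores empty values."""
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a) and a == b


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Approximate match using difflib's ratio (assumed cutoff of 0.85)."""
    a, b = a.strip().lower(), b.strip().lower()
    return bool(a and b) and SequenceMatcher(None, a, b).ratio() >= threshold


def only_digits(phone: str) -> str:
    return "".join(ch for ch in phone if ch.isdigit())


def duplicate_probability(a: dict, b: dict) -> float:
    """Return the highest probability from the criteria lists that applies."""
    email_a, email_b = a.get("email", ""), b.get("email", "")
    name_a, name_b = a.get("name", ""), b.get("name", "")
    company_a, company_b = a.get("company", ""), b.get("company", "")
    phone_a, phone_b = only_digits(a.get("phone", "")), only_digits(b.get("phone", ""))

    domain_a = email_a.strip().lower().rpartition("@")[2] if "@" in email_a else ""
    domain_b = email_b.strip().lower().rpartition("@")[2] if "@" in email_b else ""

    if same(email_a, email_b):
        return 1.00  # exact email match
    if same(name_a, name_b) and same(company_a, company_b):
        return 0.95  # exact name + exact company
    if phone_a and phone_a == phone_b and similar(name_a, name_b):
        return 0.90  # exact phone + similar name
    if domain_a and domain_a == domain_b and similar(name_a, name_b):
        return 0.85  # same domain + similar names
    if similar(name_a, name_b) and same(company_a, company_b):
        return 0.80  # similar name + exact company
    if same(name_a, name_b) and similar(company_a, company_b):
        return 0.70  # exact name + similar company
    return 0.0
```

The domain rule produces false positives on free mail providers such as gmail.com, so it is worth excluding those domains before relying on it. Phone numbers should be normalized first, as the phone criteria note.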
Levenshtein Distance
Soundex Matching
Token Matching
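The three fuzzy techniques named above can be sketched with the standard library alone. This is an illustrative sketch: the function names are hypothetical, `soundex` follows the common American Soundex rules, and "token matching" is interpreted here as Jaccard overlap of word sets, which is one of several reasonable readings.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


_SOUNDEX_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_SOUNDEX_CODES = {ch: digit for digit, letters in _SOUNDEX_GROUPS.items() for ch in letters}


def soundex(name: str) -> str:
    """Four-character phonetic code; names that sound alike share a code."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    previous = _SOUNDEX_CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _SOUNDEX_CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated code after it counts
    return (code + "000")[:4]


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; ignores word order."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)
```

Levenshtein suits typos ("Jon" vs "John"), Soundex suits spelling variants that sound the same ("Smith" vs "Smyth"), and token overlap suits reordered names ("Smith John" vs "John Smith").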
High-Confidence Matches (90%+ probability)
Medium-Confidence Matches (60-89% probability)
Low-Confidence Matches (40-59% probability)
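The confidence bands above translate directly into a small routing helper. This sketch assumes probabilities are expressed as fractions (0.90 for 90%) and that medium- and low-confidence pairs go to a human review queue ordered by probability; that routing policy is an assumption, not something the tiers themselves prescribe.

```python
def confidence_tier(probability: float) -> str:
    """Map a duplicate probability to the tier names used above."""
    if probability >= 0.90:
        return "high"
    if probability >= 0.60:
        return "medium"
    if probability >= 0.40:
        return "low"
    return "no_match"


def build_review_queue(pairs):
    """pairs: iterable of (record_id_a, record_id_b, probability).

    Keeps medium and low tiers and puts the most likely duplicates first.
    """
    reviewable = [p for p in pairs if confidence_tier(p[2]) in ("medium", "low")]
    return sorted(reviewable, key=lambda p: p[2], reverse=True)
```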
Review Queue Prioritization
Review Criteria Checklist
Pre-Merge Validation
Field Merge Rules
Post-Merge Cleanup
Native Duplicate Management
Custom Duplicate Rules
Email + Company Domain matching
Name similarity + Phone matching
LinkedIn URL exact matching
Custom property combinations
API-Based Deduplication
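# Example HubSpot duplicate detection
import requests
def find_hubspot_duplicates(access_token, batch_size=100):
    # Legacy v1 endpoint. hapikey query auth has been retired, so send a private app token.
    url = "https://api.hubapi.com/contacts/v1/lists/all/contacts/all"
    headers = {'Authorization': f'Bearer {access_token}'}
    params = {
        'count': batch_size,
        'property': ['email', 'firstname', 'lastname', 'company']
    }
    response = requests.get(url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    # Group contact IDs (vid) by lowercased email; this reads a single page only
    by_email = {}
    for contact in response.json().get('contacts', []):
        email = contact.get('properties', {}).get('email', {}).get('value', '').lower()
        if email:
            by_email.setdefault(email, []).append(contact['vid'])
    return {email: vids for email, vids in by_email.items() if len(vids) > 1}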
Duplicate Rules Setup
Third-Party Tools
Manual Duplicate Detection
North American Numbers
Input Variations:
- (555) 123-4567
- 555-123-4567
- 555.123.4567
- +1 555 123 4567
- 5551234567
Standardized Output:
- Display: +1 (555) 123-4567
- Storage: +15551234567
- Search: 15551234567
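A minimal sketch of the North American normalization above, producing the three output forms from any of the listed input variations. The function name and the dictionary keys are hypothetical. The sketch checks length only and does not validate area codes or exchanges.

```python
import re


def normalize_nanp(raw: str):
    """Return display/storage/search forms for a 10-digit NANP number, or None."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    if len(digits) != 10:
        return None
    area, exchange, line = digits[:3], digits[3:6], digits[6:]
    return {
        "display": f"+1 ({area}) {exchange}-{line}",
        "storage": f"+1{digits}",
        "search": f"1{digits}",
    }
```

Returning `None` for anything that is not 10 digits keeps extensions and international numbers out of the NANP path so they can be handled separately.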
International Numbers
Input Variations:
- +44 20 7946 0958 (UK)
- 020 7946 0958 (UK local)
- +49 30 12345678 (Germany)
- 030-12345678 (Germany local)
Standardized Output:
- Display: +44 20 7946 0958
- Storage: +442079460958
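Local-format international numbers carry no country information, so the caller has to supply it. The sketch below assumes a hypothetical `default_calling_code` parameter (for example taken from the contact's country field) and assumes a single leading `0` is a trunk prefix to drop. That holds for the UK and German examples above but not for every country, so a dedicated library such as `phonenumbers` is the safer choice in production.

```python
import re


def to_e164(raw: str, default_calling_code: str):
    """Convert a phone string to E.164 storage form, or None if it has no digits."""
    raw = raw.strip()
    digits = re.sub(r"\D", "", raw)
    if not digits:
        return None
    if raw.startswith("+"):
        return "+" + digits  # already international, just strip formatting
    if digits.startswith("0"):
        digits = digits[1:]  # assumed trunk prefix
    return f"+{default_calling_code}{digits}"
```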
Format Validation
Quality Indicators
Case Normalization
Input: John.Smith@COMPANY.COM
Output: john.smith@company.com
Domain Standardization
Common Variations:
- gmail.com vs googlemail.com → gmail.com
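- hotmail.com, live.com, outlook.com → keep as entered (separate domains; the same local part can belong to different people)
- yahoo.com, ymail.com → keep as entered (separate domains, not aliases of each other)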
Plus Addressing Removal
Input: john.smith+newsletter@gmail.com
Output: john.smith@gmail.com
Dot Normalization (Gmail)
Input: j.o.h.n.s.m.i.t.h@gmail.com
Output: johnsmith@gmail.com
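The email rules above can be combined into a single matching key. This is a sketch under a few assumptions: the function name is hypothetical, the key is meant for duplicate matching only (keep the address as entered for sending), only googlemail.com is folded into gmail.com, and dot removal is applied to Gmail addresses alone. Because the steps are chained, a Gmail address with both a plus tag and dots loses both.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def email_match_key(email: str):
    """Return a canonical key for duplicate matching, or None if malformed."""
    local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        return None
    local = local.split("+", 1)[0]  # plus addressing removal
    if domain in GMAIL_DOMAINS:
        domain = "gmail.com"            # domain standardization
        local = local.replace(".", "")  # Gmail ignores dots in the local part
    if not local:
        return None
    return f"{local}@{domain}"
```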
Syntax Validation (Level 1)
Domain Validation (Level 2)
Mailbox Validation (Level 3)
Name Case Normalization
Input Variations:
- JOHN SMITH
- john smith
- John SMITH
- jOHN sMITH
Standardized Output:
- John Smith
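A short sketch of the case normalization above. It capitalizes each run of letters, so hyphenated and apostrophe names come out reasonably. It is deliberately naive: names such as "McDonald" or "van der Berg" need an exceptions list, which is left out here.

```python
import re


def normalize_name_case(name: str) -> str:
    """Collapse whitespace and capitalize each run of letters."""
    collapsed = " ".join(name.split())
    return re.sub(r"[^\W\d_]+", lambda m: m.group().capitalize(), collapsed)
```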
Name Component Parsing
Input: "Dr. John Michael Smith Jr."
Parsed Components:
- Title: Dr.
- First Name: John
- Middle Name: Michael
- Last Name: Smith
- Suffix: Jr.
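The parsing above can be sketched as a token-peeling routine. The title and suffix lists are small illustrative assumptions, and the logic assumes Western name order (given name first, family name last), so it should not be applied blindly to names from cultures with a different order.

```python
TITLES = {"dr", "mr", "mrs", "ms", "prof"}          # illustrative, not exhaustive
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd"}   # illustrative, not exhaustive


def parse_name(full_name: str) -> dict:
    """Split a full name into title, first, middle, last, and suffix."""
    tokens = full_name.replace(",", " ").split()
    parts = {"title": "", "first": "", "middle": "", "last": "", "suffix": ""}

    def key(token: str) -> str:
        return token.rstrip(".").lower()

    if tokens and key(tokens[0]) in TITLES:
        parts["title"] = tokens.pop(0)
    if tokens and key(tokens[-1]) in SUFFIXES:
        parts["suffix"] = tokens.pop()
    if tokens:
        parts["first"] = tokens.pop(0)
    if tokens:
        parts["last"] = tokens.pop()
    parts["middle"] = " ".join(tokens)  # whatever remains between first and last
    return parts
```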
Cultural Name Considerations
Legal Entity Normalization
Input Variations:
- Apple Inc.
- Apple Incorporated
- Apple, Inc
- Apple Computer Inc.
Standardized Output:
- Apple Inc.
Common Abbreviations
Standard Mappings:
- Corp → Corporation
- Co → Company
- Ltd → Limited
- LLC → Limited Liability Company
- LP → Limited Partnership
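The abbreviation mappings above can be applied as a lookup on the final word of the company name. This sketch only touches the last token, on the assumption that legal suffixes appear at the end; the function name is hypothetical.

```python
SUFFIX_MAP = {
    "corp": "Corporation",
    "co": "Company",
    "ltd": "Limited",
    "llc": "Limited Liability Company",
    "lp": "Limited Partnership",
}


def expand_legal_suffix(name: str) -> str:
    """Expand a trailing legal abbreviation using the standard mappings."""
    tokens = name.replace(",", " ").split()
    if not tokens:
        return ""
    last = tokens[-1].rstrip(".").lower()
    if last in SUFFIX_MAP:
        tokens[-1] = SUFFIX_MAP[last]
    return " ".join(tokens)
```

Restricting the lookup to the last token avoids rewriting names where "Co" or "LP" appears mid-name as part of the brand.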
DBA (Doing Business As) Handling
Primary: Microsoft Corporation
DBA: Microsoft, MSFT
Subsidiaries: GitHub, LinkedIn
Street Address Formatting
Input Variations:
- 123 Main St.
- 123 Main Street
- 123 MAIN ST
- 123 main st
Standardized Output:
- 123 Main Street
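A sketch of the street formatting above. Only the `st` → `Street` expansion comes from the examples; the other suffix entries are assumptions added for illustration. The sketch expands the final word only and does not handle unit numbers or directional prefixes.

```python
STREET_SUFFIXES = {
    "st": "Street",
    "ave": "Avenue",     # assumed mapping
    "rd": "Road",        # assumed mapping
    "blvd": "Boulevard", # assumed mapping
}


def normalize_street(address: str) -> str:
    """Title-case each word and expand a trailing street-type abbreviation."""
    tokens = address.split()
    words = []
    for index, token in enumerate(tokens):
        key = token.rstrip(".").lower()
        if index == len(tokens) - 1 and key in STREET_SUFFIXES:
            words.append(STREET_SUFFIXES[key])
        else:
            words.append(token.capitalize())
    return " ".join(words)
```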
State/Province Normalization
US States:
- California → CA
- New York → NY
- Texas → TX
Canadian Provinces:
- Ontario → ON
- British Columbia → BC
- Quebec → QC
Postal Code Formatting
US ZIP Codes:
- 12345 → 12345
- 12345-6789 → 12345-6789
- 123456789 → 12345-6789
Canadian Postal Codes:
- k1a0a6 → K1A 0A6
- K1A0A6 → K1A 0A6
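The postal code rules above reduce to two small formatters. This is a minimal sketch with hypothetical function names; it checks shape only and does not confirm that a code actually exists.

```python
import re


def format_us_zip(raw: str):
    """Return 12345 or 12345-6789, or None when the digit count is wrong."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 5:
        return digits
    if len(digits) == 9:
        return f"{digits[:5]}-{digits[5:]}"
    return None


def format_ca_postal(raw: str):
    """Return the A1A 1A1 form, or None when the pattern does not match."""
    compact = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[A-Z]\d[A-Z]\d[A-Z]\d", compact):
        return f"{compact[:3]} {compact[3:]}"
    return None
```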
United Kingdom Addresses
Standard Format:
[Building Number] [Street Name]
[District/Area]
[Town/City]
[County] [Postcode]
[Country]
European Address Formats
Seniority Level Mapping
C-Level Titles:
- CEO, Chief Executive Officer
- CTO, Chief Technology Officer
- CMO, Chief Marketing Officer
- CFO, Chief Financial Officer
VP Level Titles:
- VP, Vice President
- SVP, Senior Vice President
- EVP, Executive Vice President
Director Level Titles:
- Director, Dir
- Senior Director, Sr. Director
- Executive Director, Exec Director
Functional Area Mapping
Marketing Titles:
- Marketing Manager → Marketing
- Brand Manager → Marketing
- Content Manager → Marketing
- Digital Marketing Specialist → Marketing
Sales Titles:
- Sales Representative → Sales
- Account Manager → Sales
- Business Development → Sales
- Sales Engineer → Sales
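The seniority and functional mappings above can be sketched as pattern lookups over a cleaned title string. The patterns and keyword lists below cover only the titles listed in this section; real title data needs a longer list, and the substring-based function match is an assumption that will occasionally misfire.

```python
import re

SENIORITY_PATTERNS = [
    ("C-Level", re.compile(r"\b(ceo|cto|cmo|cfo|chief \w+ officer)\b")),
    ("VP", re.compile(r"\b(vp|svp|evp|vice president)\b")),
    ("Director", re.compile(r"\b(director|dir)\b")),
]

FUNCTION_KEYWORDS = {
    "Marketing": ("marketing", "brand", "content"),
    "Sales": ("sales", "account manager", "business development"),
}


def classify_title(title: str):
    """Return (seniority, function); either may be None when nothing matches."""
    text = " ".join(title.lower().replace(".", "").replace(",", " ").split())
    seniority = next(
        (level for level, pattern in SENIORITY_PATTERNS if pattern.search(text)), None
    )
    function = next(
        (area for area, keywords in FUNCTION_KEYWORDS.items()
         if any(keyword in text for keyword in keywords)),
        None,
    )
    return seniority, function
```

Checking C-Level before VP before Director means a title that mentions several levels is assigned the most senior one.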
Industry-Specific Normalization
Social Media Platforms
Public Databases
Web Scraping Sources
Comprehensive B2B Platforms
ZoomInfo (Premium)
Apollo (Mid-Range)
Clearbit (Developer-Focused)
Hunter (Email-Focused)
Technographic Data
Financial Data
Industry-Specific Data
Missing Data Analysis
-- Example missing data analysis
SELECT
    COUNT(*) AS total_contacts,
    COUNT(phone) AS has_phone,
    COUNT(company) AS has_company,
    COUNT(job_title) AS has_title,
    (COUNT(*) - COUNT(phone)) AS missing_phone,
    (COUNT(*) - COUNT(company)) AS missing_company
FROM contacts;
Enrichment Priority Matrix
Data Preparation
Enrichment Execution
Data Validation
Integration Back to CRM
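// Example real-time enrichment on form submit
// enrichContact and updateFormFields are helpers you supply; they are not defined here
document.getElementById('leadForm').addEventListener('submit', async function(e) {
  // Hold the submit until enrichment finishes; otherwise the page navigates away first
  e.preventDefault();
  const email = document.getElementById('email').value;
  const company = document.getElementById('company').value;
  try {
    // Enrich contact data
    const enrichedData = await enrichContact(email, company);
    // Update hidden form fields
    updateFormFields(enrichedData);
  } catch (err) {
    console.warn('Enrichment failed, submitting without it', err);
  }
  // form.submit() does not re-trigger this handler
  e.target.submit();
});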
Verification Metrics
Data Decay Monitoring
Source Performance Comparison
Data Quality Command Center
Property Settings for Data Quality
Workflow Automation
Trigger: Contact is created or updated
Condition: Email domain contains common typos
Action: Flag for manual review + normalize email
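The trigger, condition, and action above can be prototyped outside the CRM before building the workflow. The sketch below uses a small, hypothetical typo table and proposes a corrected address instead of overwriting the original, so that a reviewer makes the final call.

```python
# Hypothetical typo table; extend it from domains seen in your own data.
DOMAIN_TYPOS = {
    "gmial.com": "gmail.com",
    "gmai.com": "gmail.com",
    "hotmial.com": "hotmail.com",
    "yahooo.com": "yahoo.com",
    "outlok.com": "outlook.com",
}


def review_email_domain(email: str) -> dict:
    """Flag an email whose domain is a known typo and suggest a correction."""
    local, _, domain = email.strip().lower().rpartition("@")
    suggestion = DOMAIN_TYPOS.get(domain)
    return {
        "email": email,
        "needs_review": suggestion is not None,
        "suggested_email": f"{local}@{suggestion}" if suggestion else None,
    }
```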
Third-Party Apps
Custom Development
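// HubSpot API example for bulk data cleaning
// normalizePhone, normalizeEmail, and normalizeCompanyName are helpers you supply
const hubspot = require('@hubspot/api-client');
async function cleanContactData(contacts) {
  // API keys have been retired; authenticate with a private app access token
  const hubspotClient = new hubspot.Client({ accessToken: process.env.HUBSPOT_ACCESS_TOKEN });
  const cleanedContacts = contacts.map(contact => ({
    id: contact.id,
    properties: {
      phone: normalizePhone(contact.properties.phone),
      email: normalizeEmail(contact.properties.email),
      company: normalizeCompanyName(contact.properties.company)
    }
  }));
  // Batch endpoints cap the number of records per call, so chunk large lists before sending
  return await hubspotClient.crm.contacts.batchApi.update({
    inputs: cleanedContacts
  });
}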
Duplicate Management
Data Validation Rules
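/* Example validation rule for phone format. A rule blocks the save when its formula is TRUE, so the pattern match is negated. */
AND(NOT(ISBLANK(Phone)), NOT(REGEX(Phone, "^\\+?1?[2-9]\\d{2}[2-9]\\d{2}\\d{4}$")))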
Flow-Based Automation
Paid Solutions
Custom Apex Solutions
// Custom Apex for email normalization
public class EmailNormalizer {
    public static String normalizeEmail(String email) {
        if (String.isBlank(email)) return email;
        return email.toLowerCase().trim();
    }
}
Smart Contact Data
Custom Fields and Validation
Automation Features
Real-Time Validation
Scheduled Batch Processing
Event-Triggered Cleaning
Contact Quality Scoring
from datetime import datetime

def calculate_contact_quality_score(contact):
    # is_valid_email and is_valid_phone are validators you supply
    score = 0
    # Completeness (40 points)
    if contact.email: score += 15
    if contact.phone: score += 10
    if contact.company: score += 10
    if contact.job_title: score += 5
    # Accuracy (40 points); skip empty fields so the validators never receive None
    if contact.email and is_valid_email(contact.email): score += 20
    if contact.phone and is_valid_phone(contact.phone): score += 20
    # Freshness (20 points)
    days_since_update = (datetime.now() - contact.last_modified).days
    if days_since_update < 30: score += 20
    elif days_since_update < 90: score += 10
    return min(score, 100)
Company Quality Scoring
Data Quality Metrics
Trend Analysis
Alert Thresholds
Executive Summary Dashboard
Operational Dashboard
Detailed Analysis Reports
Inbound Data Processing
External Source → Validation → Normalization → Deduplication → Enrichment → CRM
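The inbound flow above is a chain of stages, each taking a list of records and returning a list. The sketch below shows that shape with deliberately simple placeholder stages; the stage bodies are assumptions for illustration (the enrichment step only derives a domain), and the final write to the CRM is left out.

```python
def validate(records):
    """Drop records without a plausible email."""
    return [r for r in records if "@" in r.get("email", "")]


def normalize(records):
    """Lowercase and trim the email on each record."""
    return [{**r, "email": r["email"].strip().lower()} for r in records]


def deduplicate(records):
    """Keep the first record seen for each email."""
    first_seen = {}
    for record in records:
        first_seen.setdefault(record["email"], record)
    return list(first_seen.values())


def enrich(records):
    """Placeholder enrichment: derive the domain from the email."""
    return [{**r, "domain": r["email"].rpartition("@")[2]} for r in records]


STAGES = [validate, normalize, deduplicate, enrich]


def run_inbound_pipeline(records):
    """Apply the stages in the order shown in the flow above."""
    for stage in STAGES:
        records = stage(records)
    return records
```

Normalizing before deduplicating matters: the two spellings of the same address only collapse into one record once they share a canonical form.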
Outbound Data Synchronization
CRM → Clean Data → External Systems (Email, Analytics, etc.)
Real-Time vs Batch Processing
Data Owner (Executive Level)
Data Steward (Operational Level)
Data Users (Sales/Marketing Teams)
Data Entry Standards
Contact Creation Requirements:
- Email address (validated)
- Company name (standardized)
- Job title (normalized)
- Phone number (formatted)
- Source attribution
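The creation requirements above can be enforced with a simple completeness check before a record is saved. The field names are hypothetical and should be mapped to the property names in your CRM; the check covers presence only, not format.

```python
REQUIRED_FIELDS = ("email", "company", "job_title", "phone", "source")


def missing_required_fields(contact: dict) -> list:
    """Return the required fields that are absent or blank on a contact."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(contact.get(field) or "").strip()
    ]
```

An empty list means the record meets the entry standard; anything else can be shown to the user as the list of fields still to fill in.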
Update Procedures
Retention and Archival
Basic Data Hygiene Training
Advanced Training Topics
Ongoing Education
Standard Operating Procedures
Troubleshooting Guides
GDPR Considerations
CCPA Requirements
Access Controls
Data Protection
Quality Score Prediction
Duplicate Detection ML
Company Name Matching
Job Title Standardization
Smart Assignment Rules
def assign_data_cleaning_task(record, quality_issues):
    if record.value_tier == 'enterprise':
        return 'manual_review_queue'
    elif len(quality_issues) > 3:
        return 'bulk_processing_queue'
    elif 'duplicate' in quality_issues:
        return 'dedup_automation_queue'
    else:
        return 'standard_cleaning_queue'
Priority-Based Processing
Real-Time Processing
Batch Processing
Bidirectional Sync
Conflict Resolution
RESTful API Design
# Example API endpoint for data cleaning
# validate_input and the normalize_* functions are helpers you supply
from flask import Flask, request

app = Flask(__name__)

@app.route('/api/v1/contacts/clean', methods=['POST'])
def clean_contact_data():
    # silent=True returns None for a missing or malformed JSON body
    data = request.get_json(silent=True)
    # Validate input
    if data is None or not validate_input(data):
        return {'error': 'Invalid input'}, 400
    # Process cleaning
    cleaned_data = {
        'email': normalize_email(data.get('email')),
        'phone': normalize_phone(data.get('phone')),
        'company': normalize_company(data.get('company'))
    }
    return {'cleaned_data': cleaned_data}, 200
This CRM data cleaning skill provides the foundation for maintaining high-quality customer and prospect data across all major platforms. Applied consistently, these strategies can improve sales productivity, marketing effectiveness, and customer experience while reducing operational overhead and compliance risk.