fuzzy-match

v0.1.0

A toolkit for fuzzy string matching and data reconciliation. Useful for matching entity names (companies, people) across different datasets where spelling va...

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for wu-uk/invoice-fraud-detection-fuzzy-match.

Install the skill "fuzzy-match" (wu-uk/invoice-fraud-detection-fuzzy-match) from ClawHub.
Skill page: https://clawhub.ai/wu-uk/invoice-fraud-detection-fuzzy-match
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install invoice-fraud-detection-fuzzy-match

ClawHub CLI

npx clawhub@latest install invoice-fraud-detection-fuzzy-match

Security Scan

VirusTotal: Benign
OpenClaw: Benign (high confidence)

Purpose & Capability
The name and description (fuzzy string matching / data reconciliation) match the SKILL.md examples (difflib, normalization, rapidfuzz). There are no unrelated requirements (no cloud credentials, no unrelated binaries).
Instruction Scope
The instructions are limited to string-similarity algorithms, normalization, and example code. They do not direct the agent to read arbitrary files, environment variables, system paths, or send data to external endpoints.
Install Mechanism
There is no install spec (instruction-only). The doc mentions optionally installing rapidfuzz via pip, which is reasonable for a performance library and does not by itself create risk in the skill bundle.
Credentials
No environment variables, credentials, or config paths are requested. The examples run in-process and only require standard Python libraries or an optional third-party package (rapidfuzz).
Persistence & Privilege
always is false and the skill is user-invocable. The skill does not request persistent presence or modify other skills or system-wide configs.
Assessment
This guide is internally consistent and low-risk: it only describes local string-matching techniques. Before using, confirm where the example code will run (your local machine or a hosted agent) and avoid feeding sensitive data into untrusted environments. If you want rapidfuzz performance, install it from the official PyPI package (pip install rapidfuzz) in a controlled environment.

Like a lobster shell, security has layers — review code before you run it.

latest vk97c7q4w8e90q9t1fntwz72zh584xkhb
83 downloads
0 stars
1 version
Updated 1w ago
License: MIT-0

Fuzzy Matching Guide

Overview

This skill provides methods to compare strings and find the best matches using Levenshtein distance and other similarity metrics. It is essential when joining datasets on string keys that are not identical.
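
Since the overview names Levenshtein distance, here is a minimal pure-Python sketch of it for reference. The function names `levenshtein` and `levenshtein_similarity` are illustrative and not part of this skill; the examples in the rest of the guide rely on difflib and rapidfuzz instead.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale the distance to a 0-1 score (1.0 means identical)."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))      # 3
print(levenshtein("Microsft", "Microsoft"))  # 1
```

This runs in O(len(a) * len(b)) time, so for large datasets a compiled library is likely the better choice.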

Quick Start

from difflib import SequenceMatcher

def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()

print(similarity("Apple Inc.", "Apple Incorporated"))
# Output: 0.64...

Python Libraries

difflib (Standard Library)

The difflib module provides classes and functions for comparing sequences.

Basic Similarity

from difflib import SequenceMatcher

def get_similarity(str1, str2):
    """Returns a ratio between 0 and 1."""
    return SequenceMatcher(None, str1, str2).ratio()

# Example
s1 = "Acme Corp"
s2 = "Acme Corporation"
print(f"Similarity: {get_similarity(s1, s2)}")

Finding Best Match in a List

from difflib import get_close_matches

word = "appel"
possibilities = ["ape", "apple", "peach", "puppy"]
matches = get_close_matches(word, possibilities, n=1, cutoff=0.6)
print(matches)
# Output: ['apple']
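
To choose a sensible cutoff it can help to see the score behind each candidate. The sketch below ranks candidates with SequenceMatcher, the same class get_close_matches uses internally; `ranked_matches` is a hypothetical helper name, not a difflib function.

```python
from difflib import SequenceMatcher


def ranked_matches(word, possibilities, cutoff=0.6):
    """Return (candidate, score) pairs scoring at least cutoff, best first."""
    scored = [
        (candidate, SequenceMatcher(None, candidate, word).ratio())
        for candidate in possibilities
    ]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


for candidate, score in ranked_matches("appel", ["ape", "apple", "peach", "puppy"]):
    print(f"{candidate}: {score:.2f}")
```

Here "apple" and "ape" both clear the 0.6 cutoff, which shows why n=1 alone is not a substitute for a well-chosen threshold.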

rapidfuzz (Recommended for Performance)

If rapidfuzz is installed (pip install rapidfuzz), prefer it: it is much faster than difflib and offers more metrics. Note that its fuzz scores range from 0 to 100 rather than 0 to 1.

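from rapidfuzz import fuzz, process, utils

# Simple Ratio
score = fuzz.ratio("this is a test", "this is a test!")
print(score)
# Output: roughly 96.55

# Partial Ratio (good for substrings)
score = fuzz.partial_ratio("this is a test", "this is a test!")
print(score)
# Output: 100.0

# Extraction
# Recent rapidfuzz versions do not preprocess strings by default, so pass
# a processor to lowercase the query and the choices before scoring.
choices = ["Atlanta Falcons", "New York Jets", "New York Giants", "Dallas Cowboys"]
best_match = process.extractOne("new york jets", choices, processor=utils.default_process)
print(best_match)
# Output: ('New York Jets', 100.0, 1)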
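
Because rapidfuzz is optional, code that must run everywhere can fall back to difflib when the import fails. The following is a minimal sketch of that pattern; the wrapper name `ratio` and the choice to rescale difflib to 0-100 are assumptions made for illustration.

```python
from difflib import SequenceMatcher

try:
    from rapidfuzz import fuzz

    def ratio(a: str, b: str) -> float:
        """Similarity on a 0-100 scale, computed by rapidfuzz."""
        return fuzz.ratio(a, b)

except ImportError:

    def ratio(a: str, b: str) -> float:
        """Similarity on a 0-100 scale, computed by difflib as a fallback."""
        return 100.0 * SequenceMatcher(None, a, b).ratio()


print(ratio("Acme Corp", "Acme Corporation"))
```

The two backends use different algorithms, so scores for the same pair can differ slightly. Any threshold should be tuned against the backend actually in use.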

Common Patterns

Normalization before Matching

Always normalize strings before comparing to improve accuracy.

import re

def normalize(text):
    # Convert to lowercase
    text = text.lower()
    # Remove special characters
    text = re.sub(r'[^\w\s]', '', text)
    # Normalize whitespace
    text = " ".join(text.split())
    # Common abbreviations (whole words only, so "unlimited" is left alone)
    text = re.sub(r'\blimited\b', 'ltd', text)
    text = re.sub(r'\bcorporation\b', 'corp', text)
    return text

s1 = "Acme  Corporation, Inc."
s2 = "acme corp inc"
print(normalize(s1) == normalize(s2))
# Output: True
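
To illustrate how much normalization can change a fuzzy score, the sketch below compares the same pair before and after cleaning. It uses a reduced, hypothetical `basic_normalize` (lowercase, strip punctuation, collapse whitespace) so the block stands alone; it is a simplification of the normalize function above, not a replacement.

```python
import re
from difflib import SequenceMatcher


def basic_normalize(text: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def score(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


raw = score("ACME Corp.", "acme corp")
clean = score(basic_normalize("ACME Corp."), basic_normalize("acme corp"))
print(f"raw: {raw:.2f}, normalized: {clean:.2f}")
```

SequenceMatcher is case-sensitive, so the raw pair scores well below the usual 0.6 cutoff even though both strings name the same company.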

Entity Resolution

When matching a list of dirty names to a clean database:

from difflib import get_close_matches

clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
dirty_names = ["google", "Microsft", "Apple"]

results = {}
for dirty in dirty_names:
    # simple case-insensitive containment check first
    match = None
    for clean in clean_names:
        if dirty.lower() in clean.lower():
            match = clean
            break

    # fallback to fuzzy (note: get_close_matches is case-sensitive)
    if match is None:
        matches = get_close_matches(dirty, clean_names, n=1, cutoff=0.6)
        if matches:
            match = matches[0]

    results[dirty] = match

print(results)
# Output: {'google': 'Google LLC', 'Microsft': 'Microsoft Corp', 'Apple': 'Apple Inc'}
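
For reconciliation work it is often useful to keep the score next to each match so that borderline cases can be reviewed by hand. The sketch below is one possible variant, not the skill's prescribed method: `resolve` is a hypothetical helper that compares lowercased names with difflib and returns None when nothing reaches the cutoff.

```python
from difflib import SequenceMatcher


def resolve(dirty_names, clean_names, cutoff=0.6):
    """Map each dirty name to (best clean name or None, best score)."""
    results = {}
    for dirty in dirty_names:
        best_name, best_score = None, 0.0
        for clean in clean_names:
            # Compare lowercased copies so case differences do not lower the score
            s = SequenceMatcher(None, dirty.lower(), clean.lower()).ratio()
            if s > best_score:
                best_name, best_score = clean, s
        if best_score < cutoff:
            best_name = None  # keep the score, but report no match
        results[dirty] = (best_name, best_score)
    return results


clean_names = ["Google LLC", "Microsoft Corp", "Apple Inc"]
for dirty, (name, s) in resolve(["google", "Microsft", "Apple", "Zzz"], clean_names).items():
    print(f"{dirty!r} -> {name!r} ({s:.2f})")
```

Unmatched names map to None rather than to the nearest weak candidate, which avoids silently joining unrelated records.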
