QA Test Data Generator

v1.0.0

Generate realistic test data for software testing. Create structured data sets with Chinese locale support (names, ID cards, phone numbers, addresses, bank c...

Security Scan
VirusTotal: Benign
OpenClaw: Benign (high confidence)
Purpose & Capability
Name/description match the provided SKILL.md and scripts/gen_data.py. The script implements user/order/product generators, output formats (JSON/CSV/SQL), and uses Faker('zh_CN') as advertised. No extraneous credentials, binaries, or unrelated capabilities are requested.
Instruction Scope
Runtime instructions are limited to installing Faker (pip) and running the included Python script with arguments. The SKILL.md and script do not instruct reading arbitrary system files, exporting environment variables, contacting external endpoints, or exfiltrating data. Edge-case payloads (XSS, SQLi strings) are present but appropriate for testing purposes.
Install Mechanism
No install specification is embedded in the skill bundle; the README suggests installing the widely used 'faker' package via pip. This is a standard, low-risk instruction-only install path (no downloads from unknown hosts or archive extraction).
Credentials
The skill declares no required environment variables, credentials, or config paths and the code does not read environment secrets. Requested access is proportionate to the stated test-data generation purpose.
Persistence & Privilege
The skill is not marked always:true and does not request persistent system-wide changes or modify other skills. Autonomous invocation is allowed by default (platform normal), but there is no elevated privilege or persistence requested by the skill itself.
Assessment
This package appears to be a straightforward local test-data generator that uses the Python Faker library and the included script. Before installing: (1) confirm you trust the faker package from PyPI (pip install faker) and install in a virtualenv to avoid system-wide side effects; (2) do not run generated SQL or test accounts against production databases or systems—the examples intentionally include XSS/SQLi edge cases for testing; (3) be cautious with realistic-looking personal identifiers (ID cards, phone numbers, credit card-like numbers) to avoid creating data that could be mistaken for or correlated with real people or violate local rules; (4) if you need ID/credit-card formats that are strictly non-sensitive, consider using clearly non-valid placeholder formats or additional masking. If you want deeper assurance, request a full content review of the truncated ID-generation code referenced in SKILL.md (the snippet was truncated) to confirm no hidden network calls or telemetry are present.


MIT-0

Test Data Generator

Generate realistic, structured test data for testing.

Quick Generation with Python Faker

Reference: https://faker.readthedocs.io/en/master/

Install

pip install faker

Chinese Locale Data

from faker import Faker

fake = Faker('zh_CN')

# Basic personal info
print(fake.name())           # 张三
print(fake.phone_number())   # 13800138000
print(fake.address())        # 北京市朝阳区...
print(fake.company())        # XX科技有限公司
print(fake.email())          # zhangsan@example.com
print(fake.ssn())            # ID card number (18 digits)
print(fake.credit_card_number())  # bank card number

# Date/time
print(fake.date_of_birth(minimum_age=18, maximum_age=65))
print(fake.date_between(start_date='-1y', end_date='today'))

Batch Generation Script

Use scripts/gen_data.py for batch data generation:

# Generate 100 users as JSON
python3 scripts/gen_data.py --type users --count 100 --format json --output users.json

# Generate 500 orders as CSV
python3 scripts/gen_data.py --type orders --count 500 --format csv --output orders.csv

# Generate SQL INSERT statements
python3 scripts/gen_data.py --type users --count 50 --format sql --table users --output seed.sql

Data Generation Patterns

User Data

from faker import Faker
import json, random

fake = Faker('zh_CN')

def gen_user(uid):
    return {
        "id": uid,
        "name": fake.name(),
        "email": fake.ascii_free_email(),
        "phone": fake.phone_number(),
        "id_card": fake.ssn(),
        "gender": random.choice(["male", "female"]),
        "birth_date": str(fake.date_of_birth(minimum_age=18, maximum_age=60)),
        "address": fake.address(),
        "created_at": str(fake.date_time_between(start_date='-2y')),
        "status": random.choices(["active", "inactive", "banned"],
                                weights=[85, 10, 5])[0],
    }

users = [gen_user(i) for i in range(1, 101)]
print(json.dumps(users, ensure_ascii=False, indent=2))

Order Data (with foreign keys)

# Reuses `random` and `fake` from the User Data example above.
def gen_order(oid, user_ids, product_ids):
    qty = random.randint(1, 10)
    price = round(random.uniform(9.9, 999.9), 2)
    return {
        "id": oid,
        "user_id": random.choice(user_ids),        # FK: must be an existing user id
        "product_id": random.choice(product_ids),  # FK: must be an existing product id
        "quantity": qty,
        "unit_price": price,
        "total": round(qty * price, 2),
        "status": random.choices(
            ["pending", "paid", "shipped", "delivered", "cancelled"],
            weights=[10, 20, 25, 40, 5])[0],
        # About 6 months back. Faker reads 'm' as minutes ('M' is months),
        # so '-6m' would only go back 6 minutes.
        "created_at": str(fake.date_time_between(start_date='-180d')),
        "address": fake.address(),
    }
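
To keep foreign keys valid, one approach is to create the parent ID lists first and draw every child reference from them. The sketch below uses only the standard library (a seeded `random.Random` in place of Faker) so that it runs without extra packages; `make_orders` and the ID ranges are hypothetical choices for illustration, not part of scripts/gen_data.py.

```python
import random

rng = random.Random(42)  # seeded so the sketch is reproducible

# Parent tables first: orders may only reference IDs that exist.
user_ids = list(range(1, 101))    # assumed: 100 users with ids 1..100
product_ids = list(range(1, 21))  # assumed: 20 products with ids 1..20

def make_orders(count, user_ids, product_ids):
    """Hypothetical helper: build orders whose FKs point at existing rows."""
    orders = []
    for oid in range(1, count + 1):
        qty = rng.randint(1, 10)
        price = round(rng.uniform(9.9, 999.9), 2)
        orders.append({
            "id": oid,
            "user_id": rng.choice(user_ids),
            "product_id": rng.choice(product_ids),
            "quantity": qty,
            "unit_price": price,
            "total": round(qty * price, 2),
        })
    return orders

orders = make_orders(500, user_ids, product_ids)

# Referential-integrity check: every FK must resolve to a parent row.
known_users, known_products = set(user_ids), set(product_ids)
orphans = [o for o in orders
           if o["user_id"] not in known_users
           or o["product_id"] not in known_products]
print(len(orders), "orders,", len(orphans), "orphans")
```

Running the same check after loading the data into a test database is a cheap way to catch generators that drift out of sync.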

Boundary & Edge Case Data

Always include these special values in test datasets:

EDGE_CASES = {
    "string": [
        "",                          # empty
        " ",                         # whitespace only
        "a" * 256,                   # at the limit (assumes a 256-char max; adjust to your schema)
        "a" * 257,                   # one over the limit
        "<script>alert(1)</script>", # XSS
        "' OR 1=1 --",              # SQLi
        "张三\x00李四",              # null byte
        "🔍 emoji test 🎉",         # emoji
        "Ñoño café résumé",         # unicode accents
    ],
    "number": [0, -1, 1, 2147483647, -2147483648, 0.001, 99999999.99],  # zero, sign, int32 max/min, small and large decimals
    "date": ["1970-01-01", "2099-12-31", "2000-02-29", "2001-02-29"],  # epoch, far future, valid leap day, invalid date (2001 is not a leap year)
    "phone": ["13800138000", "19900001111", "12345678901", "1380013800"],  # valid, valid, invalid prefix, too short (10 digits)
}
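
The date and phone lists mix valid and invalid values on purpose, so it helps to label which is which before using them as expected results. A minimal sketch: `date.fromisoformat` is standard library, while the mobile pattern `1[3-9][0-9]{9}` is an assumption (a commonly used approximation, not a rule taken from this skill), and the helper names are hypothetical.

```python
import re
from datetime import date

def is_valid_date(s: str) -> bool:
    """True if s is a real calendar date in YYYY-MM-DD form."""
    try:
        date.fromisoformat(s)
        return True
    except ValueError:
        return False

# Assumed approximation of a mainland China mobile number:
# 11 digits, starting with 1, second digit 3-9.
MOBILE_RE = re.compile(r"1[3-9][0-9]{9}")

def is_plausible_mobile(s: str) -> bool:
    return MOBILE_RE.fullmatch(s) is not None

dates = ["1970-01-01", "2099-12-31", "2000-02-29", "2001-02-29"]
phones = ["13800138000", "19900001111", "12345678901", "1380013800"]

for d in dates:
    print(d, "valid" if is_valid_date(d) else "invalid")
for p in phones:
    print(p, "plausible" if is_plausible_mobile(p) else "rejected")
```

Only "2001-02-29" fails the date check, and the last two phone values are rejected (wrong second digit, and one digit short).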

Output Formats

SQL INSERT

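def to_sql(users, table="users"):
    """Build one multi-row INSERT for seed files (not for untrusted input)."""
    if not users:
        return ""  # nothing to insert; avoids IndexError on users[0]
    cols = list(users[0].keys())
    lines = [f"INSERT INTO {table} ({','.join(cols)}) VALUES"]
    for i, u in enumerate(users):
        vals = []
        for col in cols:  # follow the header order for every row
            v = u.get(col)
            if v is None:
                vals.append("NULL")
            elif isinstance(v, bool):
                # bool is a subclass of int; emit 1/0 rather than True/False
                vals.append("1" if v else "0")
            elif isinstance(v, (int, float)):
                vals.append(str(v))
            else:
                # String literal with embedded single quotes doubled
                vals.append("'" + str(v).replace("'", "''") + "'")
        sep = "," if i < len(users) - 1 else ";"
        lines.append(f"  ({','.join(vals)}){sep}")
    return "\n".join(lines)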

CSV

import csv

def to_csv(data, filepath):
    with open(filepath, 'w', newline='', encoding='utf-8-sig') as f:
        writer = csv.DictWriter(f, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)
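
The `utf-8-sig` codec writes a byte-order mark so that Excel detects UTF-8. The same codec is needed when reading the file back, otherwise the BOM stays attached to the first column name. The round trip below is a small sketch using a temporary file; the sample rows are made up for illustration.

```python
import csv
import os
import tempfile

rows = [{"id": 1, "name": "张三"}, {"id": 2, "name": "李四"}]

path = os.path.join(tempfile.mkdtemp(), "users.csv")
with open(path, "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)

# Reading as plain utf-8 keeps the BOM glued to the first header.
with open(path, newline="", encoding="utf-8") as f:
    plain_header = next(csv.reader(f))

# Reading as utf-8-sig strips it again.
with open(path, newline="", encoding="utf-8-sig") as f:
    sig_rows = list(csv.DictReader(f))

print(repr(plain_header[0]))
print(sig_rows[0])
```

Note that `csv` returns every value as a string, so numeric columns need converting after the read.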

JSON Fixture

import json

def to_json(data, filepath):
    with open(filepath, 'w', encoding='utf-8') as f:
        # default=str serializes dates and other non-JSON types as strings
        json.dump(data, f, ensure_ascii=False, indent=2, default=str)

Chinese ID Card Validation

Reference: GB 11643-1999 (公民身份号码标准)

def validate_id_card(id_number: str) -> bool:
    """Validate Chinese 18-digit ID card number per GB 11643-1999."""
    if len(id_number) != 18:
        return False
    weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
    check_codes = '10X98765432'
    try:
        total = sum(int(id_number[i]) * weights[i] for i in range(17))
        return check_codes[total % 11] == id_number[17].upper()
    except (ValueError, IndexError):
        return False

def gen_valid_id_card():
    """Generate a valid Chinese ID card number with correct checksum."""
    import random
    # Sample area codes (districts in Beijing, Shanghai, Guangdong, Zhejiang, Sichuan)
    areas = ['110101', '310101', '440106', '330102', '510104']
    area = random.choice(areas)
    # `fake` is the Faker('zh_CN') instance created in the earlier examples
    birth = fake.date_of_birth(minimum_age=18, maximum_age=60).strftime('%Y%m%d')
    seq = f"{random.randint(0, 999):03d}"
    base = area + birth + seq
    weights = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
    check_codes = '10X98765432'
    total = sum(int(base[i]) * weights[i] for i in range(17))
    return base + check_codes[total % 11]
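
As a worked example of the checksum arithmetic, the sketch below computes the check character for an illustrative 17-digit base (area code 110105, birth date 1949-12-31, sequence 002). The weights and check-code table are the ones used above; `check_digit` is a hypothetical helper name.

```python
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CODES = "10X98765432"

def check_digit(base17: str) -> str:
    """Return the 18th character for a 17-digit base."""
    total = sum(int(d) * w for d, w in zip(base17, WEIGHTS))
    return CHECK_CODES[total % 11]

base = "11010519491231002"
products = [int(d) * w for d, w in zip(base, WEIGHTS)]
total = sum(products)

print(products)           # per-digit weighted products
print(total, total % 11)  # 167 2
print(base + check_digit(base))  # 11010519491231002X
```

The weighted sum is 167, 167 mod 11 is 2, and index 2 of the check-code table is `X`, so the full number ends in `X`.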

Data Masking / Desensitization

For copying production data to test environments:

def mask_phone(phone: str) -> str:
    """138****8000"""
    if len(phone) >= 11:
        return phone[:3] + "****" + phone[7:]
    return "***"

def mask_id_card(id_card: str) -> str:
    """110101********0001 (keeps the 6-digit area code and the last 4 characters)"""
    if len(id_card) >= 18:
        return id_card[:6] + "********" + id_card[14:]
    return "***"

def mask_name(name: str) -> str:
    """张*"""
    if len(name) >= 2:
        return name[0] + "*" * (len(name) - 1)
    return "*"

def mask_email(email: str) -> str:
    """z***@example.com"""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        return "***"  # no "@" or empty local part: mask everything
    return local[0] + "***@" + domain

def mask_bank_card(card: str) -> str:
    """6222 **** **** 1234"""
    if len(card) >= 16:
        return card[:4] + " **** **** " + card[-4:]
    return "****"
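
One way to apply these helpers consistently is a field-to-function map, so every record passes through the same rules. The sketch below is self-contained and redefines two compact maskers with the same behavior as above; `FIELD_MASKERS` and `mask_record` are hypothetical names, and the sample row is made up.

```python
def mask_phone(phone: str) -> str:
    return phone[:3] + "****" + phone[7:] if len(phone) >= 11 else "***"

def mask_name(name: str) -> str:
    return name[0] + "*" * (len(name) - 1) if len(name) >= 2 else "*"

# Which masker handles which field; unlisted fields pass through unchanged.
FIELD_MASKERS = {"phone": mask_phone, "name": mask_name}

def mask_record(record: dict) -> dict:
    """Return a masked copy; the input record is left untouched."""
    masked = {}
    for key, value in record.items():
        masker = FIELD_MASKERS.get(key)
        if masker is not None and isinstance(value, str):
            masked[key] = masker(value)
        else:
            masked[key] = value
    return masked

row = {"id": 7, "name": "张三丰", "phone": "13800138000", "status": "active"}
masked = mask_record(row)
print(masked)
```

Keeping the map in one place makes it easier to review which columns are considered sensitive.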

Tips

  • Always use ensure_ascii=False for Chinese JSON output
  • Use utf-8-sig encoding for CSV files opened in Excel
  • Set Faker.seed(42) for reproducible test data
  • Mix normal data (90%) with edge cases (10%) for realistic coverage
  • Include data relationships (FK references) when generating related tables
  • For large datasets (>10k rows), use batch inserts and transactions
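
Several of these tips can be combined in one short sketch: a seeded generator, roughly 10% edge cases, and batched inserts inside a single transaction. It uses only the standard library (`random.Random(42)` stands in for `Faker.seed(42)`, and an in-memory SQLite database stands in for the test database); the table layout, batch size, and helper names are assumptions for illustration.

```python
import random
import sqlite3

rng = random.Random(42)  # stdlib stand-in for Faker.seed(42)

EDGE_NAMES = ["", " ", "a" * 257, "' OR 1=1 --", "<script>alert(1)</script>"]

def gen_name(edge_ratio: float = 0.1) -> str:
    """Return an edge-case value about edge_ratio of the time."""
    if rng.random() < edge_ratio:
        return rng.choice(EDGE_NAMES)
    return f"user_{rng.randint(1, 10**6)}"

rows = [(i, gen_name()) for i in range(1, 20001)]
edge_count = sum(1 for _, name in rows if name in EDGE_NAMES)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

BATCH = 1000  # assumed batch size; tune for the target database
with conn:  # one transaction: commit on success, roll back on error
    for start in range(0, len(rows), BATCH):
        # Parameter binding keeps the SQLi-style edge strings inert.
        conn.executemany(
            "INSERT INTO users (id, name) VALUES (?, ?)",
            rows[start:start + BATCH],
        )

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count, "rows inserted,", edge_count, "edge cases")
```

Wrapping all batches in one transaction avoids a commit per statement, which is usually where most of the time goes on large loads.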
