Senior Data Engineer

Security checks across malware telemetry and agentic risk

Overview

This appears to be a coherent data-engineering skill. The main cautions are user-run scripts and examples that can change data systems, and profiling reports that may contain sensitive data.

This skill looks safe to install for data-engineering assistance, but run its scripts and copied examples only in approved project environments. Check write targets, schedules, and overwrite/merge modes before using them on production data, pin optional dependencies, and protect any generated profiling or quality reports as potentially sensitive.

VirusTotal

66/66 vendors flagged this skill as clean.


Risk analysis

Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.

ASI02: Tool Misuse and Exploitation
Low
What this means

Running similar examples against live systems could overwrite or materially change datasets.

Why it was flagged

The reference material includes examples that overwrite data lake outputs. This is normal for ETL documentation, but it can replace data if copied against production paths.

Skill content
aggregated.write \
        .mode("overwrite") \
        .partitionBy("event_type") \
        .parquet(f"s3://data-lake/batch-views/daily_agg/date={date}")
Recommendation

Use development environments first, verify destination paths and write modes, and require review/backup before running overwrite or merge operations in production.
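
As a rough illustration of the recommendation above, a pre-write guard could refuse destructive modes on destinations outside an allowlist unless the run was explicitly approved. This is a minimal sketch, not part of the skill: `check_write_target`, `SAFE_OVERWRITE_PREFIXES`, and the dev/staging bucket names are hypothetical, and it assumes destinations are fully qualified URIs.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: prefixes where overwrite/merge is acceptable without review.
SAFE_OVERWRITE_PREFIXES = ("s3://dev-data-lake/", "s3://staging-data-lake/")

# Write modes treated as destructive in this sketch.
DESTRUCTIVE_MODES = {"overwrite", "merge"}


def check_write_target(path: str, mode: str, approved: bool = False) -> None:
    """Raise ValueError if a destructive write targets a non-allowlisted path.

    `approved` stands in for whatever review/backup sign-off a team uses.
    """
    parsed = urlparse(path)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"Destination is not a fully qualified URI: {path!r}")

    destructive = mode.lower() in DESTRUCTIVE_MODES
    in_safe_area = path.startswith(SAFE_OVERWRITE_PREFIXES)

    if destructive and not in_safe_area and not approved:
        raise ValueError(
            f"Refusing mode={mode!r} on {path!r}: not in an allowlisted "
            "prefix and no explicit approval was given"
        )
```

A pipeline could call `check_write_target(destination, "overwrite")` immediately before the Spark write, so that a path copied from documentation fails fast instead of replacing production partitions.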

ASI02: Tool Misuse and Exploitation
Low
What this means

If untrusted table or column names are passed into similar code, the code could query unintended data or be open to SQL injection.

Why it was flagged

A reference data-quality example builds SQL using interpolated table and column names. It is purpose-aligned, but adapted code should validate identifiers to avoid unsafe queries.

Skill content
query = f"""
                SELECT
                    COUNT(*) as total,
                    SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) as nulls
                FROM {table}
            """
            result = self.conn.execute(query).fetchone()
Recommendation

Validate table and column identifiers, use allowlists where possible, and parameterize values rather than interpolating untrusted input.
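
One way to apply this recommendation is sketched below, using the standard-library `sqlite3` module as a stand-in for whatever connection the skill's validator actually uses. The `ALLOWED_COLUMNS` mapping, the `events` table, and both function names are assumptions made for illustration. Identifiers are checked against a strict pattern and an allowlist before interpolation, while values go through `?` placeholders.

```python
import re
import sqlite3
from typing import Tuple

# Hypothetical allowlist of tables and the columns that may be profiled.
ALLOWED_COLUMNS = {"events": {"user_id", "event_type"}}

# Conservative identifier shape: letters, digits, underscores only.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def null_counts(conn: sqlite3.Connection, table: str, column: str) -> Tuple[int, int]:
    """Return (total rows, null rows) for an allowlisted table/column pair."""
    if not (_IDENTIFIER.fullmatch(table) and _IDENTIFIER.fullmatch(column)):
        raise ValueError("Table or column name is not a plain identifier")
    if column not in ALLOWED_COLUMNS.get(table, set()):
        raise ValueError(f"{table}.{column} is not in the allowlist")

    # Identifiers cannot be bound as parameters, so they are interpolated
    # only after the checks above, and quoted for good measure.
    query = (
        f'SELECT COUNT(*) AS total, '
        f'SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) AS nulls '
        f'FROM "{table}"'
    )
    total, nulls = conn.execute(query).fetchone()
    return total, nulls or 0  # SUM over zero rows yields NULL


def count_event_type(conn: sqlite3.Connection, event_type: str) -> int:
    """Values, unlike identifiers, are passed as bound parameters."""
    row = conn.execute(
        "SELECT COUNT(*) FROM events WHERE event_type = ?", (event_type,)
    ).fetchone()
    return row[0]
```

The allowlist does most of the work here; the pattern check is a second layer that still applies if the allowlist is later generated from less trusted metadata.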

ASI04: Agentic Supply Chain Vulnerabilities
Info
What this means

An unpinned install pulls whatever package version is latest at install time, so dependencies and behavior can change between installs.

Why it was flagged

The workflow includes an optional unpinned package install. This is expected setup guidance for a data quality workflow, but dependency version and provenance are left to the user.

Skill content
# Install and initialize
pip install great_expectations

great_expectations init
Recommendation

Install optional tools in a virtual environment, pin versions, and use trusted package indexes or your organization’s approved dependency process.
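
A team that keeps its dependencies in a requirements file could check for unpinned entries before installing. The sketch below is a loose, hypothetical check (`unpinned_requirements` is not part of the skill or of pip): it treats any `name==version` line as pinned and does not verify hashes, provenance, or wildcard versions.

```python
import re
from typing import Iterable, List

# Matches "name==version", optionally with extras such as "name[extra]==1.2.3".
_PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9,._-]+\])?==[^\s;]+")


def unpinned_requirements(lines: Iterable[str]) -> List[str]:
    """Return requirement lines that lack an exact '==' version pin."""
    unpinned = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blanks and pip options such as "-r other.txt" or "--index-url".
        if not line or line.startswith("-"):
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Run against the lines of a requirements file, a non-empty result is a signal to pin those entries through the organization's approved dependency process before installing.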

ASI06: Memory and Context Poisoning
Low
What this means

Profile reports could expose representative values or patterns from private datasets if shared broadly.

Why it was flagged

The profiling tool writes output reports, detects sensitive-looking patterns such as credit card numbers, and stores top values in column profiles. That is useful for data quality, but the reports may contain sensitive values derived from the dataset.

Skill content
python data_quality_validator.py profile data.csv --output profile.json
...
'credit_card': r'^\\d{4}[\\s\\-]?\\d{4}[\\s\\-]?\\d{4}[\\s\\-]?\\d{4}$'
...
top_values: List[Tuple[str, int]] = field(default_factory=list)
Recommendation

Treat generated profiles and quality reports as sensitive, redact or suppress value samples for PII, and store outputs in approved locations.
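
Redaction could happen as a post-processing step on the generated report before it is stored or shared. The sketch below assumes a JSON-like profile shaped as `{"columns": {name: {"top_values": [[value, count], ...]}}}`; that layout, the `redact_profile` name, and the `[REDACTED]` marker are assumptions, since the excerpt shows only the `top_values` field and the card pattern, not the report schema. The pattern here uses single backslashes, which is the usual form for a raw-string regex.

```python
import json
import re
from typing import Any, Dict

# Card-like pattern modeled on the excerpt above (16 digits in groups of four).
CREDIT_CARD = re.compile(r"^\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}$")


def redact_profile(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the profile with card-like top values masked."""
    # Round-trip through JSON for a deep copy of plain JSON-like data.
    redacted = json.loads(json.dumps(profile))
    for column in redacted.get("columns", {}).values():
        masked = []
        for value, count in column.get("top_values", []):
            if CREDIT_CARD.match(str(value)):
                value = "[REDACTED]"
            masked.append([value, count])
        column["top_values"] = masked
    return redacted
```

Masking only pattern matches leaves other PII (names, emails, free text) untouched, so suppressing `top_values` entirely for columns known to hold personal data is the safer default.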