Senior Data Engineer
Security checks across malware telemetry and agentic risk
Overview
This appears to be a coherent data-engineering skill. The main cautions are user-run scripts and examples that can change data systems, and profiling reports that may contain sensitive data.
This skill looks safe to install for data-engineering assistance, but run its scripts and copied examples only in approved project environments. Check write targets, schedules, and overwrite/merge modes before using them on production data, pin optional dependencies, and protect any generated profiling or quality reports as potentially sensitive.
VirusTotal
66/66 vendors flagged this skill as clean.
Risk analysis
Artifact-based informational review of SKILL.md, metadata, install specs, static scan signals, and capability signals. ClawScan does not execute the skill or run runtime probes.
Running similar examples against live systems could overwrite or materially change datasets.
The reference material includes examples that overwrite data lake outputs. This is normal for ETL documentation, but it can replace data if copied against production paths.
aggregated.write \
    .mode("overwrite") \
    .partitionBy("event_type") \
    .parquet(f"s3://data-lake/batch-views/daily_agg/date={date}")
Use development environments first, verify destination paths and write modes, and require review/backup before running overwrite or merge operations in production.
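One way to act on this recommendation is to check the destination before an overwrite is issued. The sketch below is illustrative only and is not part of the reviewed skill: the function name `check_overwrite_target` and the bucket prefixes in `ALLOWED_OVERWRITE_PREFIXES` are hypothetical and would need to match your own environments.

```python
from urllib.parse import urlparse

# Hypothetical allowlist of non-production destinations; adjust to your environment.
ALLOWED_OVERWRITE_PREFIXES = ("s3://dev-data-lake/", "s3://staging-data-lake/")


def check_overwrite_target(path, mode, allowed_prefixes=ALLOWED_OVERWRITE_PREFIXES):
    """Return the path if the write is permitted, otherwise raise ValueError."""
    if mode.lower() != "overwrite":
        return path  # this check only restricts overwrite mode
    parsed = urlparse(path)
    if not parsed.scheme or not parsed.netloc:
        # Relative or local paths are too ambiguous to overwrite safely.
        raise ValueError(f"Refusing overwrite to unqualified path: {path!r}")
    if not any(path.startswith(prefix) for prefix in allowed_prefixes):
        raise ValueError(f"Overwrite not allowed for destination: {path!r}")
    return path
```

A job could call the function on the target path immediately before the write, so that an overwrite aimed at a production bucket fails fast instead of replacing data.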
If untrusted table or column names are passed into similar code, it could query unintended data or create SQL-injection risk.
A reference data-quality example builds SQL using interpolated table and column names. It is purpose-aligned, but adapted code should validate identifiers to avoid unsafe queries.
query = f"""
    SELECT
        COUNT(*) as total,
        SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) as nulls
    FROM {table}
"""
result = self.conn.execute(query).fetchone()
Validate table and column identifiers, use allowlists where possible, and parameterize values rather than interpolating untrusted input.
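The identifier validation described here could look roughly like the following. This is a minimal sketch using the standard-library `sqlite3` module as a stand-in for whatever connection the skill actually uses; `ALLOWED_COLUMNS`, `IDENTIFIER_RE`, and `null_counts` are hypothetical names introduced for illustration.

```python
import re
import sqlite3

# Hypothetical allowlist: table name -> columns that may be profiled.
ALLOWED_COLUMNS = {"events": {"user_id", "event_type"}}
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def null_counts(conn, table, column):
    """Return (total, nulls) for an allowlisted table/column pair."""
    for name in (table, column):
        if not IDENTIFIER_RE.fullmatch(name):
            raise ValueError(f"Invalid identifier: {name!r}")
    if column not in ALLOWED_COLUMNS.get(table, set()):
        raise ValueError(f"{table}.{column} is not in the allowlist")
    # Identifiers cannot be bound as parameters, so they are validated above
    # and double-quoted here; data values should still use placeholders.
    query = f'''
        SELECT
            COUNT(*) AS total,
            SUM(CASE WHEN "{column}" IS NULL THEN 1 ELSE 0 END) AS nulls
        FROM "{table}"
    '''
    return tuple(conn.execute(query).fetchone())
```

The syntactic check rejects anything that is not a plain identifier, and the allowlist then limits queries to tables and columns that are expected to be profiled.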
Installing the latest package version can pull changing dependencies or behave differently over time.
The workflow includes an optional unpinned package install. This is expected setup guidance for a data quality workflow, but dependency version and provenance are left to the user.
# Install and initialize
pip install great_expectations
great_expectations init
Install optional tools in a virtual environment, pin versions, and use trusted package indexes or your organization’s approved dependency process.
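As a rough illustration of the pinning advice, a small check can flag requirement lines that lack an exact version pin before they are installed. The helper below is a hypothetical sketch, not a pip feature, and it handles only simple `name==version` lines (no URLs, environment markers, or hash options).

```python
import re

# Matches an exact pin such as "name==<version>", with optional extras like "name[extra]".
PINNED_RE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]+\])?\s*==\s*[^\s;]+")


def unpinned_requirements(lines):
    """Return requirement lines that lack an exact '==' pin."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options
            continue
        if not PINNED_RE.match(line):
            unpinned.append(line)
    return unpinned
```

Run against the lines of a requirements file, a bare `great_expectations` entry would be reported, while an entry pinned with `==` would pass.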
Profile reports could expose representative values or patterns from private datasets if shared broadly.
The profiling tool can write output reports, detects sensitive-looking patterns such as credit cards, and stores top values in column profiles. That is useful for data quality, but reports may contain sensitive dataset-derived values.
python data_quality_validator.py profile data.csv --output profile.json
...
'credit_card': r'^\\d{4}[\\s\\-]?\\d{4}[\\s\\-]?\\d{4}[\\s\\-]?\\d{4}$'
...
top_values: List[Tuple[str, int]] = field(default_factory=list)
Treat generated profiles and quality reports as sensitive, redact or suppress value samples for PII, and store outputs in approved locations.
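Redacting value samples before a profile is shared might be sketched as follows. `ColumnProfile` here is a hypothetical stand-in that mirrors only the `top_values` field quoted above, not the tool's actual class, and `redact_top_values` is an illustrative helper. The pattern is a singly escaped form of the quoted card regex, which is an assumption about what the tool intends.

```python
import re
from dataclasses import dataclass, field
from typing import List, Tuple

# Card-like pattern: four groups of four digits, optionally separated by a space or hyphen.
CARD_RE = re.compile(r"^\d{4}[\s\-]?\d{4}[\s\-]?\d{4}[\s\-]?\d{4}$")


@dataclass
class ColumnProfile:  # hypothetical stand-in for the tool's profile record
    name: str
    top_values: List[Tuple[str, int]] = field(default_factory=list)


def redact_top_values(profile):
    """Return a copy of the profile with card-like values masked; counts are kept."""
    masked = [
        ("<redacted>" if CARD_RE.match(value) else value, count)
        for value, count in profile.top_values
    ]
    return ColumnProfile(name=profile.name, top_values=masked)
```

Keeping the counts while masking the values preserves the frequency distribution for data-quality purposes without exposing the underlying data. A stricter variant could drop `top_values` entirely for any column flagged as sensitive.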
