Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using it.

data-skill

v1.0.0

An assistant specialized in high-frequency, complex data analysis and processing for everyday office scenarios. It uses a local code-execution model (SQL or Python + SQLite) for data import, cleaning, querying, extraction, merging/splitting, and report generation, supporting large data volumes while protecting data privacy. Use this Skill when the user needs to work with Excel/CSV files, run cross-table queries, generate charts, or produce data-analysis reports.


Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for lgwanai/data-skill.

Prompt preview: Install & Setup
Install the skill "data-skill" (lgwanai/data-skill) from ClawHub.
Skill page: https://clawhub.ai/lgwanai/data-skill
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line


Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install data-skill

ClawHub CLI


npx clawhub@latest install data-skill
Security Scan
VirusTotal
Benign
View report →
OpenClaw
Suspicious (medium confidence)
Purpose & Capability
The name and description align with the included scripts (data_importer.py, data_cleaner.py, chart_generator.py, exporter) and the many ECharts templates/assets; those files are consistent with a local data-analysis/charting assistant. However, the registry metadata does not declare required runtimes or binaries even though SKILL.md and the scripts expect Python and sqlite3 (and requirements.txt lists pandas, thefuzz, etc.). That mismatch (no declared required binaries or install steps) is an inconsistency to be aware of.
Instruction Scope
SKILL.md explicitly instructs the agent to generate and run local Python/SQLite commands (python scripts/... and sqlite3 commands) and to start a local HTTP server to serve charts. Those instructions are within the described purpose but the file contains examples with a hard-coded absolute path (e.g., /Users/wuliang/...) which is inappropriate and brittle. The SKILL.md also instructs to keep data local and only surface aggregates — good practice — but the presence of a pre-scan 'base64-block' prompt-injection signal in SKILL.md is concerning (could hide encoded instructions). You should review SKILL.md and the scripts for any hidden/encoded content and confirm the server binding behavior (does it bind localhost only or 0.0.0.0?).
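
As a quick first pass before reading every file, a check along the following lines can surface binding patterns. It is a minimal sketch: the directory name and the regex are assumptions of this review, and a clean result does not replace reading server.py itself.

    import pathlib
    import re

    # Patterns that commonly indicate a server binding or raw socket use.
    SUSPECT = re.compile(r"0\.0\.0\.0|\.bind\(|HTTPServer|socketserver|socket\.")

    # Assumed install location; point this at wherever the skill is checked out.
    for path in pathlib.Path("data-skill").rglob("*.py"):
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if SUSPECT.search(line):
                print(f"{path}:{lineno}: {line.strip()}")
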
Install Mechanism
The skill has no install spec (instruction-only), yet it contains many executable scripts and a requirements.txt. Because nothing is declared to be installed automatically, an agent or user would need to run pip/other commands manually; the absence of an install mechanism but presence of code and dependency list is inconsistent. There are no external download URLs in the provided metadata (which is lower risk), but lack of guidance increases the chance of accidental insecure setup (e.g., running scripts without vetting).
Credentials
The skill requests no environment variables or credentials, which is appropriate for a purely local data tool. Still, SKILL.md's chart generator auto-starts a local HTTP server and returns an access URL; you should confirm that server.py does not expose the service to external networks or attempt to transmit data outward. Also check that the scripts do not reference unexpected environment variables or config files at runtime.
Persistence & Privilege
The skill is not marked always:true and requests no special platform privileges. It does not claim to alter other skills or system-wide agent settings. Running the included local HTTP server and writing outputs to an outputs/ directory is normal for this type of tool, but you should verify server binding and file paths before use.
Scan Findings in Context
[base64-block] unexpected: A base64-block pattern was detected in SKILL.md. Base64 blocks can conceal instructions or encoded payloads; this is not expected for a transparent local data-analysis skill. Inspect SKILL.md and all scripts for encoded or obfuscated content before running.
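
One way to act on this finding is sketched below: it locates long base64-looking runs in SKILL.md and prints a decoded prefix for manual review. The 80-character threshold and the regex are simple heuristics, not the scanner's actual detection rule.

    import base64
    import re

    text = open("SKILL.md", encoding="utf-8").read()
    # Long unbroken runs of the base64 alphabet deserve a manual look.
    for match in re.finditer(r"[A-Za-z0-9+/]{80,}={0,2}", text):
        try:
            decoded = base64.b64decode(match.group(0), validate=True)
        except Exception:
            continue  # not valid base64; likely a hash or minified token
        print(f"offset {match.start()}: {decoded[:120]!r}")
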
What to consider before installing
This skill appears to implement a local data-import/clean/visualization workflow and includes many helper scripts and chart templates, but there are several red flags to check before installing or running it with real or sensitive data:

  1. Verify runtimes and dependencies: SKILL.md assumes Python and sqlite3, and there is a requirements.txt (pandas, thefuzz, openpyxl, etc.), but the registry metadata declares no required binaries or install step. Install dependencies in an isolated virtualenv and do not run anything until you have reviewed the code.
  2. Audit the code (especially server.py, data_exporter, and data_importer): Look for network code or server binding (check whether any HTTP server binds to 0.0.0.0), hard-coded external endpoints, calls that might send data off-host, or attempts to read unexpected system paths. If the server binds to a network interface, restrict it to localhost or run it behind a firewall.
  3. Search for encoded/obfuscated content: The scanner found a base64 block in SKILL.md. Search SKILL.md and the scripts for base64 strings or other obfuscated payloads and decode them to verify intent before execution.
  4. Correct hard-coded paths: Examples in SKILL.md include absolute paths (e.g., /Users/wuliang/...). Update them to relative workspace paths or confirm they will not overwrite user files.
  5. Test with non-sensitive data in an isolated environment: Run the skill on synthetic data in a sandbox or container to confirm behavior, verify the undo/non-destructive mechanisms, and observe whether the local HTTP server serves only local files.
  6. If you lack capacity to audit the code, treat it as untrusted: Do not run it on confidential data. Consider asking the author for a minimal install/run guide that declares required binaries and explains how the server binds and what URLs it returns.

If you confirm the above (no hidden network exfiltration, server bound to localhost, no obfuscated payloads), the skill's functionality is coherent with its stated purpose. Until then, proceed cautiously.
Patterns worth reviewing

These patterns may indicate risky behavior. Check the VirusTotal and OpenClaw results above for context-aware analysis before installing.

assets/echarts/echarts.min.js:45: Dynamic code execution detected.


latest: vk9719ss6904as11btmn2rkahcn83ct4p
132 downloads · 0 stars · 1 version · updated 1 mo ago
v1.0.0 · MIT-0

Data Analysis Assistant Workflow

This skill transforms the agent into a powerful local data analysis assistant, strictly adhering to a Local Code Execution paradigm.

Core Architecture & Principles

  1. Local Execution First: NEVER read large datasets directly into the context window. Always generate Python scripts or SQL commands and execute them locally using RunCommand.
  2. SQLite as the Engine: All CSV/Excel files should be imported into a local SQLite database (default: workspace.db). Rely on SQL for robust data manipulation (filtering, joining, grouping).
  3. Non-Destructive Operations (Undo Mechanism): Do not overwrite original tables. When modifying data, create a new table (e.g., CREATE TABLE table_v2 AS SELECT ...) or a View. This guarantees the user can always say "undo the last step" (a minimal sketch follows this list).
  4. Data Privacy: Keep data local. Only send aggregated statistics or schema info into the context window.
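
As referenced in principle 3, the versioned-table pattern can be exercised with nothing but the standard-library sqlite3 module. A minimal sketch, with illustrative table and column names that are not part of the skill:

    import sqlite3

    con = sqlite3.connect("workspace.db")
    # Derive a new table instead of mutating the original ("sales" is illustrative).
    con.execute(
        "CREATE TABLE sales_v2 AS "
        "SELECT * FROM sales WHERE amount IS NOT NULL"
    )
    con.commit()
    # "Undo" is then just dropping the derived table; the original stays intact:
    # con.execute("DROP TABLE sales_v2")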

Scenarios & Procedures

Scenario 1: Data Import & Auto-Cleaning

Trigger: User uploads or specifies a CSV/Excel/WPS(.et)/Numbers file. Action:

  1. Run the built-in importer script (supports .csv, .xlsx, .xls, .et, .numbers):
    python scripts/data_importer.py "path/to/file.xlsx" --db workspace.db
    
    Note: This script calculates the MD5 hash of the file. If an identical file was already imported, it skips the import and returns the existing table name. It also automatically handles merged cells, detects the real header row, chunks large CSVs, and sanitizes column names for SQLite. (The dedup idea is sketched after this list.)
  2. Once imported, run a quick check to understand the schema and data:
    sqlite3 workspace.db "PRAGMA table_info(table_name);"
    sqlite3 workspace.db "SELECT * FROM table_name LIMIT 3;" -header -column
    
  3. Ask the user if they want to perform standard cleaning (e.g., handling missing values, deduplication). Execute these via SQL.
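
The hash-based skip from step 1's note presumably works along these lines. This is a sketch of the idea only, not data_importer.py's actual code:

    import hashlib

    def file_md5(path: str) -> str:
        """Stream the file so large workbooks need not fit in memory."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # If this digest is already recorded by the importer, the import is
    # skipped and the existing table name is returned.
    print(file_md5("path/to/file.xlsx"))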

Scenario 2: Continuous Queries & Manipulation

Trigger: User asks to filter, sort, aggregate, or add columns. Action:

  1. Formulate the SQL query.
  2. Execute it via RunCommand: sqlite3 workspace.db "SELECT ..."
  3. For structural changes, remember the Undo principle: CREATE TABLE table_name_step2 AS SELECT ...

Scenario 3: Semantic Extraction & Fuzzy Join

Trigger: User wants to split addresses, do sentiment analysis, or join tables with mismatched keys (e.g., "Beijing Branch" vs "Beijing Office"). Action:

  1. Generate a Python script using pandas and sqlite3.
  2. For Fuzzy Joins, use libraries like thefuzz or difflib in the Python script to match keys, then write the mapping back to SQLite (sketched after this list).
  3. For Semantic extraction, use regex or heuristic rules in Python. If LLM analysis is strictly required, write a script that processes the column locally or prompts the user for permission to send a sample.
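
As referenced in step 2, a fuzzy-key mapping could look like the sketch below. Table and column names are invented for illustration; thefuzz's process.extractOne returns the best candidate together with a 0-100 similarity score.

    import sqlite3
    import pandas as pd
    from thefuzz import process

    con = sqlite3.connect("workspace.db")
    left = pd.read_sql("SELECT DISTINCT branch FROM sales", con)
    right = pd.read_sql("SELECT DISTINCT office FROM hr", con)

    choices = right["office"].tolist()
    rows = []
    for key in left["branch"]:
        match, score = process.extractOne(key, choices)
        if score >= 90:  # keep only confident matches; tune per dataset
            rows.append({"branch": key, "office": match, "score": score})

    # Persist the mapping so later joins are plain SQL against this table.
    pd.DataFrame(rows).to_sql("branch_office_map", con, if_exists="replace", index=False)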

Scenario 4: Chart Generation

Trigger: User requests a visualization (bar, pie, line, scatter, map, funnel, 3D charts, etc.). Action:

  1. Do NOT write custom Python scripts from scratch.
  2. We have a powerful template-based rendering engine. Use the built-in scripts/chart_generator.py script.
  3. First, identify the required chart type. Look into references/prompts/ directory to find the corresponding Prompt skeleton for the exact chart type (e.g., references/prompts/line/stacked_area.md). Read the prompt to understand the data structure requirements.
  4. Formulate the SQL query that aggregates the data correctly according to the prompt's requirements.
  5. Generate the custom_js and echarts_option based on the prompt template.
  6. Construct a JSON configuration file (save it in outputs/configs/) matching this structure (a scripted end-to-end sketch follows this list):
    {
        "db_path": "workspace.db",
        "query": "SELECT category, SUM(value) as val FROM table GROUP BY category",
        "title": "Chart Title",
        "output_path": "/Users/wuliang/workspace/data-skill/outputs/html/output_chart.html",
        "echarts_option": { ... }, // Generated option from prompt
        "custom_js": "..." // Optional JS logic for complex data binding
    }
    
    Note: For map charts requiring coordinates, use the built-in Geocoding capabilities or ECharts native geo coordinate systems. Output files MUST be stored in the isolated outputs/html/ directory.
  7. Execute the command:
    python scripts/chart_generator.py --config outputs/configs/your_config.json
    
  8. The script will automatically start a local HTTP server and return an access URL. Provide this URL to the user to view the interactive chart.
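
Steps 6 and 7 can be driven from one small script; the sketch below builds a minimal config and invokes the generator. The query, file names, and echarts_option are placeholders, and output_path deliberately uses a relative workspace path instead of the absolute path shown above.

    import json
    import pathlib
    import subprocess

    config = {
        "db_path": "workspace.db",
        "query": "SELECT category, SUM(value) AS val FROM sales GROUP BY category",
        "title": "Sales by Category",
        # Relative workspace path, avoiding the hard-coded absolute path above.
        "output_path": "outputs/html/sales_by_category.html",
        # Placeholder; derive the real option from references/prompts/.
        "echarts_option": {"xAxis": {"type": "category"}, "yAxis": {"type": "value"}},
    }

    cfg = pathlib.Path("outputs/configs/sales_by_category.json")
    cfg.parent.mkdir(parents=True, exist_ok=True)
    cfg.write_text(json.dumps(config, ensure_ascii=False, indent=2), encoding="utf-8")

    subprocess.run(
        ["python", "scripts/chart_generator.py", "--config", str(cfg)], check=True
    )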

Scenario 5: File Merging & Splitting

Trigger: User needs to combine multiple identical reports or split a master sheet by department. Action:

  • Merge: Iterate over the files and run data_importer.py pointing at the same table name (the script appends automatically if the table exists); alternatively, write a custom Python script.
  • Split: Generate a Python script that reads the master table from SQLite and exports it into multiple Excel files using pandas.DataFrame.to_excel() inside a loop.
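
A minimal sketch of the Split case, with the master table and the department column assumed for illustration:

    import pathlib
    import sqlite3
    import pandas as pd

    con = sqlite3.connect("workspace.db")
    df = pd.read_sql("SELECT * FROM master_table", con)

    out_dir = pathlib.Path("outputs/split")
    out_dir.mkdir(parents=True, exist_ok=True)
    # One workbook per department, as described in the Split bullet above.
    for dept, group in df.groupby("department"):
        group.to_excel(out_dir / f"{dept}.xlsx", index=False)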

Scenario 6: Export & Reporting

Trigger: User wants to download the final result or generate a summary report. Action:

  1. Export CSV/Excel: Use the built-in exporter script to dump a table or query result to .csv or .xlsx:
    # Export an entire table
    python scripts/data_exporter.py "outputs/final_result.csv" --table "final_table"
    
    # Export a specific query
    python scripts/data_exporter.py "outputs/final_result.xlsx" --query "SELECT category, SUM(value) FROM sales GROUP BY category"
    
  2. Report Generation: Write a Markdown file summarizing the analysis steps, key metrics (retrieved via SQL), and referencing any generated charts. Provide the user with the path to the report.
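
Step 2 might look like the following minimal sketch; the metric query, chart link, and file names are assumed for illustration.

    import sqlite3
    from datetime import date

    con = sqlite3.connect("workspace.db")
    total, orders = con.execute("SELECT SUM(value), COUNT(*) FROM sales").fetchone()

    lines = [
        f"# Analysis Report ({date.today()})",
        "",
        "## Key Metrics",
        f"- Total sales: {total}",
        f"- Order count: {orders}",
        "",
        "## Charts",
        "- outputs/html/sales_by_category.html",
    ]
    with open("outputs/report.md", "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    print("Report written to outputs/report.md")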

Scenario 7: Data Cleanup

Trigger: Routine maintenance or user request to clean up old data. Action:

  1. Run the cleaner script to remove tables and metadata not accessed in the last 30 days:
    python scripts/data_cleaner.py --db workspace.db --days 30
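
How the cleaner decides what is "not accessed" is not documented here. The sketch below shows one plausible approach, assuming a hypothetical _import_metadata table with a last_accessed timestamp per table; verify against the real data_cleaner.py before relying on it.

    import sqlite3

    con = sqlite3.connect("workspace.db")
    # Hypothetical metadata table: (table_name TEXT, last_accessed TEXT).
    stale = con.execute(
        "SELECT table_name FROM _import_metadata "
        "WHERE last_accessed < datetime('now', '-30 days')"
    ).fetchall()
    for (name,) in stale:
        con.execute(f'DROP TABLE IF EXISTS "{name}"')
        con.execute("DELETE FROM _import_metadata WHERE table_name = ?", (name,))
    con.commit()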
    

Scenario 8: Metrics Management

Trigger: User describes or defines a specific metric calculation logic or business definition (口径). Action:

  1. When the user provides a metric definition, save it to the local markdown file references/metrics.md to build up context for future SQL generation.
  2. Use the built-in script scripts/metrics_manager.py to append the metric (a sketch of the idea follows this list):
    python scripts/metrics_manager.py --name "Metric Name" --desc "Metric calculation logic or business description"
    
  3. When generating SQL queries later, ALWAYS read references/metrics.md to ensure the generated SQL aligns with the saved business definitions.
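
What metrics_manager.py actually writes is not shown; the sketch below is one plausible reading of step 2, with the entry format assumed.

    import argparse
    from datetime import date

    parser = argparse.ArgumentParser()
    parser.add_argument("--name", required=True)
    parser.add_argument("--desc", required=True)
    args = parser.parse_args()

    # Append one entry per definition; this format is an assumption, not
    # necessarily what metrics_manager.py actually writes.
    with open("references/metrics.md", "a", encoding="utf-8") as f:
        f.write(f"\n## {args.name} ({date.today()})\n\n{args.desc}\n")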
