Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

universal-data-analyst-en

v1.0.3

Performs automated, LLM-driven data analysis including loading, validation, method selection, script generation, execution, and comprehensive reporting for d...

0 stars · 145 downloads · 0 current · 0 all-time
by yamaz@yamaz49

Install

OpenClaw Prompt Flow

Install with OpenClaw

Best for remote or guided setup. Copy the exact prompt, then paste it into OpenClaw for yamaz49/universal-data-analyst-en.

Prompt Preview: Install & Setup
Install the skill "universal-data-analyst-en" (yamaz49/universal-data-analyst-en) from ClawHub.
Skill page: https://clawhub.ai/yamaz49/universal-data-analyst-en
Keep the work scoped to this skill only.
After install, inspect the skill metadata and help me finish setup.
Use only the metadata you can verify from ClawHub; do not invent missing requirements.
Ask before making any broader environment changes.

Command Line

CLI Commands

Use the direct CLI path if you want to install manually and keep every step visible.

OpenClaw CLI

Bare skill slug

openclaw skills install universal-data-analyst-en

ClawHub CLI


npx clawhub@latest install universal-data-analyst-en

Security Scan
Capability signals
Crypto: Can make purchases
These labels describe what authority the skill may exercise. They are separate from suspicious or malicious moderation verdicts.
VirusTotal
Suspicious
View report →
OpenClaw
Suspicious
medium confidence
Purpose & Capability
The name and description match the included modules (data loader, validator, orchestrator, LLM prompt generator, report generator). The code produces prompts for LLMs and coordinates a multi-step analysis pipeline, which is coherent with the stated purpose. Minor mismatch: the README/SKILL.md includes example calls to an LLM client (Anthropic/Claude), but the shipped LLM module returns prompts rather than performing network calls, and the skill declares no required env vars/credentials for LLM access.
Instruction Scope
The orchestrator generates full Python analysis scripts (via LLM prompts) and then executes them (the orchestrator imports subprocess and contains step execution logic). Executing code generated by an LLM on the user's machine is expected for this tool's purpose, but it is a high-risk action: generated scripts can contain arbitrary file I/O, shell/OS calls, or network operations, and thus may exfiltrate data or modify the system. The SKILL.md and code instruct saving prompt files and calling an LLM externally, but the skill also supports an autonomous flow that can generate and run code. The provided code enforces no sandboxing or other restrictions.
Install Mechanism
There is no install spec (instruction-only skill with packaged Python modules). Nothing is downloaded at install time, so no arbitrary remote code is pulled during installation. The runtime will write output and prompt files to local output directories.
Credentials
The skill declares no required environment variables or credentials. However, documentation/examples reference calling external LLM APIs (Anthropic/Claude) which would require API keys if you choose to integrate — these keys are not managed by the skill. The shipped code itself does not appear to read unrelated system credentials or config paths.
Persistence & Privilege
The skill sets always:false and makes no special persistence changes or modifications to other skills/configs. It creates session/output directories within the working directory and does not request or claim system-wide privileges. Autonomous invocation is allowed by platform default, which, combined with script execution, increases the blast radius, but is not itself an unusual setting.
What to consider before installing
This skill is coherent with its stated purpose, but it generates Python analysis scripts via LLM prompts and can execute those scripts locally. Before installing or running:

  1. Do NOT run this on sensitive or production systems without reviewing generated scripts first.
  2. Inspect any generated analysis_script.py for network, subprocess, or filesystem operations (look for imports like requests, socket, subprocess, os.system, eval/exec, urllib, ftplib, paramiko); a minimal review sketch follows below.
  3. Prefer running the orchestration and script execution inside an isolated environment (ephemeral VM, container, or sandbox) with limited network and file access.
  4. If you will call external LLMs, keep API keys separate and use only trusted endpoints; the skill does not manage credentials.
  5. Consider using the human-in-the-loop mode (generate prompts and scripts, but manually review and execute) rather than fully autonomous execution.

If you want me to, I can: (a) scan the full repository for occurrences of subprocess/requests/os.system/eval/exec/network endpoints, or (b) point to specific lines/functions to review before running.
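
To make item 2 concrete, here is a minimal, hedged sketch of such a manual review step in Python. The helper name (flag_risky_lines), the pattern list, and the script path are illustrative assumptions, not part of the skill.

# Illustrative helper (not part of the skill): flag risky operations in a
# generated analysis script before deciding whether to run it.
import re
from pathlib import Path

# Patterns mirror the review advice above; extend as needed (hypothetical list).
RISKY_PATTERNS = [
    r"\b(import|from)\s+(requests|socket|subprocess|urllib|ftplib|paramiko)\b",
    r"\bos\.system\s*\(",
    r"\b(eval|exec)\s*\(",
]

def flag_risky_lines(script_path: str) -> list[tuple[int, str]]:
    """Return (line number, line text) pairs that match a risky pattern."""
    hits = []
    for lineno, line in enumerate(Path(script_path).read_text().splitlines(), start=1):
        if any(re.search(pattern, line) for pattern in RISKY_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

# Review hits manually before executing anything the LLM generated.
for lineno, line in flag_risky_lines("analysis_script.py"):
    print(f"line {lineno}: {line}")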

Like a lobster shell, security has layers — review code before you run it.

latest: vk977e8k81ad36c0kgs94265tj9848544
145 downloads · 0 stars · 4 versions · Updated 3w ago
v1.0.3
MIT-0

Universal Data Analyst (通用数据分析专家)

Introduction

An intelligent data analysis skill based on Data Ontology. Unlike keyword-based approaches, this skill uses LLM reasoning for every analysis, automatically identifying data types, selecting analysis methods, generating scripts, and outputting reports.

Supports both economic data (retail, subscription, finance, etc.) and non-economic data (scientific measurements, social networks, text, etc.), handling multiple formats including CSV, Excel, Parquet, JSON, and more.


How to Trigger

Simply upload a data file or send any of these types of messages:

  • "Help me analyze this data"
  • "What patterns are in this CSV?"
  • "Explore this dataset"
  • "Check the data quality for me"
  • Directly upload .csv / .xlsx / .parquet / .json files

Core Design: Four-Layer Analysis Framework

Layer 1: Data Ontology
        ↓  What kind of existence is this? Entity type? Generation mechanism?
Layer 2: Problem Typology
        ↓  Descriptive / Diagnostic / Predictive / Prescriptive / Causal?
Layer 3: Methodology Mapping
        ↓  Match domain-recognized analysis frameworks
Layer 4: Validation & Output
           Data quality report + Analysis scripts + HTML/MD reports

Each layer invokes LLM reasoning without any hardcoded rules.
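
As a rough illustration only (not the skill's actual code), a Layer 1 prompt might be assembled from a lightweight dataframe summary like this; the function name, prompt wording, and input file are assumptions:

import pandas as pd

def build_ontology_prompt(df: pd.DataFrame) -> str:
    # Summarize the data so the LLM can reason about what each row represents.
    summary = {
        "rows": len(df),
        "columns": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "sample": df.head(3).to_dict(orient="records"),
    }
    return (
        "You are a data ontology expert.\n"
        f"Dataset summary: {summary}\n"
        "Answer: (1) What entity does each row represent? "
        "(2) What mechanism generated this data (transactions, measurements, events, ...)? "
        "(3) Which domain does it most likely belong to?"
    )

df = pd.read_csv("orders.csv")  # hypothetical input
with open("step2_ontology_prompt.txt", "w") as f:
    f.write(build_ontology_prompt(df))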


Analysis Workflow (7 Steps)

Step | Content | Description
1 | Data Loading | Auto-recognize formats, support multiple file types
2 | Ontology Recognition | LLM judges entity type and generation mechanism
3 | Quality Validation | Auto-detect missing values, outliers, duplicates; output quality score
4 | Plan Generation | LLM selects analysis framework and path based on user intent
5 | Script Generation | LLM generates executable Python analysis scripts
6 | Execute Analysis | Run scripts, generate charts and numerical results
7 | Comprehensive Report | Output HTML + Markdown dual-format reports

Flow Health Monitoring (NEW)

Each step has status tracking and error handling:

  • Step Dependency Check - Automatically prevents subsequent steps when prerequisites fail (a minimal sketch follows the example below)
  • Clear Error Messages - Provides explicit failure reasons and fix suggestions
  • Flow Health Report - Outputs complete execution status and issue summary

If a step fails, you'll see:

⚠️ Flow Interrupted!
   Reason: Critical step 'Data Loading' failed: Encoding error

Fix Suggestions:
  1. File encoding may not be UTF-8, try manually specifying encoding parameter
  2. Common Chinese encodings: gbk, gb2312, gb18030
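
A minimal sketch of this dependency-check idea (not the skill's actual orchestrator; step names and statuses are assumptions):

# Later steps refuse to run when a critical prerequisite has not succeeded.
STEP_DEPENDENCIES = {
    "ontology_recognition": ["data_loading"],
    "quality_validation": ["data_loading"],
    "plan_generation": ["ontology_recognition", "quality_validation"],
    "script_generation": ["plan_generation"],
    "execute_analysis": ["script_generation"],
    "report": ["execute_analysis"],
}
step_status = {}  # step name -> "ok" | "failed" | "skipped"

def run_step(name, func):
    missing = [dep for dep in STEP_DEPENDENCIES.get(name, []) if step_status.get(dep) != "ok"]
    if missing:
        step_status[name] = "skipped"
        print(f"⚠️ Flow interrupted! Step '{name}' skipped; failed prerequisites: {missing}")
        return
    try:
        func()
        step_status[name] = "ok"
    except Exception as exc:
        step_status[name] = "failed"
        print(f"⚠️ Critical step '{name}' failed: {exc}")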

Supported Data Types

Economic Data

Data Characteristics | Recognized As | Auto-matched Framework
Orders + Price + SKU | Retail Economy | Value Chain / ABC-XYZ / RFM
User + Subscription Cycle + Churn | Subscription Economy | LTV / Cohort / Retention Curves
Click / Add-to-cart / Purchase Events | Attention Economy | Funnel Analysis / AARRR
GMV + Platform Matching | Commission Economy | Two-sided Network Effects / Unit Economics
Position + Skills + Salary | Labor Market | Skill Premium / Experience Elasticity
OHLCV Price Data | Financial Time Series | Technical Analysis / Volatility Models

Non-Economic Data

Data Type | Auto-matched Framework
Sensors / Continuous Time Series | Time Series Decomposition, Extreme Value Analysis
Social / Network Relationship | Centrality Analysis, Community Detection
Geographic / Spatial | Spatial Autocorrelation, Hotspot Analysis
Text Corpus | Topic Modeling, Sentiment Analysis
Biomedical | Survival Analysis, Differential Expression

Supported File Formats

  • CSV / TSV (.csv, .tsv, .txt) - Auto encoding detection, supports utf-8, gbk, latin1, etc.
  • Excel (.xlsx, .xls)
  • Parquet (.parquet, .pq)
  • JSON (.json)
  • SQL Database (via connection string)

Encoding Fault Tolerance

CSV loading automatically tries multiple encodings (see the sketch after this list):

  • Auto encoding detection (if chardet library available)
  • Fallback encodings: utf-8, utf-8-sig, gbk, gb2312, gb18030, latin1, etc.
  • Engine fallback: Auto-switches to Python engine when C engine fails, skipping corrupted rows
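
A rough sketch of this fallback behavior, assuming pandas and the encoding order shown above (the skill's exact logic may differ):

import pandas as pd

FALLBACK_ENCODINGS = ["utf-8", "utf-8-sig", "gbk", "gb2312", "gb18030", "latin1"]

def read_csv_tolerant(path):
    encodings = list(FALLBACK_ENCODINGS)
    try:
        import chardet  # optional dependency: put the detected encoding first
        with open(path, "rb") as f:
            detected = chardet.detect(f.read(100_000)).get("encoding")
        if detected:
            encodings.insert(0, detected)
    except ImportError:
        pass
    for enc in encodings:
        try:
            return pd.read_csv(path, encoding=enc)
        except (UnicodeDecodeError, pd.errors.ParserError):
            continue
    # Last resort: the slower Python engine, skipping rows it cannot parse.
    return pd.read_csv(path, encoding="latin1", engine="python", on_bad_lines="skip")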

Output Contents

Each analysis generates:

session_YYYYMMDD_HHMMSS/
├── step2_ontology_prompt.txt     # Ontology recognition prompts (reusable)
├── step3_validation_report.json  # Data quality report
├── step3_cleaning_report.txt     # Data cleaning recommendations
├── step4_planning_prompt.txt     # Analysis planning prompts (reusable)
├── step5_script_prompt.txt       # Script generation prompts (reusable)
├── analysis_report.html          # Comprehensive HTML report (with charts)
├── analysis_report.md            # Markdown report
└── charts/                       # All analysis charts (PNG)
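
A small sketch (assumed, not the skill's code) of how a timestamped session directory matching this layout could be created:

from datetime import datetime
from pathlib import Path

# e.g. session_20240101_120000/charts/
session_dir = Path(f"session_{datetime.now():%Y%m%d_%H%M%S}")
(session_dir / "charts").mkdir(parents=True, exist_ok=True)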

Usage Examples

Example 1: Analyzing E-commerce Sales Data

User: Help me analyze this sales data. I want to know which products sell well and which customers are high-value.

[Upload orders.csv]

Skill automatically:

  1. Recognizes as "Retail Economy × Transaction/Event Data"
  2. Selects RFM Customer Value Analysis + ABC Product Classification framework (an RFM sketch follows this list)
  3. Generates and executes analysis scripts
  4. Outputs customer segmentation distribution, product sales ranking, RFM heatmap, and HTML report
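
For illustration, a hedged RFM scoring sketch in pandas; the column names (customer_id, order_date, amount) and quintile scoring are assumptions about orders.csv, not the skill's generated script:

import pandas as pd

orders = pd.read_csv("orders.csv", parse_dates=["order_date"])
now = orders["order_date"].max()

# Recency / Frequency / Monetary per customer
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda s: (now - s.max()).days),
    frequency=("order_date", "count"),
    monetary=("amount", "sum"),
)

# Quintile scores, 5 = best; rank() avoids qcut failures on duplicate edges.
rfm["R"] = pd.qcut(rfm["recency"].rank(method="first"), 5, labels=[5, 4, 3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
rfm["segment"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)

print(rfm.sort_values("monetary", ascending=False).head())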

Example 2: Analyzing User Behavior Logs

User: This is our app's user behavior log. I want to see the user conversion funnel.

[Upload events.csv]

Skill automatically:

  1. Recognizes as "Attention/Conversion Economy × Event Sequence Data"
  2. Selects Funnel Analysis + Session Sequence Mining framework (a funnel sketch follows this list)
  3. Outputs conversion rates at each step, churn node analysis, user path Sankey diagram
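
A simple funnel computation sketch; the event names and columns (user_id, event) are assumptions about events.csv:

import pandas as pd

events = pd.read_csv("events.csv")
funnel_steps = ["click", "add_to_cart", "purchase"]  # assumed event order

# Users who reached each step, counted as the cumulative intersection of
# users seen at every earlier step (ordering within sessions is ignored here).
reached = []
current = set(events.loc[events["event"] == funnel_steps[0], "user_id"])
for step in funnel_steps:
    current &= set(events.loc[events["event"] == step, "user_id"])
    reached.append(len(current))

for i, (step, n) in enumerate(zip(funnel_steps, reached)):
    rate = n / reached[i - 1] if i and reached[i - 1] else 1.0
    print(f"{step}: {n} users ({rate:.1%} of previous step)")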

Example 3: Analyzing Meteorological Observation Data

User: Help me analyze this weather station's observation records; I want to understand temperature and precipitation patterns.

[Upload weather.csv]

Skill automatically:

  1. Recognizes as "Earth Science × Time Series/Trajectory Data × Instrument Measurement"
  2. Selects Time Series Decomposition + Seasonality Analysis + Extreme Value Statistics framework (a decomposition sketch follows this list)
  3. Outputs trend charts, seasonal decomposition charts, outlier reports
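
A rough decomposition sketch using statsmodels (not in the skill's declared dependencies); column names (date, temperature) are assumptions about weather.csv:

import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

weather = pd.read_csv("weather.csv", parse_dates=["date"]).set_index("date")
daily_temp = weather["temperature"].resample("D").mean().interpolate()

# Trend / seasonal / residual panels (yearly seasonality on daily data)
result = seasonal_decompose(daily_temp, model="additive", period=365)
result.plot()

# Simple outlier flag: residuals beyond 3 standard deviations
resid = result.resid.dropna()
outliers = resid[(resid - resid.mean()).abs() > 3 * resid.std()]
print(outliers.head())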

Dependencies

pandas >= 1.3
numpy >= 1.21
matplotlib >= 3.4
seaborn >= 0.11
scipy >= 1.7
openpyxl >= 3.0   # Excel support
chardet >= 4.0    # Auto encoding detection (optional but recommended)
pyarrow >= 6.0    # Parquet support (optional)
sqlalchemy >= 1.4 # SQL support (optional)

Version

v1.1.0 · Author: Claude · License: CC BY-NC-SA 4.0

v1.1.0 Updates (2026-03-23)

  1. Flow Health Monitoring - Added step status tracking, dependency checks, error messages
  2. Enhanced Encoding Fault Tolerance - Auto-try multiple encodings for CSV/TSV (utf-8, gbk, latin1, etc.)
  3. Engine Fallback - Auto-switches to Python engine when C engine fails, skipping corrupted rows

v1.0.0

  • Initial version: Four-layer analysis framework + 7-step analysis workflow
