Data Report Generator — CSV/Excel to Word/PDF with Charts

Automatically analyze CSV or Excel files and generate professional data analysis reports with charts, summaries, and insights — output as Word (.docx) or PDF...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 209 · 1 current installs · 1 all-time installs
MIT-0
Security Scan
VirusTotalVirusTotal
Benign
View report →
OpenClawOpenClaw
Benign
high confidence
Purpose & Capability
Name/description (CSV/Excel → Word/PDF reports) aligns with the instructions: code shows data loading, profiling, chart generation, and report building. Declared required binaries (python3, pip) are appropriate. No unrelated credentials, binaries, or config paths are requested.
Instruction Scope
SKILL.md contains explicit Python scripts that load user-uploaded files, profile data, generate charts, and write .png/.pdf/.docx outputs. It does not instruct reading arbitrary system files or environment variables beyond normal use. However, it tells the agent to run pip installs at runtime and example output paths (e.g., /home/claude/report.pdf) which will write files to disk. The instructions also reference additional libraries (scipy, reportlab, python-docx) in reference files even though the single pip command in the main flow does not install all of them.
Install Mechanism
There is no formal install spec (instruction-only), which is lowest static risk. The runtime instructions include a pip install command that pulls packages from PyPI (matplotlib, seaborn, pandas, openpyxl). That is an expected mechanism for a Python data/reporting skill. Two points to note: (1) the pip command uses --break-system-packages (which can allow modifying system-managed packages and may be undesirable on shared systems), and (2) several referenced snippets require extra packages (reportlab, scipy, python-docx) that are not installed in the single pip line — relying on implicit installs can cause runtime failures or make operators run additional installs manually.
Credentials
The skill requests no environment variables, no credentials, and no config paths. Its operations (file I/O, local package installs) are proportional to the documented purpose.
Persistence & Privilege
Flags show always:false and the skill is user-invocable; it does not request permanent platform-level presence or modify other skills. The skill will run code when invoked (normal for instruction-only skills).
Assessment
This skill appears coherent for generating reports from uploaded CSV/Excel files, but review these practical safety points before installing or running it: - Runs code and will write files: the instructions generate and save chart images and PDF/DOCX files locally; avoid uploading highly sensitive data unless you trust the execution environment. - Runtime pip installs: the SKILL.md tells the agent to run pip to install libraries from PyPI. That is normal for Python tools but the command uses --break-system-packages (this can alter system-managed packages). Prefer running in an isolated virtualenv/container or remove the --break-system-packages flag. - Missing dependencies: some reference snippets use libraries not included in the single pip line (e.g., reportlab, scipy, python-docx). Be prepared to install additional packages or update the dependency list so runtime failures don't prompt ad-hoc installs. - File paths: example outputs use absolute paths (e.g., /home/claude/report.pdf). Ensure the agent uses safe, writable, and expected directories rather than system-sensitive locations. If you plan to use this skill in production, run it first in a sandbox/container, review and pin the exact dependency list, remove the --break-system-packages flag, and avoid uploading confidential datasets until you verify the runtime behavior.

Like a lobster shell, security has layers — review code before you run it.

Current versionv1.0.0
Download zip
analyticsvk973va6np8evszxtpwj8w71g6d82mj01automationvk973va6np8evszxtpwj8w71g6d82mj01chartsvk973va6np8evszxtpwj8w71g6d82mj01csvvk973va6np8evszxtpwj8w71g6d82mj01excelvk973va6np8evszxtpwj8w71g6d82mj01latestvk973va6np8evszxtpwj8w71g6d82mj01operationsvk973va6np8evszxtpwj8w71g6d82mj01pdfvk973va6np8evszxtpwj8w71g6d82mj01reportvk973va6np8evszxtpwj8w71g6d82mj01salesvk973va6np8evszxtpwj8w71g6d82mj01visualizationvk973va6np8evszxtpwj8w71g6d82mj01wordvk973va6np8evszxtpwj8w71g6d82mj01

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

📊 Clawdis
Binspython3, pip

SKILL.md

Data Report Generator

Transform raw CSV or Excel files into professional, insight-rich reports with charts — output as Word (.docx) or PDF.

When This Skill Activates

Trigger when user provides:

  • A CSV, Excel (.xlsx/.xls), or TSV file + asks for a report/analysis
  • A request to "visualize", "summarize", or "analyze" tabular data
  • Any mention of automated weekly/monthly reporting

If no file is uploaded yet, ask the user to upload their data file first.


Step 1: Gather Inputs

Ask for (or infer from context):

  1. Data file — CSV, Excel, TSV (required)
  2. Report format — Word (.docx) or PDF? (default: Word)
  3. Report purpose — sales analysis? operations? marketing? financial? (affects framing)
  4. Key questions — what does the user most want to understand from this data?
  5. Audience — internal team? executive summary? client-facing?
  6. Language — Chinese or English? (default: match user's language)

If purpose/questions aren't specified, proceed with a comprehensive general analysis.


Step 2: Read and Profile the Data

import pandas as pd
import numpy as np

# Support both CSV and Excel
def load_data(filepath):
    ext = filepath.rsplit('.', 1)[-1].lower()
    if ext in ['xlsx', 'xls']:
        df = pd.read_excel(filepath)
    elif ext == 'csv':
        # Try common encodings
        for enc in ['utf-8', 'gbk', 'gb2312', 'utf-8-sig']:
            try:
                df = pd.read_csv(filepath, encoding=enc)
                break
            except:
                continue
    elif ext == 'tsv':
        df = pd.read_csv(filepath, sep='\t')
    return df

df = load_data('/path/to/file')

Data Profile to Extract

  • Shape: row count, column count
  • Column types: numeric, categorical, date/time, text
  • Missing values: count and % per column
  • Basic stats: mean, median, min, max, std for numeric columns
  • Unique value counts for categorical columns
  • Date range if time columns exist
  • Obvious data quality issues

Step 3: Auto-Detect Report Type

Based on column names and data types, auto-select the analysis approach:

Data PatternReport TypeReference File
Date column + numeric valuesTime Series / Trendreferences/time-series.md
Category + numeric (sales/revenue)Sales / Performancereferences/sales-analysis.md
Multiple numeric columnsCorrelation / Distributionreferences/statistical.md
Category breakdowns onlySegmentationreferences/segmentation.md
Mixed / unknownGeneral Analysisreferences/general.md

Read the relevant reference file for chart selection and narrative guidance.

If the user specifies a report type, use that. Otherwise, auto-detect.


Step 4: Generate Charts

Install dependencies first:

pip install matplotlib seaborn pandas openpyxl --break-system-packages --quiet

Chart Generation Rules

import matplotlib
matplotlib.use('Agg')  # Non-interactive backend — ALWAYS set this
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import seaborn as sns

# Style setup — professional look
plt.style.use('seaborn-v0_8-whitegrid')
COLORS = ['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#3B1F2B', '#44BBA4']

def save_chart(fig, filename):
    fig.savefig(filename, dpi=150, bbox_inches='tight', facecolor='white')
    plt.close(fig)

Chart Selection Guide

Time series data → Line chart (trend) + bar chart (period comparison)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(df['date'], df['value'], color=COLORS[0], linewidth=2, marker='o', markersize=4)
ax.set_title('Trend Over Time', fontsize=14, fontweight='bold')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
plt.xticks(rotation=45)
save_chart(fig, 'chart_trend.png')

Category comparison → Horizontal bar chart (easier to read labels)

fig, ax = plt.subplots(figsize=(10, 6))
df_sorted = df.sort_values('value', ascending=True)
bars = ax.barh(df_sorted['category'], df_sorted['value'], color=COLORS[0])
ax.bar_label(bars, fmt='%.1f', padding=3)
ax.set_title('Performance by Category', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_category.png')

Distribution → Histogram + optional KDE

fig, ax = plt.subplots(figsize=(8, 5))
ax.hist(df['value'].dropna(), bins=30, color=COLORS[0], edgecolor='white', alpha=0.8)
ax.set_title('Value Distribution', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_dist.png')

Composition/share → Pie or stacked bar (prefer stacked bar for >5 categories)

fig, ax = plt.subplots(figsize=(8, 6))
sizes = df['value']
labels = df['category']
wedges, texts, autotexts = ax.pie(sizes, labels=labels, autopct='%1.1f%%',
                                   colors=COLORS, startangle=90)
ax.set_title('Composition', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_pie.png')

Correlation → Heatmap

fig, ax = plt.subplots(figsize=(10, 8))
corr = df[numeric_cols].corr()
sns.heatmap(corr, annot=True, fmt='.2f', cmap='RdBu_r', center=0,
            ax=ax, square=True, cbar_kws={'shrink': 0.8})
ax.set_title('Correlation Matrix', fontsize=14, fontweight='bold')
save_chart(fig, 'chart_corr.png')

Generate 3–6 charts total — enough to be comprehensive, not overwhelming.


Step 5: Write the Report (Word .docx)

Use the docx Python library. Install: pip install python-docx --break-system-packages --quiet

Report Structure

1. Cover Page
   - Report title, data source name, date generated
2. Executive Summary (1 page)
   - 3-5 key findings in plain language
   - Most important metric highlighted
3. Data Overview
   - Dataset dimensions, time range, data quality notes
4. [Core Analysis Sections — vary by report type]
   - Each section: narrative paragraph + chart + key takeaways
5. Key Insights & Recommendations
   - Actionable bullet points
6. Appendix: Data Statistics Table

Docx Code Pattern

from docx import Document
from docx.shared import Inches, Pt, RGBColor
from docx.enum.text import WD_ALIGN_PARAGRAPH
from docx.enum.table import WD_TABLE_ALIGNMENT
import datetime

doc = Document()

# Page setup
section = doc.sections[0]
section.page_width = Inches(8.5)
section.page_height = Inches(11)
section.left_margin = Inches(1)
section.right_margin = Inches(1)
section.top_margin = Inches(1)
section.bottom_margin = Inches(1)

# Title style
def add_title(doc, text, level=1):
    heading = doc.add_heading(text, level=level)
    heading.alignment = WD_ALIGN_PARAGRAPH.LEFT
    run = heading.runs[0]
    run.font.color.rgb = RGBColor(0x2E, 0x86, 0xAB)  # Brand blue
    return heading

# Body text
def add_paragraph(doc, text):
    p = doc.add_paragraph(text)
    p.paragraph_format.space_after = Pt(6)
    run = p.runs[0] if p.runs else p.add_run()
    run.font.size = Pt(11)
    return p

# Insert chart image
def add_chart(doc, image_path, caption, width=6.0):
    doc.add_picture(image_path, width=Inches(width))
    last_paragraph = doc.paragraphs[-1]
    last_paragraph.alignment = WD_ALIGN_PARAGRAPH.CENTER
    cap = doc.add_paragraph(caption)
    cap.alignment = WD_ALIGN_PARAGRAPH.CENTER
    cap.runs[0].font.size = Pt(9)
    cap.runs[0].font.color.rgb = RGBColor(0x66, 0x66, 0x66)

# Stats table
def add_stats_table(doc, df):
    stats = df.describe().round(2)
    table = doc.add_table(rows=len(stats)+1, cols=len(stats.columns)+1)
    table.style = 'Table Grid'
    # Header row
    table.cell(0, 0).text = 'Metric'
    for j, col in enumerate(stats.columns):
        table.cell(0, j+1).text = str(col)
    # Data rows
    for i, idx in enumerate(stats.index):
        table.cell(i+1, 0).text = str(idx)
        for j, col in enumerate(stats.columns):
            table.cell(i+1, j+1).text = str(stats.loc[idx, col])

# Save
doc.save('/home/claude/report_output.docx')

Step 6: Convert to PDF (if requested)

# Use LibreOffice for conversion
python /path/to/skills/docx/scripts/office/soffice.py \
  --headless --convert-to pdf /home/claude/report_output.docx \
  --outdir /home/claude/

If LibreOffice is unavailable, fall back to reportlab:

pip install reportlab --break-system-packages --quiet

See references/pdf-fallback.md for reportlab-based PDF generation.


Step 7: Output to User

  1. Copy final file to /mnt/user-data/outputs/
  2. Use present_files tool to share it
  3. Include a brief summary in chat:
    • What data was analyzed
    • How many charts were generated
    • Top 3 insights found
    • Any data quality issues to be aware of

Output Quality Checklist

Before delivering, verify:

  • Charts have titles, axis labels, and readable fonts
  • No blank/empty chart images
  • Report has executive summary with plain-language findings
  • Data quality issues are noted (missing values, anomalies)
  • File opens cleanly (test docx structure validity)
  • Chinese text renders correctly if using Chinese (use SimHei or Microsoft YaHei font in matplotlib)
  • All chart image files exist before embedding in docx

Handling Chinese / CJK Text in Charts

import matplotlib
# Set Chinese-compatible font
matplotlib.rcParams['font.sans-serif'] = ['SimHei', 'Microsoft YaHei', 
                                           'PingFang SC', 'DejaVu Sans']
matplotlib.rcParams['axes.unicode_minus'] = False  # Fix minus sign rendering

If font not available on system:

pip install matplotlib --break-system-packages
# Download and install a CJK font
apt-get install -y fonts-wqy-zenhei 2>/dev/null || true

Edge Cases

Large datasets (>100k rows): Sample or aggregate before visualization. Note sampling in report.

Messy/dirty data: Document cleaning steps in the report's "Data Overview" section. Show before/after counts.

Single column data: Generate distribution analysis only. Note limitation.

All categorical data: Focus on frequency analysis and cross-tabulations. No numeric charts.

Date parsing issues: Try multiple date formats (dayfirst=True, yearfirst=True, infer_datetime_format=True).

Multiple sheets in Excel: Ask user which sheet(s) to analyze, or analyze all and create multi-section report.


Reference Files

  • references/time-series.md — Trend analysis, seasonality detection, period-over-period comparisons
  • references/sales-analysis.md — Sales/revenue specific charts, KPI calculations, funnel analysis
  • references/statistical.md — Correlation, distribution, outlier detection, regression
  • references/segmentation.md — Category breakdowns, cohort analysis, ranking
  • references/general.md — General-purpose analysis for unknown data types
  • references/pdf-fallback.md — PDF generation with reportlab when LibreOffice unavailable
  • references/chart-styling.md — Advanced chart styling, brand colors, annotation patterns

Files

8 total
Select a file
Select a file to preview.

Comments

Loading comments…