Minutes Taker

Convert meeting transcripts into structured minutes with decision tracking, task assignment, three-tier summaries, and cross-meeting correlation.

Install

openclaw skills install minutes-taker

AI Meeting Minutes (minutes-taker)

Convert meeting transcripts or text into structured, actionable minutes. The skill extracts decisions, action items, risks, and key dates, generates three-level summaries (one-liner → three-paragraph → full), and tracks decisions and todos across meetings.

Core strengths: decision-chain tracking across meetings, todo lifecycle management (extract → assign → track → remind), and three-level summaries for different reading contexts.

Audio support: ASR (speech-to-text) is available via plugin backends. See ASR Backends below.

Quick Start

clawhub run minutes-taker --input-type text \
  --content "Zhang: Today we discuss the payment module. Li: I suggest using Stripe. Wang: Agreed. Zhang: OK, Li to deliver the proposal by Friday."

This produces structured minutes with decisions, todos, and a three-level summary.

Features

| Feature | Description |
|---|---|
| Input Modes | Plain text, chat logs, structured forms. Audio via ASR backend (see below) |
| ASR (Speech-to-Text) | Pluggable: whisper (offline) or SpeechRecognition (Google API). No diarization by default |
| 6 Extractors | Decisions, Todos, Dates, Risks, Ideas, Data Points |
| 3-Level Summaries | L1 one-liner, L2 three-paragraph, L3 full minutes |
| Decision Chain | Track decisions across meetings with evolution history |
| Todo Lifecycle | Auto-extract, assign, deadline parse, priority infer, track across meetings |
| Cross-meeting Links | Detect recurring topics, link related decisions and todos |
| Multiple Outputs | Markdown, text, HTML, feishu, notion (via extensions) |

ASR Backends

The asr.py module auto-detects available backends at runtime:

| Backend | Quality | Offline | Requirements |
|---|---|---|---|
| whisper (openai-whisper) | ⭐⭐⭐ High | ✅ Yes | pip install openai-whisper |
| speech_recognition (Google API) | ⭐⭐ Medium | ❌ No | pip install SpeechRecognition (pre-installed) |

Transcription is attempted in priority order: whisper → speech_recognition. Speaker diarization is NOT performed; the system labels all text as a single speaker. For multi-speaker audio, provide a participants list or use post-processing.
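
The priority-ordered detection described above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual asr.py: the names `BACKEND_PRIORITY`, `available_backends`, and `pick_backend` are hypothetical. Only the import names `whisper` (from openai-whisper) and `speech_recognition` (from SpeechRecognition) come from the real packages.

```python
import importlib.util

# Priority order from the text: whisper first, then speech_recognition.
# Each entry pairs a backend label with the module name it is imported as.
BACKEND_PRIORITY = [
    ("whisper", "whisper"),
    ("speech_recognition", "speech_recognition"),
]


def available_backends(priority=BACKEND_PRIORITY):
    """Return the labels of installed backends, in priority order."""
    found = []
    for label, module_name in priority:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module_name) is not None:
            found.append(label)
    return found


def pick_backend(priority=BACKEND_PRIORITY):
    """Return the highest-priority installed backend, or None if none exist."""
    backends = available_backends(priority)
    return backends[0] if backends else None
```

Checking with `find_spec` avoids importing whisper (which is slow to load) just to learn whether it is installed. A `None` result from `pick_backend` would correspond to the "no backend available" error path.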

Input Format

{
  "input": {
    "type": "text",
    "content": "Zhang: Today we discuss...",
    "format": "chat_log"
  },
  "meeting_context": {
    "title": "Q2 Product Roadmap Review",
    "date": "2026-06-14",
    "participants": [
      {"name": "Zhang San", "role": "PM"},
      {"name": "Li Si", "role": "Frontend Dev"}
    ],
    "agenda": ["Payment module", "Growth tools"]
  },
  "options": {
    "summary_level": "full",
    "extract_todos": true,
    "extract_decisions": true,
    "output_format": "markdown"
  }
}
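
For callers that assemble this payload programmatically, a small helper can keep the three top-level sections consistent. The sketch below is illustrative only: `build_request` is a hypothetical helper, not part of the skill, and it simply mirrors the keys and option values shown in the example above.

```python
import json


def build_request(content, title, date, participants, agenda=()):
    """Assemble a request dict with the same keys as the example payload."""
    if not content.strip():
        raise ValueError("content must not be empty")
    return {
        "input": {"type": "text", "content": content, "format": "chat_log"},
        "meeting_context": {
            "title": title,
            "date": date,  # ISO 8601 date string, e.g. "2026-06-14"
            "participants": [{"name": n, "role": r} for n, r in participants],
            "agenda": list(agenda),
        },
        "options": {
            "summary_level": "full",
            "extract_todos": True,
            "extract_decisions": True,
            "output_format": "markdown",
        },
    }


request = build_request(
    "Zhang: Today we discuss...",
    "Q2 Product Roadmap Review",
    "2026-06-14",
    [("Zhang San", "PM"), ("Li Si", "Frontend Dev")],
    agenda=["Payment module", "Growth tools"],
)
print(json.dumps(request, indent=2))
```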

Sample Prompts

Prompt 1: Text-based Meeting Minutes (Quick Start)

clawhub run minutes-taker --input-type text \
  --content "Zhang: Today we discuss the payment module. Li: I suggest using Stripe. Wang: Agreed. Zhang: OK, Li to deliver the proposal by Friday." \
  --title "Payment Module Discussion" \
  --participants "Zhang San (PM), Li Si (FE), Wang Wu (BE)"

Expected output: Structured minutes with:

  • 📌 Summary: Payment module discussion → Stripe chosen, Li to deliver
  • ✅ Todos: Li Si - deliver Stripe proposal by Friday (🔴 High)
  • 📊 Decisions: Use Stripe SDK (Li proposed, unanimous)
  • L1: "Decided to use Stripe for payment module; Li to submit proposal by Friday"
  • L2/L3: Full expanded minutes
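
The todo in this example carries a relative deadline ("by Friday") that has to be resolved against the meeting date. One way to do that is sketched below. This is an assumption about how deadline parsing might work, not the skill's actual logic; `resolve_weekday_deadline` is a hypothetical name, and the sketch assumes "by Friday" means the first Friday on or after the meeting date.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve_weekday_deadline(meeting_date, weekday_name):
    """Return the first date on or after meeting_date that falls on weekday_name."""
    target = WEEKDAYS.index(weekday_name.lower())
    # date.weekday() uses Monday=0 ... Sunday=6, matching the WEEKDAYS order.
    days_ahead = (target - meeting_date.weekday()) % 7
    return meeting_date + timedelta(days=days_ahead)


# A meeting held on 2026-06-14 (a Sunday) with a "by Friday" todo.
print(resolve_weekday_deadline(date(2026, 6, 14), "Friday"))  # 2026-06-19
```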

Prompt 2: Audio File Processing

clawhub run minutes-taker --input ./meeting-2026-06-14.m4a \
  --title "Q2 Product Roadmap Review" \
  --participants @team.json

Expected output: Transcribed text with structured minutes (decisions, todos, risks, summary). Requires an ASR backend (see ASR Backends). If no backend is available, the skill returns an error that explains how to install one.

Prompt 3: Todo Tracking

clawhub run minutes-taker todos --since last_meeting

Expected output:

📋 Previous Meeting Todo Tracking (2026-06-07)
✅ Complete (3/5):
  ✅ Li Si · Payment frontend tech proposal (PR #2341 submitted)
⏳ In Progress (1/5):
  ⏳ Zhao Liu · Growth tool prototype (60%, due 06/16)
❌ Overdue (1/5):
  ❌ Zhang San · Competitive analysis report (3 days overdue)
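
The three groups in this report can be derived from each todo's completion flag and due date. The following is a minimal sketch of that classification, assuming a simple record shape; the `Todo` fields and function names are hypothetical and are not taken from todo_tracker.py. The sample due date for Li Si is illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Todo:
    owner: str
    task: str
    due: date
    done: bool = False


def todo_status(todo, today):
    """Classify a todo as complete, overdue, or in_progress."""
    if todo.done:
        return "complete"
    if todo.due < today:
        return "overdue"
    return "in_progress"


def group_by_status(todos, today):
    """Bucket todos by status, keeping input order inside each bucket."""
    groups = {"complete": [], "in_progress": [], "overdue": []}
    for todo in todos:
        groups[todo_status(todo, today)].append(todo)
    return groups


todos = [
    Todo("Li Si", "Payment frontend tech proposal", date(2026, 6, 12), done=True),
    Todo("Zhao Liu", "Growth tool prototype", date(2026, 6, 16)),
    Todo("Zhang San", "Competitive analysis report", date(2026, 6, 11)),
]
groups = group_by_status(todos, today=date(2026, 6, 14))
```

A todo due on the current day counts as in progress here; whether the real tracker treats the due date as inclusive is not stated in this document.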

Prompt 4: Decision Chain

clawhub run minutes-taker decisions --topic "Payment Module"

Expected output:

📜 Decision Chain: Payment Module Refactor
06/01 Weekly: "We need to refactor payment module" (Zhang)
06/07 Tech Review: Chose Stripe SDK (Wang proposed, 3:2 passed)
06/14 Roadmap Review: Launch date set to July 20, added stress test → [This meeting]
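
A chain like this can be built by grouping stored decisions by topic and ordering each group by date. The sketch below assumes exact matching on a normalized topic string, which is simpler than whatever matching decision_chain.py actually performs; the `Decision` record and `build_chains` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Decision:
    topic: str
    meeting: str
    decided_on: date
    text: str


def build_chains(decisions):
    """Group decisions by normalized topic and sort each group chronologically."""
    chains = defaultdict(list)
    for decision in decisions:
        chains[decision.topic.strip().lower()].append(decision)
    for chain in chains.values():
        chain.sort(key=lambda d: d.decided_on)
    return dict(chains)


decisions = [
    Decision("Payment Module", "Roadmap Review", date(2026, 6, 14), "Launch date set to July 20"),
    Decision("payment module", "Weekly", date(2026, 6, 1), "We need to refactor payment module"),
    Decision("Payment Module ", "Tech Review", date(2026, 6, 7), "Chose Stripe SDK"),
]
chain = build_chains(decisions)["payment module"]
for d in chain:
    print(d.decided_on.strftime("%m/%d"), d.meeting + ":", d.text)
```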

Prompt 5: Three-Level Summary

clawhub run minutes-taker --input-type text --content "$(cat transcript.txt)" \
  --summary-level three_para --output-format text

Expected output: Concise three-paragraph summary ready for WeChat/email.

First-Success Path

Goal: Structured minutes from four lines of dialogue within 30 seconds.

Step 1: clawhub install minutes-taker
Step 2: clawhub run minutes-taker --input-type text \
  --content "Zhang: Today we discuss payment. Li: I suggest Stripe. Wang: OK. Zhang: Li, make proposal by Friday."
Step 3: Internal pipeline:
  a. input.py parses text, identifies speakers
  b. segmenter.py (single topic, no split needed)
  c. extractor.py extracts: 1 decision, 1 todo, 0 risks
  d. summarizer.py generates L1/L2/L3 summaries
  e. formatter.py renders Markdown
Step 4: User sees structured minutes with decision + todo
Step 5: Next step: try audio → requires ASR backend (see ASR Backends section)
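
Step 3a (parsing text and identifying speakers) can be illustrated with a small regular-expression splitter. This is a sketch under a strong assumption, namely that a speaker label is a single capitalized word followed by a colon, appearing at the start of the text or right after sentence-ending punctuation. `parse_turns` is a hypothetical name and does not reflect how input.py is actually written.

```python
import re

# A speaker label: a capitalized word plus a colon, at the start of the text
# or after whitespace that follows ".", "!" or "?".
TURN_RE = re.compile(r"(?:^|(?<=[.!?])\s+)([A-Z][A-Za-z]*):\s*")


def parse_turns(text):
    """Split a single-line chat log into (speaker, utterance) pairs."""
    matches = list(TURN_RE.finditer(text))
    turns = []
    for i, match in enumerate(matches):
        # An utterance runs until the next speaker label or the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        turns.append((match.group(1), text[match.end():end].strip()))
    return turns


turns = parse_turns(
    "Zhang: Today we discuss payment. Li: I suggest Stripe. "
    "Wang: OK. Zhang: Li, make proposal by Friday."
)
```

The pattern would misread a sentence such as "Note: check the budget" as a speaker turn, so a real parser would likely validate labels against the participants list.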

Architecture

minutes-taker/
├── SKILL.md
├── scripts/
│   ├── asr.py             # ASR audio transcription (pluggable backends)
│   ├── input.py           # Input parsing (audio/text/chat)
│   ├── segmenter.py       # Topic segmentation
│   ├── extractor.py       # Decision/todo/risk/idea/data extraction
│   ├── summarizer.py      # Three-level summary generation
│   ├── decision_chain.py  # Cross-meeting decision tracking
│   ├── todo_tracker.py    # Todo lifecycle management
│   ├── formatter.py       # Minutes formatting
│   └── storage.py         # Local minutes storage & retrieval
└── references/
    └── examples.json       # Sample inputs/outputs

Pipeline

Input (text/audio/chat)
    │
    ▼
input.py ──► Parsed Input (content + speakers)
    │
    ▼
segmenter.py ──► Topic Segments
    │
    ▼
extractor.py ──► Decisions, Todos, Risks, Ideas, Dates, Data
    │
    ▼
summarizer.py ──► L1, L2, L3 Summaries
    │
    ▼
formatter.py ──► Markdown / Text / HTML
    │
    ▼
storage.py ──► Saved to ~/.openclaw/data/minutes-taker/
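
The linear flow above amounts to threading one state object through a list of stages. The sketch below shows only that composition pattern. Every stage body is a trivial placeholder, the function names are hypothetical, and the extractor, summarizer, and storage stages are left out; they would slot into `STAGES` the same way.

```python
def run_pipeline(stages, state):
    """Thread a state dict through (name, function) stages in order."""
    for name, stage in stages:
        state = stage(state)
        state.setdefault("trace", []).append(name)
    return state


# Placeholder stages: each records a trivial result so the flow is visible.
def parse_input(state):
    state["content"] = state["raw"].strip()
    return state


def segment(state):
    # Treat the whole transcript as a single topic segment.
    state["segments"] = [state["content"]]
    return state


def render(state):
    state["markdown"] = "\n\n".join(
        f"## Topic {i}\n{seg}" for i, seg in enumerate(state["segments"], 1)
    )
    return state


STAGES = [("input", parse_input), ("segmenter", segment), ("formatter", render)]
result = run_pipeline(STAGES, {"raw": "  Zhang: Today we discuss payment.  "})
```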

Error Handling

| Code | Scenario | Action |
|---|---|---|
| E001 | Audio file not found | Error + path check |
| E002 | Unsupported audio format | List supported formats + convert via ffmpeg |
| E003 | ASR processing failure | Error + offer manual text input; list available backends |
| E004 | LLM timeout | Fall back to basic template |
| E005 | LLM format error | Retry 1x, then return raw text |
| E006 | Export API unavailable | Save locally + retry hint |
| E007 | History data unavailable | Skip cross-refs, generate current |
| E008 | Empty participants list | Auto-detect from content |
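
The E005 policy (retry once, then return the raw text) can be sketched as a small wrapper. This assumes the LLM is expected to return JSON, which this document does not confirm. `parse_llm_json` and the `call_llm` callable are hypothetical stand-ins for whatever the skill actually uses.

```python
import json


def parse_llm_json(call_llm, prompt, retries=1):
    """Call the LLM and parse JSON; on a format error retry, then give up with raw text."""
    raw = ""
    for _ in range(retries + 1):
        raw = call_llm(prompt)
        try:
            return {"ok": True, "data": json.loads(raw)}
        except json.JSONDecodeError:
            continue  # malformed output: try again if attempts remain
    # All attempts failed: surface the last raw response under code E005.
    return {"ok": False, "code": "E005", "raw": raw}
```

Passing the LLM call in as a function keeps the retry logic testable with a fake that returns malformed output on the first call.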

Security

  • ASR backends vary: whisper is fully offline (no network upload); Google Speech API sends audio to Google servers
  • Privacy notice: When using speech_recognition backend, audio data is transmitted to Google for transcription
  • Local storage: Minutes stored at ~/.openclaw/data/minutes-taker/ with configurable directory
  • File permissions: Minutes files default to 600 (user-only read/write)
  • Sensitive detection: Marks potential sensitive content (salary, HR topics) with ⚠️
  • LLM context splitting: Sends meeting text by topic segments, not the full transcript at once
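
The 600 file mode mentioned above can be enforced at write time on POSIX systems. The following is a sketch of one way to do it, not the code in storage.py; `write_private` is a hypothetical helper, and `os.fchmod` is not available on Windows.

```python
import os


def write_private(path, text):
    """Create or overwrite path so that only the owner can read or write it."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode argument applies only when the file is newly created.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        # Tighten permissions explicitly in case the file already existed
        # with a wider mode.
        os.fchmod(fd, 0o600)
        handle.write(text)
```

Creating the file with the restrictive mode up front, rather than writing first and calling chmod afterwards, avoids a window in which the minutes are readable by other users.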

Dependencies

  • Python 3.10+
  • ffmpeg (for audio conversion)
  • Optional: openai-whisper for local offline ASR
  • Pre-installed: SpeechRecognition for Google Speech API ASR