Skill flagged — suspicious patterns detected

ClawHub Security flagged this skill as suspicious. Review the scan results before using.

Augent

v1.4.0

The audio & video layer for agents. 22 local MCP tools. No cloud, no API keys.

Security Scan
VirusTotal: Benign · View report →
OpenClaw: Suspicious (medium confidence)
Purpose & Capability
The skill's name and description match the required binaries (ffmpeg, yt-dlp, aria2c) and the declared install method (uv/pip). The listed 22 tools (download, transcribe, TTS, diarization, Demucs, etc.) are coherent with an 'audio & video layer' skill.
⚠ Instruction Scope
SKILL.md instructs the agent to read/write many user paths (~/Downloads/, ~/Desktop/, ~/.augent/memory/, ~/.augent/auth.json) and to integrate with an Obsidian vault. These filesystem reads/writes and the explicit need for a user-configured auth token are not declared in the skill's required config paths or environment variables. The instructions allow the agent to download arbitrary remote content (yt-dlp + aria2c) and save/export clips locally — expected for a downloader tool but a broad action surface that should be explicitly declared.
Install Mechanism
Installers are uv (ClawHub-style) and pip (augent or augent[all]). These are reasonable and expected for Python-based tools; however the optional '[all]' extras likely pull heavy ML/audio dependencies (faster-whisper, sentence-transformers, pyannote, Demucs) which can be large and require GPUs or CPU resources. No raw URL downloads or obscure external archives were specified.
⚠ Credentials
skill.json and registry metadata list no required env vars or config paths, yet SKILL.md expects a user-managed auth token at ~/.augent/auth.json for X/Twitter Spaces and persistent transcription memory under ~/.augent/memory/. The lack of declared config/credential requirements is an incoherence — users should be informed that the skill will create/read these files. No secrets were requested via env vars, but file-based tokens are implied and not surfaced in metadata.
Persistence & Privilege
always is false (normal). The skill writes persistent data to user directories (Downloads, Desktop, ~/.augent) and optionally to an Obsidian vault; this is normal for a media-processing tool but increases its footprint on disk. Autonomous invocation is allowed by default; combined with the download capability and filesystem writes, that deserves caution if you let the agent act without supervision.
What to consider before installing
This skill is largely coherent with its purpose, but important runtime behavior is documented only in SKILL.md, not in the manifest. Before installing:

  1. Confirm you trust the augent package source (check the GitHub repo and the PyPI/uv listing) and review the package contents, because the '[all]' extras pull in many heavy ML/audio dependencies.
  2. Be aware the skill will download arbitrary remote media (yt-dlp + aria2c), save files to ~/Downloads and ~/Desktop, and store transcription data under ~/.augent — these paths are not declared in the manifest.
  3. If you plan to use Twitter/X Spaces recording, the tool expects a user-supplied auth token in ~/.augent/auth.json; make sure you understand where that token comes from, and do not store high-privilege secrets there.
  4. Consider running the tool in a sandboxed environment or an isolated user account if you want to limit the agent's ability to download arbitrary content or write into your main home directories.
  5. For higher assurance, inspect the package installed by uv/pip (source files) before granting the agent permission to invoke augent-mcp autonomously.

Like a lobster shell, security has layers — review code before you run it.

Version: latest (vk9703chmv3qx7504wbmvmpp421842h9f)

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

Runtime requirements

🎙 Clawdis
OS: macOS · Linux · Windows
Bins: augent-mcp, ffmpeg, yt-dlp, aria2c

Install

Install augent (uv)
Bins: augent-mcp, augent, augent-web
uv tool install augent

SKILL.md

Augent — Audio & Video Intelligence for AI Agents

Augent is an MCP server that gives your agent 22 tools for audio and video intelligence. Download from 1000+ sites via yt-dlp and aria2c, transcribe in 99 languages via faster-whisper, search by keyword or meaning via sentence-transformers, take notes, identify speakers via pyannote-audio, detect chapters, separate audio via Demucs v4, export clips, extract visual frames, record X/Twitter Spaces (requires user-configured auth token in ~/.augent/auth.json), and generate speech via Kokoro TTS. All processing runs locally. Downloads are saved to ~/Downloads/, notes and clips to ~/Desktop/, transcription memory to ~/.augent/memory/.

Config

{
  "mcpServers": {
    "augent": {
      "command": "augent-mcp"
    }
  }
}

If augent-mcp is not in PATH, use python3 -m augent.mcp as the command instead.
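The PATH fallback above can be checked programmatically. A minimal sketch using Python's standard library — the command names come from this document, but the selection logic itself is an assumption, not Augent's code:

```python
import shutil

def resolve_mcp_command() -> list[str]:
    """Pick the MCP launch command: prefer augent-mcp if it is on PATH,
    otherwise fall back to running the module with python3."""
    if shutil.which("augent-mcp"):
        return ["augent-mcp"]
    # Fallback suggested by the docs when the binary is not on PATH
    return ["python3", "-m", "augent.mcp"]
```

The returned list can be dropped straight into the `command`/`args` fields of an MCP client config.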

Install

Install via the ClawHub install button above, or use uv tool install augent for the base package or uv tool install "augent[all]" for all features. FFmpeg is required for audio processing.

Tools

Augent exposes 22 MCP tools:

Core

| Tool | Description |
| --- | --- |
| download_audio | Download audio from video URLs at maximum speed. Supports YouTube, Vimeo, TikTok, Twitter/X, SoundCloud, and 1000+ sites. Uses aria2c multi-connection + concurrent fragments. |
| transcribe_audio | Full transcription of any audio file with per-segment timestamps. Returns text, language, duration, and segments. Cached by file hash. |
| search_audio | Search audio for keywords. Returns timestamped matches with context snippets. Supports clip export. |
| deep_search | Semantic search — find moments by meaning, not just keywords. Uses sentence-transformers embeddings. |
| search_memory | Search across ALL stored transcriptions in one query. Keyword or semantic mode. |
| take_notes | All-in-one: download audio from URL, transcribe, and save formatted notes. Supports 5 styles: tldr, notes, highlight, eye-candy, quiz. |
| clip_export | Export a video clip from any URL for a specific time range. Downloads only the requested segment. |
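transcribe_audio is described as "cached by file hash". A minimal sketch of that caching pattern — the function names and hashing scheme are illustrative assumptions, not Augent's implementation:

```python
import hashlib
from pathlib import Path

_cache: dict[str, str] = {}  # content digest -> transcript

def file_digest(path: Path) -> str:
    """Hash the file contents so re-transcribing identical audio is a cache hit,
    even if the file was renamed or moved."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def transcribe_cached(path: Path, transcribe) -> str:
    """Run `transcribe` (the expensive model call) at most once per content hash."""
    key = file_digest(path)
    if key not in _cache:
        _cache[key] = transcribe(path)
    return _cache[key]
```

Keying on the content digest rather than the filename is what makes the cache survive renames and duplicate downloads.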

Analysis

| Tool | Description |
| --- | --- |
| chapters | Auto-detect topic chapters with timestamps using embedding similarity. |
| search_proximity | Find where two keywords appear near each other (e.g., "startup" within 30 words of "funding"). |
| identify_speakers | Speaker diarization — identify who speaks when. No API keys required. |
| separate_audio | Isolate vocals from music/noise using Meta's Demucs v4. Feed clean vocals into transcription. |
| batch_search | Search multiple audio files in parallel. Ideal for podcast libraries or interview collections. |
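The proximity idea behind search_proximity ("startup" within 30 words of "funding") can be sketched over a plain word list — a toy illustration, not Augent's actual matcher:

```python
def proximity_hits(text: str, a: str, b: str, window: int = 30) -> list[tuple[int, int]]:
    """Return (index_a, index_b) word positions where `a` and `b`
    occur within `window` words of each other."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return [(i, j) for i in pos_a for j in pos_b if abs(i - j) <= window]
```

In a real transcript the word indices would map back to segment timestamps, so each hit can be played or clipped.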

Utilities

| Tool | Description |
| --- | --- |
| text_to_speech | Convert text to natural speech using Kokoro TTS. 54 voices, 9 languages. Runs in background. |
| list_files | List media files in a directory with size info. |
| list_memories | Browse all stored transcriptions by title, duration, and date. |
| memory_stats | View memory statistics (file count, total duration). |
| clear_memory | Clear the transcription memory to free disk space. |
| tag | Add, remove, or list tags on transcriptions. Broad topic categories for organizing memories. |
| highlights | Export the best moments from a transcription. Auto mode picks top moments; focused mode finds moments matching a topic. |
| visual | Extract visual context from video at moments that matter. Query, auto, manual, and assist modes. Frames saved to Obsidian vault. |
| rebuild_graph | Rebuild Obsidian graph view data for all transcriptions. Migrates files, computes wikilinks, generates MOC hubs. |
| spaces | Download or live-record X/Twitter Spaces. Start, check status, or stop recordings. |

Usage Examples

Take notes from a video

"Take notes from https://youtube.com/watch?v=xxx"

The agent calls take_notes which downloads, transcribes, and returns formatted notes. One tool call does everything.

Search a podcast for topics

"Search this podcast for every mention of AI regulation" — provide the file path or URL.

The agent uses search_audio for exact keyword matches, or deep_search for semantic matches (finds relevant discussion even without exact words).
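The keyword/semantic distinction above can be illustrated with cosine similarity, the comparison typically used over sentence-transformers-style embeddings. The vectors below are toy values, not real embeddings:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Toy vectors: a query about "AI regulation" sits close to a passage about
# "governing machine learning" even though they share no keywords.
query = [0.9, 0.1, 0.0]
related = [0.8, 0.3, 0.1]
unrelated = [0.0, 0.1, 0.9]
print(cosine(query, related) > cosine(query, unrelated))  # the semantic match ranks higher
```

An exact keyword search would miss the related passage entirely; ranking by cosine similarity is what lets deep_search find it.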

Transcribe and identify speakers

"Transcribe this meeting recording and tell me who said what"

The agent calls transcribe_audio then identify_speakers to label each segment by speaker.

Search across all transcriptions

"Search everything I've ever transcribed for mentions of funding"

The agent uses search_memory to search across all stored transcriptions without needing a file path.

Export a clip

"Clip the part where they talk about pricing"

The agent uses search_audio or deep_search to find the moment, then clip_export to extract just that segment.
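Under the hood, extracting a time range is typically an ffmpeg stream copy. A hedged sketch of building such a command — this is a generic ffmpeg pattern, not necessarily how clip_export invokes it:

```python
def clip_cmd(src: str, start: str, end: str, out: str) -> list[str]:
    """Build an ffmpeg argument list that copies a time range without
    re-encoding. Timestamps are HH:MM:SS strings."""
    return ["ffmpeg", "-i", src, "-ss", start, "-to", end, "-c", "copy", out]
```

Using `-c copy` avoids re-encoding, so clipping is fast and lossless at the cost of cuts snapping to keyframes.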

Separate vocals from noisy audio

"This recording has music in the background, clean it up and transcribe"

The agent calls separate_audio to isolate vocals, then transcribe_audio on the clean vocals track.

Generate speech from text

"Read these notes aloud"

The agent calls text_to_speech to generate an MP3 with natural speech. Supports multiple voices and languages.

Note Styles

When using take_notes, the style parameter controls formatting:

| Style | Description |
| --- | --- |
| tldr | Shortest possible summary. One screen. Bold key terms. |
| notes | Clean sections with nested bullets (default). |
| highlight | Notes with callout blocks for key insights and blockquotes with timestamps. |
| eye-candy | Maximum visual formatting — callouts, tables, checklists, blockquotes. |
| quiz | Multiple-choice questions with answer key. |

Model Sizes

tiny is the default and handles nearly everything. Only use larger models for heavy accents, poor audio quality, or maximum accuracy needs.

| Model | Speed | Accuracy |
| --- | --- | --- |
| tiny | Fastest | Excellent (default) |
| base | Fast | Excellent |
| small | Medium | Superior |
| medium | Slow | Outstanding |
| large | Slowest | Maximum |

File Paths

Augent reads and writes to these locations on your machine:

| Path | Purpose |
| --- | --- |
| ~/Downloads/ | Default directory for downloaded audio files |
| ~/Desktop/ | Default directory for notes, clips, and TTS output |
| ~/.augent/memory/transcriptions.db | SQLite database for persistent transcription memory |
| ~/.augent/memory/transcriptions/ | Markdown files for each stored transcription |
| ~/.augent/config.yaml | User configuration (optional) |
| ~/.augent/auth.json | Twitter/X authentication cookies for Spaces recording (optional, user-created) |
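If you want to audit or sandbox these locations, the layout can be resolved relative to a configurable base directory. A sketch — the helper name is hypothetical; the paths are the ones listed above:

```python
from pathlib import Path

def augent_paths(home: Path) -> dict[str, Path]:
    """Resolve the storage locations documented above relative to `home`,
    so they can be redirected to a scratch directory for testing."""
    base = home / ".augent"
    return {
        "downloads": home / "Downloads",
        "desktop": home / "Desktop",
        "memory_db": base / "memory" / "transcriptions.db",
        "memory_md": base / "memory" / "transcriptions",
        "config": base / "config.yaml",
        "auth": base / "auth.json",
    }
```

Passing a temporary directory as `home` is one way to observe exactly what the tool writes before letting it touch your real home directory.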

If Obsidian is installed, visual frames are saved to the Obsidian vault's External Files/visual/ directory. The vault path is auto-detected from Obsidian's config.

Network Access

Network access is used for two purposes only:

  1. Downloading media from user-provided URLs via yt-dlp and aria2c
  2. Downloading ML models on first use (Whisper, sentence-transformers, pyannote, Demucs, Kokoro) from Hugging Face

No telemetry. No background network activity. No data is uploaded.
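The "download models on first use" behavior is a common cache-or-fetch pattern. A minimal sketch — the cache location and fetch callback are hypothetical, not Augent's actual layout:

```python
from pathlib import Path

def ensure_model(name: str, cache_dir: Path, fetch) -> Path:
    """Return the local model path, calling `fetch` only if the model
    has not been downloaded before (first-use download)."""
    target = cache_dir / name
    if not target.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(name, target)  # network access happens only on this branch
    return target
```

After the first call, every subsequent use resolves to the cached file with no network activity, which is consistent with the two-purpose network policy described above.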

ML Dependencies

The augent[all] install includes these local ML components:

| Component | Purpose | Size |
| --- | --- | --- |
| faster-whisper | Speech-to-text transcription | ~75MB (tiny model) |
| sentence-transformers | Semantic search, auto-tagging, chapter detection | ~90MB |
| pyannote-audio | Speaker diarization | ~29MB |
| Demucs v4 | Audio source separation (vocals from noise) | ~80MB |
| Kokoro | Text-to-speech (54 voices, 9 languages) | ~200MB |

All models run locally. None require API keys or cloud services.

Requirements

  • Python 3.10+
  • FFmpeg (audio processing)
  • yt-dlp + aria2c (for audio downloads)

Files

3 total