claude-intel-monitor

Detect intelligence degradation in Claude, GPT, and DeepSeek using 30 standardized Chinese benchmark questions across Math, Reasoning, and Code. Born from real "降智" (degradation) signals reported by Chinese developer communities.

Install

openclaw skills install claude-intel-monitor

claude-intel-monitor: AI Intelligence Degradation Detection Tool

Detect intelligence degradation in AI models with a standardized benchmark: 30 curated Chinese-language questions across Math, Reasoning, and Code, designed around real degradation patterns reported by the Chinese developer community.

Pain Signal Origins

"Claude/GPT 降智" (Claude/GPT intelligence degradation) ranked among the top three hot topics in scans of Chinese developer communities during April-May 2026:

  • CSDN: Multiple quantified analyses demonstrating Claude Opus 4.6 reasoning degradation (-67% depth, +98% hallucination)
  • V2EX claudecode node: 12-reply hot thread on Claude Code behavior changes
  • V2EX deepseek node: 4 posts on frequent service disruptions

Quick Start

pip install claude-intel-monitor

# Test a model
claude-intel-monitor test --model claude-sonnet-4 --provider anthropic

# Set baseline for degradation detection
claude-intel-monitor baseline --model claude-sonnet-4

# View history
claude-intel-monitor history

# Continuous watch mode
claude-intel-monitor watch --model claude-sonnet-4 --provider anthropic --interval 6h

Benchmark Structure

30 questions, 3 dimensions:

Dimension   Count   Weight   Detection Target
Math        10      1.0x     Mathematical reasoning, hallucination tendency
Reasoning   10      1.2x     Logical reasoning, reduced safety awareness
Code        10      1.3x     Code quality, architectural degradation
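
The per-dimension weights above suggest a weighted aggregate score. The sketch below shows one plausible way to combine them, assuming each question is scored pass/fail and the final score is the weighted share of points earned. The weights come from the table; the function name `weighted_score` and the formula itself are assumptions, not the tool's documented behavior.

```python
# Weights taken from the Benchmark Structure table.
# The aggregation formula below is an assumption for illustration.
WEIGHTS = {"math": 1.0, "reasoning": 1.2, "code": 1.3}


def weighted_score(correct: dict[str, int], total: dict[str, int]) -> float:
    """Return the weighted percentage of points earned, in [0, 100]."""
    earned = sum(WEIGHTS[dim] * correct[dim] for dim in WEIGHTS)
    possible = sum(WEIGHTS[dim] * total[dim] for dim in WEIGHTS)
    return 100.0 * earned / possible


totals = {"math": 10, "reasoning": 10, "code": 10}

# A perfect run, and a run with one miss in every dimension.
perfect = weighted_score(totals, totals)
example = weighted_score({"math": 9, "reasoning": 9, "code": 9}, totals)
```

Under this assumed formula a missed Code question costs more than a missed Math question, so a weighted percentage need not equal the raw fraction of correct answers.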

All questions are in Chinese. Each answer is validated by a deterministic check function, which avoids the bias of AI-based grading.
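
To illustrate what a deterministic check might look like, here is a minimal sketch for a numeric question. It assumes the model's final answer is the last number in its reply; the name `check_numeric` and the extraction rule are hypothetical and not taken from the tool's source.

```python
import re


def check_numeric(answer: str, expected: float, tol: float = 1e-6) -> bool:
    """Pass if the last number found in the answer matches the expected value.

    Hypothetical rule: the final number in the reply is treated as the answer.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer)
    if not numbers:
        return False  # no number at all counts as a failed answer
    return abs(float(numbers[-1]) - expected) <= tol


# The same input always produces the same verdict, with no model in the loop.
ok_chinese = check_numeric("所以答案是 42", 42)   # reply text: "so the answer is 42"
ok_steps = check_numeric("x = 3.5, so 2x = 7", 7)
wrong = check_numeric("The answer is 41", 42)
```

Because the verdict depends only on string matching and arithmetic, repeated runs on the same reply cannot disagree, which is the property that grading with another model lacks.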

Featured Baseline: DeepSeek 91.1%

🧠 Testing deepseek-chat via deepseek — 30 questions

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃      91.1%  ██████████████████░░  ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛

📊 DeepSeek first live baseline: 27/30 (91.1%)
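
Once a baseline such as the one above is stored, later runs can be compared against it. The sketch below assumes degradation is flagged when the score falls more than a fixed number of percentage points below the baseline. The function `is_degraded`, the 5-point threshold, and the later scores are all hypothetical; only the 91.1 baseline comes from the run shown.

```python
def is_degraded(baseline: float, current: float, max_drop: float = 5.0) -> bool:
    """Flag degradation when the score drops more than max_drop points below baseline.

    The 5-point default is an illustrative assumption, not the tool's setting.
    """
    return (baseline - current) > max_drop


BASELINE = 91.1  # from the DeepSeek baseline run above

# Hypothetical later scores, for illustration only.
small_dip = is_degraded(BASELINE, 89.0)   # 2.1-point drop
large_dip = is_degraded(BASELINE, 84.0)   # 7.1-point drop
```

A fixed threshold is the simplest choice; with only 30 questions a single missed answer moves the score by roughly three points, so a threshold below that would flag ordinary run-to-run noise.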

Related Tools