Agent Debugger
Debug AI agent issues systematically. Covers tool failures, infinite loops, context overflow, rate limits, and performance bottlenecks. Use when agents misbe...
Systematic debugging for AI agent issues. When your agent misbehaves, this skill helps identify and fix the problem.
Common Agent Problems
1. Infinite Loops
Symptoms:
- Agent repeats same action
- Gets stuck in a pattern
- Never completes task
Diagnosis:
Agent log shows:
- Same tool called 10+ times
- Same output format repeated
- No progress between iterations
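The first diagnostic sign above (the same tool called 10+ times) can be checked mechanically. This is a minimal sketch, assuming the agent log can be reduced to a list of `(tool_name, args_dict)` tuples; the helper name `find_repeated_calls` and that log shape are hypothetical, and real log formats differ between frameworks.

```python
import json
from collections import Counter

def find_repeated_calls(calls, threshold=10):
    """Return {(tool, args_json): count} for calls seen at least `threshold` times.

    `calls` is assumed to be a list of (tool_name, args_dict) tuples extracted
    from the agent log; adapt the extraction step to your own log format.
    """
    counts = Counter(
        # Serialize args with sorted keys so identical calls compare equal.
        (tool, json.dumps(args, sort_keys=True)) for tool, args in calls
    )
    return {key: n for key, n in counts.items() if n >= threshold}
```

A non-empty result suggests the agent is repeating itself and needs one of the fixes below.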
Fixes:
Add iteration limit:
{
  "maxIterations": 5,
  "onLimit": "ask_user"
}
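If your framework does not support a setting like this, the same guard can be written around the agent loop by hand. The sketch below assumes a hypothetical `step` callable that returns `None` while the task is unfinished, and an optional `on_limit` callback that plays the role of `"ask_user"`.

```python
def run_with_limit(step, max_iterations=5, on_limit=None):
    """Call step() until it returns a non-None result or the limit is reached."""
    for _ in range(max_iterations):
        result = step()
        if result is not None:
            return result
    # Limit reached: hand control back instead of looping forever.
    if on_limit is not None:
        return on_limit()
    raise RuntimeError(f"No result after {max_iterations} iterations")
```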
Add explicit stop condition:
In your instructions, add:
"If you've tried the same approach 3 times without success, stop and ask the user for guidance."
2. Tool Failures
Symptoms:
- Tool returns error
- Tool times out
- Tool not found
Diagnosis:
Check:
- Tool exists in available_tools
- Parameters match tool schema
- Tool has required permissions
- Rate limits not exceeded
Fixes:
Validate parameters first:
# Before calling the tool, confirm every required parameter is present
required_params = tool.get("required", [])
for param in required_params:
    if param not in args:
        raise ValueError(f"Missing required parameter: {param}")
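The first two diagnosis checks (the tool exists, the parameters match the schema) can be combined into one helper that reports every problem instead of raising on the first. This is a sketch under assumptions: `available_tools` is taken to be a dict mapping tool names to a schema with optional `"required"` and `"properties"` entries, loosely modeled on JSON Schema, and `check_tool_call` is a hypothetical name.

```python
def check_tool_call(name, args, available_tools):
    """Return a list of problems with a planned tool call (empty if none)."""
    tool = available_tools.get(name)
    if tool is None:
        return [f"Tool not found: {name}"]
    # Required parameters that the caller did not supply.
    problems = [
        f"Missing required parameter: {p}"
        for p in tool.get("required", [])
        if p not in args
    ]
    # Supplied parameters the schema does not know about (only if it lists any).
    known = tool.get("properties")
    if known is not None:
        problems += [f"Unknown parameter: {p}" for p in args if p not in known]
    return problems
```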
Add retry logic:
{
  "retries": 3,
  "retryDelay": 1000,
  "retryOn": ["rate_limit", "timeout", "5xx"]
}
3. Context Overflow
Symptoms:
- "Context length exceeded" error
- Agent forgets earlier conversation
- Truncated outputs
Diagnosis:
Check context window:
- Current tokens vs max tokens
- Number of messages in history
- Size of file contents loaded
Fixes:
Use memory efficiently:
- Load only relevant files
- Use offset/limit for large files
- Summarize long conversations
- Clear old context periodically
Compress context:
# Instead of full file
content = read("file.txt", offset=1, limit=100)
# Use memory_search for specific info
results = memory_search("important decision")
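When you control the message history yourself, "clear old context periodically" can be as simple as dropping the oldest messages once a token budget is exceeded. The sketch below is illustrative only: `estimate_tokens` uses a rough four-characters-per-token heuristic, which is an assumption, so pass your model's real tokenizer as `count` for accurate numbers.

```python
def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token for English text.
    return max(1, len(text) // 4)

def trim_history(messages, max_tokens, count=estimate_tokens):
    """Keep the most recent messages whose combined estimate fits max_tokens."""
    kept, total = [], 0
    # Walk from newest to oldest and stop at the first message that does not fit.
    for message in reversed(messages):
        cost = count(message)
        if total + cost > max_tokens:
            break
        kept.append(message)
        total += cost
    kept.reverse()  # restore chronological order
    return kept
```

Dropping whole messages is the bluntest option; summarizing the dropped portion first preserves more information at the cost of an extra model call.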
4. Rate Limiting
Symptoms:
- "Rate limit exceeded" error
- Requests blocked
- 429 status codes
Diagnosis:
Check:
- API rate limits (requests per minute/hour)
- Token limits (tokens per minute)
- Concurrent request limits
- Time until reset
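For the last check, many HTTP APIs report the time until reset in a `Retry-After` response header, given either as a number of seconds or as an HTTP date. Whether your provider sends this header is an assumption to verify; the helper name `seconds_until_retry` is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until_retry(retry_after, now=None):
    """Convert a Retry-After header value to a non-negative wait in seconds."""
    value = retry_after.strip()
    if value.isdigit():
        # Delay form: a plain number of seconds.
        return float(value)
    # Date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    now = now or datetime.now(timezone.utc)
    reset_at = parsedate_to_datetime(value)
    return max(0.0, (reset_at - now).total_seconds())
```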
Fixes:
Add backoff:
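import random
import time

class RateLimitError(Exception):
    """Placeholder; catch the rate-limit exception your API client raises."""

def call_with_backoff(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except RateLimitError:
            if attempt == max_retries - 1:
                break  # Out of retries; do not sleep again
            # Exponential backoff with jitter: about 1s, 2s, 4s, ...
            wait = (2 ** attempt) + random.random()
            time.sleep(wait)
    raise Exception("Max retries exceeded")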
Queue requests:
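import time
from queue import Queue
from threading import Thread

request_queue = Queue()

def process_queue():
    while True:
        task = request_queue.get()
        try:
            execute(task)  # execute() stands for your own request function
        finally:
            request_queue.task_done()
        time.sleep(0.1)  # Rate limit: at most ~10 req/s

# Start the worker in the background
Thread(target=process_queue, daemon=True).start()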
5. Memory Issues
Symptoms:
- Agent doesn't remember previous context
- MEMORY.md not loaded
- Memory files not found
Diagnosis:
Check:
- MEMORY.md exists
- memory/ directory exists
- Files have correct permissions
- Memory loaded at startup
Fixes:
Verify memory setup:
ls -la ~/.openclaw/workspace/
# Should show:
# MEMORY.md
# memory/
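The same check can be scripted so it runs at startup. This is a minimal sketch using only the file and directory names mentioned above; the function name `check_memory_setup` is hypothetical, and the workspace path is passed in rather than assumed.

```python
import os
from pathlib import Path

def check_memory_setup(workspace):
    """Return a list of problems found in the memory layout under `workspace`."""
    root = Path(workspace).expanduser()
    problems = []
    memory_file = root / "MEMORY.md"
    if not memory_file.is_file():
        problems.append("MEMORY.md is missing")
    elif not os.access(memory_file, os.R_OK):
        problems.append("MEMORY.md is not readable")
    if not (root / "memory").is_dir():
        problems.append("memory/ directory is missing")
    return problems
```

For example, `check_memory_setup("~/.openclaw/workspace")` returns an empty list when both entries are in place.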
Add memory to instructions:
Before answering anything about prior work, decisions, dates, people, or todos:
run memory_search on MEMORY.md + memory/*.md
6. Permission Errors
Symptoms:
- "Permission denied"
- "Access denied"
- Tools not working
Diagnosis:
Check:
- User permissions
- File permissions
- Tool policies
- Sandbox restrictions
Fixes:
Check file permissions:
ls -la /path/to/file
chmod 600 ~/.openclaw/workspace/sensitive.json
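To inspect permissions from inside a script rather than a shell, the standard library exposes the same information. The sketch below assumes a POSIX system; `describe_access` is a hypothetical helper name.

```python
import os
import stat

def describe_access(path):
    """Report mode bits and what the current process may do with `path`."""
    mode = os.stat(path).st_mode
    return {
        "mode": stat.filemode(mode),  # same notation as ls -l, e.g. "-rw-------"
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        # True if group or other users have any permission bit set.
        "group_or_other_access": bool(mode & (stat.S_IRWXG | stat.S_IRWXO)),
    }
```

For a sensitive file, `group_or_other_access` being `True` is the signal that a `chmod 600` is worth considering.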
Review tool policies:
{
  "tools": {
    "exec": {
      "security": "ask",  // or "allowlist" or "full"
      "ask": "on-miss"    // or "always" or "off"
    }
  }
}
7. Performance Issues
Symptoms:
- Slow responses
- Timeouts
- High resource usage
Diagnosis:
Profile the agent:
- Time each tool call
- Count tokens used
- Measure context growth
- Identify bottlenecks
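Timing each tool call does not require a profiler. One possible approach, sketched here with hypothetical names (`timed`, `timings`, `slowest`), is a context manager that records durations per tool name so the slowest ones can be listed afterwards.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # tool name -> list of durations in seconds

@contextmanager
def timed(name):
    """Record how long the enclosed block takes under `name`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures are timed too.
        timings[name].append(time.perf_counter() - start)

def slowest(n=3):
    """Return the n names with the largest total time, slowest first."""
    totals = {name: sum(values) for name, values in timings.items()}
    return sorted(totals, key=totals.get, reverse=True)[:n]
```

Wrap each tool call as `with timed("tool_name"): ...` and call `slowest()` at the end of a session to see where the time went.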
Fixes:
Optimize context:
# Instead of loading entire file
content = read("large_file.txt", limit=50)
# Use targeted search
results = memory_search("specific topic")
Reduce tool calls:
# Bad: multiple sequential calls
file1 = read("file1.txt")
file2 = read("file2.txt")
file3 = read("file3.txt")
# Good: one combined (or parallel) call, if your read tool accepts a list
files = read(["file1.txt", "file2.txt", "file3.txt"])
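Whether an agent's `read` tool accepts a list depends on the framework. In plain Python, the "parallel" variant can be sketched with a thread pool; `read_all` is a hypothetical helper and is not part of any agent API.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def read_all(paths, max_workers=4):
    """Read several text files concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `paths`.
        return list(pool.map(lambda p: Path(p).read_text(), paths))
```

Threads help here because file and network reads spend most of their time waiting on I/O.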
Debugging Workflow
Step 1: Reproduce
1. Document exact steps to trigger issue
2. Note expected vs actual behavior
3. Check if issue is consistent or intermittent
4. Try with minimal example
Step 2: Isolate
1. Disable other skills
2. Reduce context to minimum
3. Simplify task
4. Test each component separately
Step 3: Diagnose
1. Check logs (if available)
2. Review tool outputs
3. Examine context window
4. Verify configuration
Step 4: Fix
1. Apply fix
2. Test fix
3. Document fix
4. Update instructions if needed
Step 5: Prevent
1. Add guardrails
2. Update error handling
3. Add logging
4. Document in memory
Debugging Tools
Check Agent Status
# If you have access to session tools
status = session_status()
print(f"Model: {status['model']}")
print(f"Tokens used: {status['usage']['total_tokens']}")
print(f"Reasoning: {status['reasoning']}")
Clear Context
If agent is stuck:
1. Start new session
2. Load only essential memory
3. Re-approach task fresh
Enable Verbose Mode
{
  "thinking": "verbose",
  "reasoning": "on"
}
This shows internal reasoning, helping identify where logic fails.
Common Error Messages
| Error | Cause | Fix |
|---|---|---|
| context_length_exceeded | Too much context | Compress, summarize, limit |
| rate_limit_exceeded | Too many requests | Backoff, queue, wait |
| tool_not_found | Wrong tool name | Check spelling, install skill |
| permission_denied | Insufficient access | Check permissions, ask user |
| invalid_parameters | Wrong params | Validate against schema |
| timeout | Slow response | Increase timeout, optimize |
| memory_not_found | No memory files | Create MEMORY.md |
Best Practices
1. Defensive Coding
import os

def run_safely(file):
    # Always check before acting
    if not os.path.exists(file):
        return "File not found"
    try:
        return risky_operation()
    except ExpectedError:
        return handle_error()
2. Progress Tracking
In agent instructions:
"Track your progress. After each major step, note what you've done and what's next."
3. Checkpointing
For long tasks:
- Save progress periodically
- Document current state
- Allow resuming from checkpoint
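A minimal sketch of these three points, assuming the task state fits in a JSON-serializable dict; the function names and the file layout are hypothetical. Writing to a temporary file and renaming it means a crash mid-write cannot leave a half-written checkpoint behind.

```python
import json
import os
import tempfile

def save_checkpoint(path, state):
    """Write `state` as JSON via a temp file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp, path)  # atomic when both paths are on the same filesystem

def load_checkpoint(path, default=None):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as handle:
            return json.load(handle)
    except FileNotFoundError:
        return default
```

On startup, call `load_checkpoint` first and skip any steps the saved state marks as done.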
4. Logging
# Add to critical operations
log(f"Starting operation: {operation}")
log(f"Parameters: {params}")
log(f"Result: {result}")
log(f"Error: {error}")
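The `log(...)` calls above are schematic. In Python, one way to get the same four messages is the standard `logging` module; the wrapper name `logged_call` and the logger name `"agent"` below are assumptions made for illustration.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
logger = logging.getLogger("agent")

def logged_call(operation, func, **params):
    """Run func(**params), logging the start, parameters, result, and any error."""
    logger.info("Starting operation: %s", operation)
    logger.info("Parameters: %s", params)
    try:
        result = func(**params)
    except Exception:
        # logger.exception records the message together with the traceback.
        logger.exception("Error in %s", operation)
        raise
    logger.info("Result: %s", result)
    return result
```

Re-raising after logging keeps the error visible to the caller, so the log supplements error handling rather than replacing it.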
When to Ask for Help
Ask the user when:
- Multiple fix attempts have failed
- The issue is intermittent
- The fix would require destructive actions
- You need information only the user has
- Configuration changes are needed
Prevention Tips
- Set limits early - max iterations, max tokens, max retries
- Validate inputs - check parameters before calling tools
- Handle errors gracefully - don't crash, report and adapt
- Log important events - helps debugging later
- Test edge cases - empty inputs, large files, special characters
- Monitor resources - tokens, time, memory usage
- Document quirks - save lessons in MEMORY.md
