{"skill":{"slug":"perf-profiler","displayName":"Performance Profiler","summary":"Profile and optimize application performance. Use when diagnosing slow code, measuring CPU/memory usage, generating flame graphs, benchmarking functions, load testing APIs, finding memory leaks, or optimizing database queries.","description":"---\nname: perf-profiler\ndescription: Profile and optimize application performance. Use when diagnosing slow code, measuring CPU/memory usage, generating flame graphs, benchmarking functions, load testing APIs, finding memory leaks, or optimizing database queries.\nmetadata: {\"clawdbot\":{\"emoji\":\"⚡\",\"requires\":{\"anyBins\":[\"node\",\"python3\",\"go\",\"curl\",\"ab\"]},\"os\":[\"linux\",\"darwin\",\"win32\"]}}\n---\n\n# Performance Profiler\n\nMeasure, profile, and optimize application performance. Covers CPU profiling, memory analysis, flame graphs, benchmarking, load testing, and language-specific optimization patterns.\n\n## When to Use\n\n- Diagnosing why an application or function is slow\n- Measuring CPU and memory usage\n- Generating flame graphs to visualize hot paths\n- Benchmarking functions or endpoints\n- Load testing APIs before deployment\n- Finding and fixing memory leaks\n- Optimizing database query performance\n- Comparing performance before and after changes\n\n## Quick Timing\n\n### Command-line timing\n\n```bash\n# Time any command\ntime my-command --flag\n\n# More precise: multiple runs with stats\nfor i in $(seq 1 10); do\n  /usr/bin/time -f \"%e\" my-command 2>&1\ndone | awk '{sum+=$1; sumsq+=$1*$1; count++} END {\n  avg=sum/count;\n  stddev=sqrt(sumsq/count - avg*avg);\n  printf \"runs=%d avg=%.3fs stddev=%.3fs\\n\", count, avg, stddev\n}'\n\n# Hyperfine (better benchmarking tool)\n# Install: https://github.com/sharkdp/hyperfine\nhyperfine 'command-a' 'command-b'\nhyperfine --warmup 3 --runs 20 'my-command'\nhyperfine --export-json results.json 'old-version' 'new-version'\n```\n\n### Inline timing (any 
language)\n\n```javascript\n// Node.js\nconsole.time('operation');\nawait doExpensiveThing();\nconsole.timeEnd('operation'); // \"operation: 142.3ms\"\n\n// High-resolution\nconst start = performance.now();\nawait doExpensiveThing();\nconst elapsed = performance.now() - start;\nconsole.log(`Elapsed: ${elapsed.toFixed(2)}ms`);\n```\n\n```python\n# Python\nimport time\n\nstart = time.perf_counter()\ndo_expensive_thing()\nelapsed = time.perf_counter() - start\nprint(f\"Elapsed: {elapsed:.4f}s\")\n\n# Context manager\nfrom contextlib import contextmanager\n\n@contextmanager\ndef timer(label=\"\"):\n    start = time.perf_counter()\n    try:\n        yield\n    finally:\n        # Report the elapsed time even if the timed block raises\n        elapsed = time.perf_counter() - start\n        print(f\"{label}: {elapsed:.4f}s\")\n\nwith timer(\"data processing\"):\n    process_data()\n```\n\n```go\n// Go\nstart := time.Now()\ndoExpensiveThing()\nfmt.Printf(\"Elapsed: %v\\n\", time.Since(start))\n```\n\n## Node.js Profiling\n\n### CPU profiling with V8 inspector\n\n```bash\n# Generate CPU profile (writes .cpuprofile file)\nnode --cpu-prof app.js\n# Open the .cpuprofile in Chrome DevTools > Performance tab\n\n# Set the sampling interval in microseconds (default: 1000)\n# Smaller values give finer detail at the cost of more overhead\nnode --cpu-prof --cpu-prof-interval=100 app.js\n\n# Start the app with the inspector enabled\nnode --inspect app.js\n# Open chrome://inspect in Chrome, click \"inspect\"\n# Go to Performance tab, click Record\n```\n\n### Heap snapshots (memory)\n\n```bash\n# Generate a sampling heap profile (writes a .heapprofile file on exit)\n# Note: this is an allocation profile, not a full heap snapshot\nnode --heap-prof app.js\n\n# Take snapshots programmatically\nnode -e \"\nconst v8 = require('v8');\n\n// writeHeapSnapshot() blocks while writing and returns the generated filename\nconst snapshotFile = v8.writeHeapSnapshot();\nconsole.log('Heap snapshot written to:', snapshotFile);\n\"\n\n# Compare heap snapshots to find leaks:\n# 1. Take snapshot A (baseline)\n# 2. Run operations that might leak\n# 3. Take snapshot B\n# 4. 
In Chrome DevTools > Memory, load both and use \"Comparison\" view\n```\n\n### Memory usage monitoring\n\n```javascript\n// Print memory usage periodically\nsetInterval(() => {\n  const usage = process.memoryUsage();\n  console.log({\n    rss: `${(usage.rss / 1024 / 1024).toFixed(1)}MB`,\n    heapUsed: `${(usage.heapUsed / 1024 / 1024).toFixed(1)}MB`,\n    heapTotal: `${(usage.heapTotal / 1024 / 1024).toFixed(1)}MB`,\n    external: `${(usage.external / 1024 / 1024).toFixed(1)}MB`,\n  });\n}, 5000);\n\n// Detect memory growth between samples\n// Start from the current heap size so the first sample is not reported as growth\nlet lastHeap = process.memoryUsage().heapUsed;\nsetInterval(() => {\n  const heap = process.memoryUsage().heapUsed;\n  const delta = heap - lastHeap;\n  if (delta > 1024 * 1024) { // > 1MB growth\n    console.warn(`Heap grew by ${(delta / 1024 / 1024).toFixed(1)}MB`);\n  }\n  lastHeap = heap;\n}, 10000);\n```\n\n### Node.js benchmarking\n\n```javascript\n// Simple benchmark function\nfunction benchmark(name, fn, iterations = 10000) {\n  // Warmup\n  for (let i = 0; i < 100; i++) fn();\n\n  const start = performance.now();\n  for (let i = 0; i < iterations; i++) fn();\n  const elapsed = performance.now() - start;\n\n  console.log(`${name}: ${(elapsed / iterations).toFixed(4)}ms/op (${iterations} iterations in ${elapsed.toFixed(1)}ms)`);\n}\n\nbenchmark('JSON.parse', () => JSON.parse('{\"key\":\"value\",\"num\":42}'));\nbenchmark('regex match', () => /^\\d{4}-\\d{2}-\\d{2}$/.test('2026-02-03'));\n```\n\n## Python Profiling\n\n### cProfile (built-in CPU profiler)\n\n```bash\n# Profile a script\npython3 -m cProfile -s cumulative my_script.py\n\n# Save to file for analysis\npython3 -m cProfile -o profile.prof my_script.py\n\n# Analyze saved profile\npython3 -c \"\nimport pstats\nstats = pstats.Stats('profile.prof')\nstats.sort_stats('cumulative')\nstats.print_stats(20)\n\"\n\n# Profile a specific function\npython3 -c \"\nimport cProfile\nfrom my_module import expensive_function\n\ncProfile.run('expensive_function()', sort='cumulative')\n\"\n```\n\n### line_profiler 
(line-by-line)\n\n```bash\n# Install\npip install line_profiler\n\n# Add @profile decorator to functions of interest, then:\nkernprof -l -v my_script.py\n```\n\n```python\n# Programmatic usage\nfrom line_profiler import LineProfiler\n\ndef process_data(data):\n    result = []\n    for item in data:           # Is this loop the bottleneck?\n        transformed = transform(item)\n        if validate(transformed):\n            result.append(transformed)\n    return result\n\nprofiler = LineProfiler()\nprofiler.add_function(process_data)\nprofiler.enable()\nprocess_data(large_dataset)\nprofiler.disable()\nprofiler.print_stats()\n```\n\n### Memory profiling (Python)\n\n```bash\n# memory_profiler\npip install memory_profiler\n\n# Profile memory line-by-line\npython3 -m memory_profiler my_script.py\n```\n\n```python\nfrom memory_profiler import profile\n\n@profile\ndef load_data():\n    data = []\n    for i in range(1000000):\n        data.append({'id': i, 'value': f'item_{i}'})\n    return data\n\n# Track memory over time\nimport tracemalloc\n\ntracemalloc.start()\n\n# ... 
run code ...\n\nsnapshot = tracemalloc.take_snapshot()\ntop_stats = snapshot.statistics('lineno')\nfor stat in top_stats[:10]:\n    print(stat)\n```\n\n### Python benchmarking\n\n```python\nimport timeit\n\n# Time a statement\nresult = timeit.timeit('sorted(range(1000))', number=10000)\nprint(f\"sorted: {result:.4f}s for 10000 iterations\")\n\n# Compare two approaches\nsetup = \"data = list(range(10000))\"\nt1 = timeit.timeit('list(filter(lambda x: x % 2 == 0, data))', setup=setup, number=1000)\nt2 = timeit.timeit('[x for x in data if x % 2 == 0]', setup=setup, number=1000)\nprint(f\"filter: {t1:.4f}s  |  listcomp: {t2:.4f}s  |  speedup: {t1/t2:.2f}x\")\n\n# pytest-benchmark\n# pip install pytest-benchmark\n# def test_sort(benchmark):\n#     benchmark(sorted, list(range(1000)))\n```\n\n## Go Profiling\n\n### Built-in pprof\n\n```go\n// Add to main.go for HTTP-accessible profiling\nimport (\n    \"net/http\"\n    _ \"net/http/pprof\"\n)\n\nfunc main() {\n    go func() {\n        http.ListenAndServe(\"localhost:6060\", nil)\n    }()\n    // ... 
rest of app\n}\n```\n\n```bash\n# CPU profile (30 seconds)\n# Quote the URL so the shell does not treat \"?\" as a glob character\ngo tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'\n\n# Memory profile\ngo tool pprof http://localhost:6060/debug/pprof/heap\n\n# Goroutine profile\ngo tool pprof http://localhost:6060/debug/pprof/goroutine\n\n# Inside pprof interactive mode:\n# top 20          - top functions by CPU/memory\n# list funcName   - source code with annotations\n# web             - open call graph (SVG) in browser; requires Graphviz\n# png > out.png   - save call graph as image\n```\n\n### Go benchmarks\n\n```go\n// math_test.go (package clause omitted)\nimport (\n    \"math/rand\"\n    \"sort\"\n    \"testing\"\n)\n\nfunc BenchmarkAdd(b *testing.B) {\n    for i := 0; i < b.N; i++ {\n        Add(42, 58)\n    }\n}\n\nfunc BenchmarkSort1000(b *testing.B) {\n    data := make([]int, 1000)\n    for i := range data {\n        data[i] = rand.Intn(1000)\n    }\n    b.ResetTimer() // exclude setup from the measurement\n    for i := 0; i < b.N; i++ {\n        // Sort a fresh copy each iteration so the input is never pre-sorted\n        sort.Ints(append([]int{}, data...))\n    }\n}\n```\n\n```bash\n# Run benchmarks\ngo test -bench=. -benchmem ./...\n\n# Compare before/after\ngo test -bench=. -count=5 ./... > old.txt\n# ... make changes ...\ngo test -bench=. -count=5 ./... 
> new.txt\ngo install golang.org/x/perf/cmd/benchstat@latest\nbenchstat old.txt new.txt\n```\n\n## Flame Graphs\n\n### Generate flame graphs\n\n```bash\n# Node.js: 0x (easiest)\nnpx 0x app.js\n# Opens interactive flame graph in browser\n\n# Node.js: clinic.js (comprehensive)\nnpx clinic flame -- node app.js\nnpx clinic doctor -- node app.js\nnpx clinic bubbleprof -- node app.js\n\n# Python: py-spy (sampling profiler, no code changes needed)\npip install py-spy\npy-spy record -o flame.svg -- python3 my_script.py\n\n# Profile running Python process\npy-spy record -o flame.svg --pid 12345\n\n# Go: built-in (quote the URL so the shell does not treat \"?\" as a glob character)\ngo tool pprof -http=:8080 'http://localhost:6060/debug/pprof/profile?seconds=30'\n# Navigate to \"Flame Graph\" view\n\n# Linux (any process): perf + flamegraph\n# Requires the FlameGraph scripts on PATH: https://github.com/brendangregg/FlameGraph\nperf record -g -p PID -- sleep 30\nperf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg\n```\n\n### Reading flame graphs\n\n```\nKey concepts:\n- X-axis: NOT time. Stack frames are sorted alphabetically. Width = % of samples.\n- Y-axis: Stack depth. Top = leaf function (where CPU time is spent).\n- Wide bars at the top = hot functions (optimize these first).\n- Narrow tall stacks = deep call chains (may indicate excessive abstraction).\n\nWhat to look for:\n1. Wide plateaus at the top → function that dominates CPU time\n2. Multiple paths converging to one function → shared bottleneck\n3. GC/runtime frames taking significant width → memory pressure\n4. 
Unexpected functions appearing wide → performance bug\n```\n\n## Load Testing\n\n### curl-based quick test\n\n```bash\n# Single request timing\ncurl -o /dev/null -s -w \"HTTP %{http_code} | Total: %{time_total}s | TTFB: %{time_starttransfer}s | Connect: %{time_connect}s\\n\" https://api.example.com/endpoint\n\n# Multiple requests in sequence\nfor i in $(seq 1 20); do\n  curl -o /dev/null -s -w \"%{time_total}\\n\" https://api.example.com/endpoint\ndone | awk '{sum+=$1; count++; if($1>max)max=$1} END {printf \"avg=%.3fs max=%.3fs n=%d\\n\", sum/count, max, count}'\n```\n\n### Apache Bench (ab)\n\n```bash\n# 100 requests, 10 concurrent\nab -n 100 -c 10 http://localhost:3000/api/endpoint\n\n# With POST data\nab -n 100 -c 10 -p data.json -T application/json http://localhost:3000/api/endpoint\n\n# Key metrics to watch:\n# - Requests per second (throughput)\n# - Time per request (latency)\n# - Percentage of requests served within a certain time (p50, p90, p99)\n```\n\n### wrk (modern load testing)\n\n```bash\n# Install: https://github.com/wg/wrk\n# 10 seconds, 4 threads, 100 connections\nwrk -t4 -c100 -d10s http://localhost:3000/api/endpoint\n\n# With Lua script for custom requests\nwrk -t4 -c100 -d10s -s post.lua http://localhost:3000/api/endpoint\n```\n\n```lua\n-- post.lua\nwrk.method = \"POST\"\nwrk.body   = '{\"key\": \"value\"}'\nwrk.headers[\"Content-Type\"] = \"application/json\"\n\n-- Custom request generation\nrequest = function()\n  local id = math.random(1, 10000)\n  local path = \"/api/users/\" .. 
id\n  return wrk.format(\"GET\", path)\nend\n```\n\n### Autocannon (Node.js load testing)\n\n```bash\nnpx autocannon -c 100 -d 10 http://localhost:3000/api/endpoint\nnpx autocannon -c 100 -d 10 -m POST -b '{\"key\":\"value\"}' -H 'Content-Type=application/json' http://localhost:3000/api/endpoint\n```\n\n## Database Query Performance\n\n### EXPLAIN analysis\n\n```bash\n# PostgreSQL\npsql -c \"EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT * FROM orders WHERE user_id = 123;\"\n\n# MySQL\nmysql -e \"EXPLAIN SELECT * FROM orders WHERE user_id = 123;\" mydb\n\n# SQLite\nsqlite3 mydb.sqlite \"EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 123;\"\n```\n\n### Slow query detection\n\n```bash\n# PostgreSQL: enable slow query logging\n# In postgresql.conf:\n# log_min_duration_statement = 100  (ms)\n\n# MySQL: slow query log\n# In my.cnf:\n# slow_query_log = 1\n# long_query_time = 0.1\n\n# Find queries missing indexes (PostgreSQL)\npsql -c \"\nSELECT schemaname, relname, seq_scan, seq_tup_read,\n       idx_scan, idx_tup_fetch,\n       seq_tup_read / GREATEST(seq_scan, 1) AS avg_rows_per_scan\nFROM pg_stat_user_tables\nWHERE seq_scan > 100 AND seq_tup_read / GREATEST(seq_scan, 1) > 1000\nORDER BY seq_tup_read DESC\nLIMIT 10;\n\"\n```\n\n## Memory Leak Detection Patterns\n\n### Node.js\n\n```javascript\n// Track object counts over time\nconst v8 = require('v8');\n\nfunction checkMemory() {\n  const heap = v8.getHeapStatistics();\n  const usage = process.memoryUsage();\n  return {\n    heapUsedMB: (usage.heapUsed / 1024 / 1024).toFixed(1),\n    heapTotalMB: (usage.heapTotal / 1024 / 1024).toFixed(1),\n    rssMB: (usage.rss / 1024 / 1024).toFixed(1),\n    externalMB: (usage.external / 1024 / 1024).toFixed(1),\n    arrayBuffersMB: (usage.arrayBuffers / 1024 / 1024).toFixed(1),\n  };\n}\n\n// Sample every 10s, alert on growth\nlet baseline = process.memoryUsage().heapUsed;\nsetInterval(() => {\n  const current = process.memoryUsage().heapUsed;\n  const growthMB = (current - 
baseline) / 1024 / 1024;\n  if (growthMB > 50) {\n    console.warn(`Memory grew ${growthMB.toFixed(1)}MB since start`);\n    console.warn(checkMemory());\n  }\n}, 10000);\n```\n\n### Common leak patterns\n\n```\nNode.js:\n- Event listeners not removed (emitter.on without emitter.off)\n- Closures capturing large objects in long-lived scopes\n- Global caches without eviction (Map/Set that only grows)\n- Unresolved promises accumulating\n\nPython:\n- Circular references (use weakref for caches)\n- Global lists/dicts that grow unbounded\n- File handles not closed (use context managers)\n- C extension objects not properly freed\n\nGo:\n- Goroutine leaks (goroutine started, never returns)\n- Forgotten channel listeners\n- Unclosed HTTP response bodies\n- Global maps that grow forever\n```\n\n## Performance Comparison Script\n\n```bash\n#!/bin/bash\n# perf-compare.sh - Compare performance before/after a change\n# Usage: perf-compare.sh <command> [runs]\nCMD=\"${1:?Usage: perf-compare.sh <command> [runs]}\"\nRUNS=\"${2:-10}\"\n\necho \"Benchmarking: $CMD\"\necho \"Runs: $RUNS\"\necho \"\"\n\ntimes=()\nfor i in $(seq 1 \"$RUNS\"); do\n  start=$(date +%s%N)\n  eval \"$CMD\" > /dev/null 2>&1\n  end=$(date +%s%N)\n  elapsed=$(echo \"scale=3; ($end - $start) / 1000000\" | bc)\n  times+=(\"$elapsed\")\n  printf \"  Run %2d: %sms\\n\" \"$i\" \"$elapsed\"\ndone\n\necho \"\"\nprintf '%s\\n' \"${times[@]}\" | awk '{\n  sum += $1\n  sumsq += $1 * $1\n  if (NR == 1 || $1 < min) min = $1\n  if (NR == 1 || $1 > max) max = $1\n  count++\n} END {\n  avg = sum / count\n  stddev = sqrt(sumsq/count - avg*avg)\n  printf \"Results: avg=%.1fms min=%.1fms max=%.1fms stddev=%.1fms (n=%d)\\n\", avg, min, max, stddev, count\n}'\n```\n\n## Tips\n\n- **Profile before optimizing.** Guessing where bottlenecks are is wrong more often than right. Measure first.\n- **Optimize the hot path.** Flame graphs show you exactly which functions consume the most time. 
A 10% improvement in a function that takes 80% of CPU time is worth more than a 50% improvement in one that takes 2%.\n- **Memory and CPU are different problems.** A memory leak can exist in fast code. A CPU bottleneck can exist in code with stable memory. Profile both independently.\n- **Benchmark under realistic conditions.** Microbenchmarks (empty loops, single-function timing) can be misleading due to JIT optimization, caching, and branch prediction. Use realistic data and workloads.\n- **p99 matters more than average.** An API with 50ms average but 2s p99 has a tail latency problem. Always look at percentiles, not just averages.\n- **Load test before shipping.** `ab`, `wrk`, or `autocannon` for 60 seconds at expected peak traffic reveals problems that unit tests never will.\n- **GC pauses are real.** In Node.js, Python, Go, and Java, garbage collection can cause latency spikes. If flame graphs show significant GC time, reduce allocation pressure (reuse objects, use object pools, avoid unnecessary copies).\n- **Database queries are usually the bottleneck.** Before optimizing application code, run `EXPLAIN` on your slowest queries. An index can turn a 2-second query into 2ms.\n","tags":{"latest":"1.0.0"},"stats":{"comments":0,"downloads":3609,"installsAllTime":23,"installsCurrent":23,"stars":2,"versions":1},"createdAt":1770155115487,"updatedAt":1778988120541},"latestVersion":{"version":"1.0.0","createdAt":1770155115487,"changelog":"Initial release: CPU/memory profiling for Node.js/Python/Go, flame graphs, benchmarking, load testing, memory leak detection, database query optimization","license":null},"metadata":{"setup":[],"os":["linux","darwin","win32"],"systems":null},"owner":{"handle":"gitgoodordietrying","userId":"s17bsk9s8a501ckx95hd6m2b75885xxv","displayName":"gitgoodordietrying","image":"https://avatars.githubusercontent.com/u/116975874?v=4"},"moderation":null}