QA & Testing Engine

Provides a comprehensive testing methodology for AI software, covering strategy design, unit, integration, and end-to-end tests with coverage and reporting g...

MIT-0 · Free to use, modify, and redistribute. No attribution required.
1 · 958 · 3 current installs · 3 all-time installs
Security Scan

VirusTotal: Benign
OpenClaw: Benign (medium confidence)
Purpose & Capability
Name and description match the provided SKILL.md and README: the content is a comprehensive testing methodology (strategy, unit/integration/E2E, performance, security, accessibility, CI). There are no unexpected environment variables, binaries, or install steps requested that would be unrelated to testing.
Instruction Scope
Most of the SKILL.md content is focused on test design and test cases and is within scope. It explicitly includes security testing (OWASP, 'injection payloads', etc.) which is appropriate for a QA skill but means the agent could generate attack payloads or recommend running intrusive tests — ensure you have authorization and explicit test targets before executing such guidance. Some sections were truncated in the provided excerpt; if the omitted parts include commands to read local files, read env vars, or call external endpoints, that would warrant re-evaluation.
Install Mechanism
Instruction-only skill with no install spec and no code files. Nothing will be written to disk or installed by the platform when added, minimizing install-time risk.
Credentials
The skill declares no required environment variables, credentials, or config paths. That is proportionate for a documentation/instruction skill. Note: to run actual tests the agent guides you to create, you will likely need to supply your own credentials/endpoints — the skill itself does not request them.
Persistence & Privilege
Defaults are used (not always:true). The skill is user-invocable and can be invoked autonomously per platform defaults; that is normal. The skill does not request persistent system privileges or modify other skills' configurations.
Assessment
This skill appears to be a coherent, instruction-only QA/testing guide. Before installing or acting on its recommendations:

- Do not run security/penetration tests or generated 'injection payloads' against systems you do not own or have explicit authorization to test.
- Be aware the skill may instruct you to provide secrets (API keys, DB creds) when you actually execute tests; only supply those to trusted environments and never paste them into public chat.
- The README links to external paid 'Context Packs' — those are outside the platform and may require payment or external accounts; verify the vendor independently.
- Because portions of the SKILL.md were truncated in the provided excerpt, if you plan to let the agent run tests autonomously, review the entire SKILL.md for any instructions that perform network calls, read local files, or prompt for credentials. If such instructions exist, reconsider enabling autonomous execution and prefer manual, reviewed runs.

If you want higher assurance, provide the full SKILL.md (no truncation) for a line-by-line check; that could raise confidence to high if no out-of-scope commands or data-exfiltration guidance are found.


Current version: v1.0.0
Tags: automation, latest, performance, qa, quality, security, testing

License

MIT-0
Free to use, modify, and redistribute. No attribution required.

SKILL.md

QA & Testing Engine — Complete Software Quality System

The definitive testing methodology for AI agents. From test strategy to execution, coverage to reporting — everything you need to ship quality software.

Phase 1: Test Strategy Design

Before writing a single test, design the strategy.

Strategy Brief Template

project:
  name: ""
  type: web-app | api | mobile | library | cli | data-pipeline
  languages: [typescript, python, go, java]
  frameworks: [react, express, django, spring]
  
risk_profile:
  data_sensitivity: low | medium | high | critical  # PII, financial, health
  user_impact: internal | b2b | b2c | life-safety
  deployment_frequency: daily | weekly | monthly
  regulatory: [none, SOC2, HIPAA, PCI-DSS, GDPR]

test_scope:
  in_scope: []    # Features, services, components
  out_of_scope: [] # Explicitly excluded (with reason)
  
environments:
  dev: { url: "", db: "local" }
  staging: { url: "", db: "seeded" }
  prod: { url: "", smoke_only: true }

Test Type Decision Matrix

| Risk Profile | Unit | Integration | E2E | Performance | Security | Accessibility |
|---|---|---|---|---|---|---|
| Internal tool | ✅ Core | ✅ API | ⚠️ Happy path | ⚠️ Basic | | |
| B2B SaaS | ✅ Full | ✅ Full | ✅ Critical flows | ✅ Load | ✅ OWASP Top 10 | ✅ WCAG AA |
| B2C high-traffic | ✅ Full | ✅ Full | ✅ Full | ✅ Stress + soak | ✅ Full | ✅ WCAG AA |
| Financial/Health | ✅ Full + mutation | ✅ Full + contract | ✅ Full + chaos | ✅ Full suite | ✅ Pen test | ✅ WCAG AAA |

Test Pyramid Architecture

         /  E2E  \          5-10% — Critical user journeys only
        / Integration \     20-30% — API contracts, service boundaries
       /    Unit Tests   \  60-70% — Business logic, pure functions

Anti-pattern: Ice cream cone — More E2E than unit tests. Slow, flaky, expensive. Fix by pushing test coverage DOWN the pyramid.

Anti-pattern: Hourglass — Lots of unit + E2E, no integration. Misses contract bugs between services.


Phase 2: Unit Testing Mastery

The AAA Pattern (Arrange-Act-Assert)

Every unit test follows this structure:

describe('PricingCalculator', () => {
  // Group by behavior, not by method
  describe('when customer has volume discount', () => {
    it('applies tiered pricing above threshold', () => {
      // ARRANGE — Set up the scenario
      const calculator = new PricingCalculator();
      const customer = createCustomer({ tier: 'enterprise', units: 150 });
      
      // ACT — Execute the behavior under test
      const price = calculator.calculate(customer);
      
      // ASSERT — Verify the outcome (ONE logical assertion)
      expect(price).toEqual({
        subtotal: 12000,
        discount: 1800,  // 15% volume discount
        total: 10200,
      });
    });
  });
});

Test Naming Convention

Format: [unit] [scenario] [expected behavior]

✅ Good:

  • PricingCalculator applies 15% discount when units exceed 100
  • UserService throws NotFoundError when user ID is invalid
  • parseDate returns null for malformed ISO strings

❌ Bad:

  • test1, should work, calculates price

What to Unit Test (Priority Order)

  1. Business logic — Pricing, rules, calculations, state machines
  2. Data transformations — Parsers, formatters, serializers, mappers
  3. Edge cases — Boundaries, null/undefined, empty collections, overflow
  4. Error handling — Every catch block, every validation path
  5. Pure functions — Easiest to test, highest ROI
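
A pure-function edge-case test needs no framework at all. A minimal sketch, where `parseDate` (named after the example above) and `check` are illustrative stand-ins rather than library code:

```typescript
// Illustrative pure function: returns null for malformed ISO strings.
function parseDate(input: string): Date | null {
  const d = new Date(input);
  return Number.isNaN(d.getTime()) ? null : d;
}

// Tiny assertion helper standing in for a test framework.
function check(name: string, cond: boolean): void {
  if (!cond) throw new Error(`FAIL: ${name}`);
  console.log(`PASS: ${name}`);
}

check('returns null for malformed ISO strings', parseDate('not-a-date') === null);
check('returns null for empty string', parseDate('') === null);
check('parses a valid ISO string', parseDate('2024-01-15')?.getUTCFullYear() === 2024);
```

Note how every edge case (malformed, empty, valid) gets its own named assertion.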

What NOT to Unit Test

  • Framework internals (React rendering, Express routing)
  • Simple getters/setters with no logic
  • Third-party library behavior
  • Implementation details (private methods, internal state)

Mocking Rules

| Dependency Type | Strategy | Example |
|---|---|---|
| Database | Mock the repository/DAO | jest.mock('./userRepo') |
| HTTP API | Mock the client or use MSW | msw.http.get('/api/users', ...) |
| File system | Mock fs or use temp dirs | jest.mock('fs/promises') |
| Time/Date | Fake timers | jest.useFakeTimers() |
| Randomness | Seed or mock | jest.spyOn(Math, 'random') |
| Environment | Override env vars | process.env.NODE_ENV = 'test' |

Rule: Mock at boundaries, not internals. If you're mocking a class you own, your design might need refactoring.
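
That boundary rule in sketch form: inject a dependency through an interface and swap in a hand-rolled in-memory fake, instead of patching internals. `UserRepo` and `UserService` are hypothetical names for illustration:

```typescript
interface User { id: string; name: string; }

// The boundary: the service depends on an interface, not a concrete DB class.
interface UserRepo { findById(id: string): User | undefined; }

class UserService {
  constructor(private repo: UserRepo) {}
  displayName(id: string): string {
    const user = this.repo.findById(id);
    return user ? user.name : 'Unknown user';
  }
}

// Test double: implements the same boundary interface as the real repository.
const fakeRepo: UserRepo = {
  findById: (id) => (id === 'u1' ? { id: 'u1', name: 'Ada' } : undefined),
};

const service = new UserService(fakeRepo);
console.log(service.displayName('u1'));   // Ada
console.log(service.displayName('none')); // Unknown user
```

No mocking library is needed when the design exposes a seam at the boundary.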

Coverage Targets

| Metric | Minimum | Good | Excellent |
|---|---|---|---|
| Line coverage | 70% | 85% | 95%+ |
| Branch coverage | 60% | 80% | 90%+ |
| Function coverage | 75% | 90% | 95%+ |
| Critical path coverage | 100% | 100% | 100% |

Warning: 100% coverage ≠ quality. Coverage measures what code ran, not what was verified. A test with no assertions has coverage but no value.


Phase 3: Integration Testing

API Testing Checklist

For every API endpoint, test:

endpoint: POST /api/orders
tests:
  happy_path:
    - Valid request returns 201 with order ID
    - Response matches schema
    - Database record created correctly
    - Events/webhooks fired
    
  validation:
    - Missing required fields → 400 with field errors
    - Invalid data types → 400 with type errors
    - Business rule violations → 422 with explanation
    
  authentication:
    - No token → 401
    - Expired token → 401
    - Wrong role → 403
    - Valid token → proceeds
    
  edge_cases:
    - Duplicate request (idempotency) → same response
    - Concurrent requests → no race condition
    - Maximum payload size → 413 or graceful handling
    - Special characters in input → no injection
    
  error_handling:
    - Database down → 503 with retry hint
    - External service timeout → 504 or fallback
    - Rate limit exceeded → 429 with retry-after

Contract Testing

When services communicate, test the contract:

contract:
  consumer: order-service
  provider: payment-service
  
  interactions:
    - description: "Process payment"
      request:
        method: POST
        path: /payments
        body:
          amount: 99.99
          currency: USD
          order_id: "ord_123"
      response:
        status: 200
        body:
          payment_id: "pay_xxx"  # string, not null
          status: "completed"    # enum: completed|pending|failed
          
  breaking_changes:  # NEVER do these without versioning
    - Remove a field from response
    - Change a field's type
    - Add a required field to request
    - Change the URL path
    - Change error response format

Database Testing Rules

  1. Each test gets a clean state — Use transactions that rollback, or truncate between tests
  2. Use factories, not fixtures — createUser({ role: 'admin' }) > hardcoded SQL dumps
  3. Test migrations — Run migrate-up, migrate-down, migrate-up (roundtrip)
  4. Test constraints — Unique violations, FK cascades, NOT NULL
  5. Test queries — Especially complex JOINs, aggregations, window functions
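
Rule 2 in sketch form: a factory with sensible defaults and per-test overrides (field names are illustrative):

```typescript
// Factory sketch: each call yields a unique, valid user; tests override
// only the fields they care about.
interface User { id: string; role: 'user' | 'admin'; plan: string; }
let seq = 0;

function createUser(overrides: Partial<User> = {}): User {
  seq += 1;
  return { id: `user_${seq}`, role: 'user', plan: 'free', ...overrides };
}

const admin = createUser({ role: 'admin' });
console.log(admin.role);                   // admin
console.log(createUser().id !== admin.id); // true (unique per call)
```

Unlike a fixture dump, the factory keeps test intent visible: the only fields named in the test are the ones the behavior depends on.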

Phase 4: End-to-End Testing

Critical User Journey Mapping

Identify and test the flows that generate revenue or block users:

critical_journeys:
  - name: "Sign up → First value"
    steps:
      - Visit landing page
      - Click sign up
      - Fill registration form
      - Verify email
      - Complete onboarding
      - Perform first key action
    max_duration: 3 minutes
    
  - name: "Purchase flow"
    steps:
      - Browse products
      - Add to cart
      - Enter shipping
      - Enter payment
      - Confirm order
      - Receive confirmation email
    max_duration: 2 minutes
    
  - name: "Login → Core task → Logout"
    steps:
      - Login (password + SSO + MFA variants)
      - Navigate to core feature
      - Complete primary workflow
      - Verify result
      - Logout
    max_duration: 1 minute

E2E Best Practices

  1. Test user behavior, not implementation — Click buttons by text/role, not by CSS class
  2. Use data-testid sparingly — Only when no accessible selector exists
  3. Wait for state, not time — waitFor(element), not sleep(3000)
  4. Isolate test data — Each test creates its own users/data
  5. Run in CI with retries — 1 retry for flaky network, investigate if >5% flake rate

Selector Priority (Best → Worst)

  1. getByRole('button', { name: 'Submit' }) — Accessible, resilient
  2. getByLabelText('Email') — Form-specific, accessible
  3. getByText('Welcome back') — Content-based
  4. getByTestId('submit-btn') — Explicit test hook
  5. querySelector('.btn-primary') — ❌ Fragile, breaks on CSS changes

Flaky Test Triage

| Symptom | Likely Cause | Fix |
|---|---|---|
| Passes locally, fails in CI | Timing/race condition | Add explicit waits, check CI resource limits |
| Fails intermittently | Shared state between tests | Isolate test data, reset state |
| Fails after deploy | Environment difference | Check env vars, API versions, feature flags |
| Fails at specific time | Time-dependent logic | Mock dates/times, avoid time-sensitive assertions |
| Fails in parallel | Resource contention | Use unique ports/DBs per worker |

Rule: Quarantine flaky tests within 24 hours. A flaky test suite that everyone ignores is worse than no tests.


Phase 5: Performance Testing

Load Test Design

performance_tests:
  smoke:
    vus: 5
    duration: 1m
    purpose: "Verify test works"
    
  load:
    vus: 100  # Expected concurrent users
    duration: 10m
    ramp_up: 2m
    purpose: "Normal traffic behavior"
    thresholds:
      p95_response: <500ms
      error_rate: <1%
      
  stress:
    vus: 300  # 3x expected load
    duration: 15m
    ramp_up: 5m
    purpose: "Find breaking point"
    
  soak:
    vus: 80
    duration: 2h
    purpose: "Memory leaks, connection exhaustion"
    
  spike:
    stages:
      - { vus: 50, duration: 2m }
      - { vus: 500, duration: 30s }  # Sudden spike
      - { vus: 50, duration: 2m }
    purpose: "Recovery behavior"

Performance Budgets

| Metric | Web App | API | Background Job |
|---|---|---|---|
| Response time (p50) | <200ms | <100ms | N/A |
| Response time (p95) | <1s | <500ms | N/A |
| Response time (p99) | <3s | <1s | N/A |
| Throughput | >100 rps | >500 rps | >1000/min |
| Error rate | <0.1% | <0.1% | <0.5% |
| CPU usage | <70% | <70% | <90% |
| Memory growth | <5%/hr | <2%/hr | <10%/hr |
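
Checking a run against these budgets reduces to a percentile computation. A minimal sketch using the nearest-rank method, with invented sample latencies:

```typescript
// Sketch: nearest-rank percentile over collected latency samples.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latenciesMs = [120, 180, 90, 300, 220, 150, 480, 110, 200, 170];
const p95 = percentile(latenciesMs, 95); // 480
console.log(`p95=${p95}ms -> ${p95 < 500 ? 'within' : 'over'} the <500ms API budget`);
```

Load tools like k6 report these percentiles directly; the point of the sketch is that a threshold check is cheap to automate in CI.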

Database Performance Testing

db_performance:
  query_tests:
    - name: "Dashboard aggregate query"
      baseline: 50ms
      max_acceptable: 200ms
      with_1M_rows: measure
      with_10M_rows: measure
      
  index_verification:
    - Run EXPLAIN ANALYZE on all critical queries
    - Verify no sequential scans on tables >10K rows
    - Check index usage statistics weekly
    
  connection_pool:
    - Test at max connections
    - Verify graceful handling when pool exhausted
    - Monitor connection wait time

Phase 6: Security Testing

OWASP Top 10 Test Checklist

security_tests:
  A01_broken_access_control:
    - [ ] Horizontal privilege escalation (access other user's data)
    - [ ] Vertical privilege escalation (access admin functions)
    - [ ] IDOR (Insecure Direct Object References)
    - [ ] Missing function-level access control
    - [ ] CORS misconfiguration
    
  A02_cryptographic_failures:
    - [ ] Sensitive data in transit (TLS 1.2+)
    - [ ] Sensitive data at rest (encryption)
    - [ ] Password hashing (bcrypt/argon2, not MD5/SHA)
    - [ ] No secrets in code/logs/URLs
    
  A03_injection:
    - [ ] SQL injection (parameterized queries)
    - [ ] NoSQL injection
    - [ ] Command injection (OS commands)
    - [ ] XSS (stored, reflected, DOM-based)
    - [ ] Template injection (SSTI)
    
  A04_insecure_design:
    - [ ] Rate limiting on auth endpoints
    - [ ] Account lockout after N failures
    - [ ] CAPTCHA on public forms
    - [ ] Business logic abuse scenarios
    
  A05_security_misconfiguration:
    - [ ] Default credentials removed
    - [ ] Error messages don't leak stack traces
    - [ ] Security headers set (CSP, HSTS, X-Frame-Options)
    - [ ] Directory listing disabled
    - [ ] Unnecessary HTTP methods disabled
    
  A07_auth_failures:
    - [ ] Brute force protection
    - [ ] Session fixation
    - [ ] Session timeout
    - [ ] JWT validation (signature, expiry, issuer)
    - [ ] MFA bypass attempts

Input Validation Test Payloads

Test every user input with:

injection_payloads:
  sql: ["' OR 1=1--", "'; DROP TABLE users;--", "1 UNION SELECT * FROM users"]
  xss: ["<script>alert(1)</script>", "<img onerror=alert(1) src=x>", "javascript:alert(1)"]
  path_traversal: ["../../etc/passwd", "..\\..\\windows\\system32", "%2e%2e%2f"]
  command: ["; ls -la", "| cat /etc/passwd", "$(whoami)", "`id`"]
  
boundary_values:
  strings: ["", " ", "a"*10000, null, undefined, "emoji: 🎯", "unicode: é à ü", "rtl: مرحبا"]
  numbers: [0, -1, 2147483647, -2147483648, NaN, Infinity, 0.1+0.2]
  arrays: [[], [null], Array(10000)]
  dates: ["1970-01-01", "2099-12-31", "invalid-date", "2024-02-29", "2023-02-29"]

Phase 7: Test Automation Architecture

Framework Selection Guide

| Need | JavaScript/TS | Python | Go | Java |
|---|---|---|---|---|
| Unit | Vitest / Jest | pytest | testing + testify | JUnit 5 |
| API | Supertest | httpx + pytest | net/http/httptest | RestAssured |
| E2E (browser) | Playwright | Playwright | chromedp | Selenium |
| Performance | k6 | Locust | vegeta | Gatling |
| Contract | Pact | Pact | Pact | Pact |
| Security | ZAP + custom | Bandit + custom | gosec | SpotBugs |

CI Pipeline Test Stages

pipeline:
  stage_1_fast:  # <2 min, blocks PR
    - Lint + type check
    - Unit tests
    - Security: dependency scan (npm audit / safety)
    
  stage_2_thorough:  # <10 min, blocks merge
    - Integration tests
    - Contract tests
    - Security: SAST scan
    - Coverage report + threshold check
    
  stage_3_confidence:  # <30 min, blocks deploy
    - E2E critical journeys
    - Visual regression (if applicable)
    - Security: container scan
    
  stage_4_post_deploy:  # After deploy to staging
    - Smoke tests against staging
    - Performance baseline check
    - Security: DAST scan (ZAP)
    
  stage_5_production:  # After prod deploy
    - Smoke tests (critical paths only)
    - Synthetic monitoring enabled
    - Canary metrics watching

Test Data Management

test_data_strategy:
  unit_tests:
    approach: factories  # Builder pattern, create exactly what you need
    example: "createUser({ role: 'admin', plan: 'enterprise' })"
    
  integration_tests:
    approach: seeded_database
    reset: per_test_suite  # Transaction rollback or truncate
    sensitive_data: anonymized  # Never use real PII
    
  e2e_tests:
    approach: api_setup  # Create data via API before test
    cleanup: after_each  # Delete created data
    isolation: unique_identifiers  # Timestamp or UUID in test data
    
  performance_tests:
    approach: representative_dataset
    volume: 10x_production  # Test with more data than prod
    generation: faker_libraries  # Realistic but synthetic

Phase 8: Quality Metrics & Reporting

Test Health Dashboard

metrics:
  test_suite_health:
    total_tests: 0
    passing: 0
    failing: 0
    skipped: 0  # >5% skipped = tech debt alarm
    flaky: 0    # >2% flaky = quarantine immediately
    
  coverage:
    line: "0%"
    branch: "0%"
    critical_paths: "0%"  # Must be 100%
    
  execution:
    unit_duration: "0s"    # Target: <30s
    integration_duration: "0s"  # Target: <5m
    e2e_duration: "0s"     # Target: <15m
    total_ci_time: "0s"    # Target: <20m
    
  defect_metrics:
    bugs_found_in_test: 0
    bugs_escaped_to_prod: 0
    escape_rate: "0%"      # Target: <5%
    mttr: "0h"             # Mean time to resolve
    
  trends:  # Track weekly
    new_tests_added: 0
    tests_deleted: 0  # Healthy deletion = removing redundant tests
    coverage_delta: "+0%"
    flake_rate_delta: "+0%"

Test Report Template

# Test Report — [Feature/Sprint/Release]

## Summary
- **Status:** ✅ PASS / ⚠️ PASS WITH RISKS / ❌ FAIL
- **Tests Run:** X | **Passed:** X | **Failed:** X | **Skipped:** X
- **Coverage:** Line X% | Branch X% | Critical 100%
- **Duration:** Xm Xs

## Key Findings

### 🔴 Critical (Block Release)
1. [Finding] — [Impact] — [Fix recommendation]

### 🟡 High (Fix Before Next Release)
1. [Finding] — [Impact] — [Fix recommendation]

### 🟢 Medium/Low (Backlog)
1. [Finding] — [Impact]

## Risk Assessment
- **Untested areas:** [list]
- **Known flaky tests:** [list with ticket IDs]
- **Performance concerns:** [if any]

## Recommendation
[Ship / Ship with monitoring / Hold for fixes]

Quality Score (0-100)

| Dimension | Weight | Scoring |
|---|---|---|
| Test coverage | 20% | <60%=0, 60-70%=5, 70-80%=10, 80-90%=15, 90%+=20 |
| Critical path coverage | 20% | <100%=0, 100%=20 |
| Defect escape rate | 15% | >10%=0, 5-10%=5, 2-5%=10, <2%=15 |
| Test suite speed | 10% | >30m=0, 20-30m=3, 10-20m=7, <10m=10 |
| Flake rate | 10% | >5%=0, 2-5%=3, 1-2%=7, <1%=10 |
| Security test coverage | 10% | None=0, Basic=3, OWASP Top 10=7, Full=10 |
| Documentation | 5% | None=0, Basic=2, Complete=5 |
| Automation ratio | 10% | <50%=0, 50-70%=3, 70-90%=7, 90%+=10 |

Scoring: 0-40 = 🔴 Critical | 41-60 = 🟡 Needs Work | 61-80 = 🟢 Good | 81-100 = 💎 Excellent


Phase 9: Specialized Testing

Accessibility Testing (WCAG 2.1)

accessibility_checklist:
  level_a:  # Minimum compliance
    - [ ] All images have alt text
    - [ ] All form inputs have labels
    - [ ] Color is not the only visual indicator
    - [ ] Page has proper heading hierarchy (h1→h2→h3)
    - [ ] All functionality available via keyboard
    - [ ] Focus is visible and logical
    - [ ] No content flashes >3 times/second
    
  level_aa:  # Standard compliance (recommended)
    - [ ] Color contrast ratio ≥4.5:1 (normal text)
    - [ ] Color contrast ratio ≥3:1 (large text)
    - [ ] Text resizable to 200% without loss
    - [ ] Skip navigation links
    - [ ] Consistent navigation across pages
    - [ ] Error suggestions provided
    - [ ] ARIA landmarks for page regions
    
  tools:
    - axe-core (automated, catches ~30% of issues)
    - Lighthouse accessibility audit
    - Manual keyboard navigation test
    - Screen reader testing (VoiceOver/NVDA)
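
The AA contrast thresholds above come from the WCAG 2.1 relative-luminance formula, which is short enough to automate yourself; a sketch:

```typescript
// Sketch: WCAG 2.1 contrast ratio between two sRGB colors.
function relativeLuminance([r, g, b]: number[]): number {
  const channel = (c: number) => {
    const s = c / 255;
    return s <= 0.03928 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
  };
  return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b);
}

function contrastRatio(a: number[], b: number[]): number {
  const [hi, lo] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Black on white: the maximum possible ratio.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(1)); // 21.0
```

In practice axe-core and Lighthouse run this check for you; the formula is useful when asserting contrast in custom visual-regression tests.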

API Backward Compatibility Testing

compatibility_tests:
  when_updating_api:
    - [ ] All existing fields still present in response
    - [ ] No field type changes (string→number)
    - [ ] New required request fields have defaults
    - [ ] Deprecated fields still work (with warning header)
    - [ ] Error format unchanged
    - [ ] Pagination behavior unchanged
    - [ ] Rate limits not reduced
    
  versioning_strategy:
    - URL versioning: /v1/users, /v2/users
    - Header versioning: Accept: application/vnd.api+json;version=2
    - Sunset header for deprecated versions
    - Minimum 6-month deprecation notice

Chaos Engineering Principles

chaos_tests:
  network:
    - Service dependency goes down → graceful degradation?
    - Network latency increases 10x → timeout handling?
    - DNS resolution fails → fallback behavior?
    
  infrastructure:
    - Database primary fails → replica promotion?
    - Cache (Redis) goes down → DB fallback works?
    - Disk fills up → alerting + graceful failure?
    
  application:
    - Memory pressure → OOM handling?
    - CPU saturation → request queuing?
    - Certificate expiry → monitoring alert?
    
  data:
    - Corrupt message in queue → dead letter + alert?
    - Schema migration fails mid-way → rollback works?
    - Clock skew between services → idempotency holds?

Phase 10: Daily QA Workflow

For New Features

  1. Review requirements — Identify test scenarios before code is written (shift-left)
  2. Write test cases — Cover happy path, edge cases, error cases, security
  3. Review PR tests — Are tests meaningful? Do they test behavior, not implementation?
  4. Run full suite — Unit + integration + E2E for affected areas
  5. Report findings — Use the test report template above

For Bug Fixes

  1. Write failing test first — Reproduce the bug as a test
  2. Verify fix makes test pass — The test IS the proof
  3. Check for regression — Run related test suites
  4. Add to regression suite — Bug tests prevent re-introduction

Weekly QA Review

weekly_review:
  monday:
    - Review flaky test quarantine — fix or delete
    - Check coverage trends — declining = tech debt
    - Review escaped defects — update test strategy
    
  friday:
    - Update test health dashboard
    - Clean up obsolete tests
    - Document new testing patterns discovered
    - Plan next week's testing focus

Natural Language Commands

  • "Create test strategy for [project/feature]" → Full strategy brief
  • "Write unit tests for [function/class]" → AAA pattern tests with edge cases
  • "Test this API endpoint: [method] [path]" → Full API test checklist
  • "Review these tests for quality" → Test code review with scoring
  • "Generate performance test plan" → k6/Locust test design
  • "Security test [feature/endpoint]" → OWASP-based test checklist
  • "Create test report for [release]" → Formatted test report
  • "What's our test health?" → Dashboard with metrics and recommendations
  • "Find gaps in our test coverage" → Analysis with prioritized recommendations
  • "Help debug this flaky test" → Root cause analysis with fix suggestions
  • "Set up CI test pipeline" → Stage-by-stage pipeline config
  • "Accessibility audit [page/component]" → WCAG checklist with findings

Files: 2 total
