Install
```
openclaw skills install feature-flag-cleanup
```

Audit feature flag debt across LaunchDarkly, Unleash, Flagsmith, GrowthBook, Split, and home-grown flag systems. Detects stale flags (older than 90 days, fully rolled out, no toggle activity), classifies them by risk (kill-switch vs experiment vs permission vs ops toggle), tags owners from git blame and CODEOWNERS, generates removal pull requests ordered by safety, and produces a four-week cleanup playbook with rollback plans. Use when asked to find dead flags, reduce flag debt, plan a flag cleanup sprint, write a flag-removal PR, decommission a vendor flag service, or audit flag usage in a monorepo. Triggers on "feature flag", "feature toggle", "launchdarkly", "unleash", "flagsmith", "growthbook", "split.io", "stale flag", "dead flag", "flag debt", "flag cleanup", "flag audit", "kill switch", "rollout", "flag retirement".

Audit and retire stale feature flags across a codebase and a flag service (LaunchDarkly, Unleash, Flagsmith, GrowthBook, Split, custom). Produces a ranked removal plan, owner-tagged tickets, removal PRs grouped by risk, and a four-week cleanup sprint. Acts as a platform engineer who has decommissioned thousands of flags without breaking production.
Invoke this skill when feature flag debt is slowing engineering down: builds carry stale toggles, code paths exist for experiments that ended a year ago, the LaunchDarkly bill is climbing, or a refactor is blocked by unread flags.
Basic invocation:
- Audit our LaunchDarkly account and our monorepo for stale flags
- We have 1,200 flags in Unleash — find the dead ones
- Write a removal PR for the `new-checkout-v2` flag
With context:
- Here's our LD export and the code grep — order removals by risk
- We're moving off Split.io to GrowthBook — what flags die in the migration?
- Audit only the flags owned by my team (CODEOWNERS: payments)
The agent produces a stale-flag inventory, a four-week cleanup schedule, removal PRs in safety order, and a per-owner ticket list.
The agent first builds a complete inventory across four surfaces and joins them:
| Surface | What It Holds | How To Pull |
|---|---|---|
| Flag service | Targeting rules, rollout %, last-modified date, evaluation count | LaunchDarkly REST API `/api/v2/flags`, Unleash `/api/admin/features`, Flagsmith `/api/v1/features/`, GrowthBook `/api/v1/features`, Split `/internal/api/v2/splits` |
| Codebase | Flag references in source, configs, tests | `git grep`, AST parse, language-specific clients (`useFeatureFlag`, `client.boolVariation`, `unleash.isEnabled`) |
| Telemetry | Production evaluation counts per flag per day | Datadog, Honeycomb, vendor evaluation logs, OpenTelemetry traces |
| Ticket system | Originating ticket, intended sunset date | Jira `flag:` label, Linear cycle search |
The join key is the flag's key (string id). Any flag that exists in only one surface is suspect — code references with no service definition are dead code; service definitions with no code references are dead config.
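Under the hood that join is just set algebra over flag keys. A minimal sketch — the variable names and sample keys are illustrative, not from any SDK; in practice `service_flags` comes from the vendor API and `code_refs` from `git grep`:

```python
# Two-surface join on the flag key (string id). Any key present on only
# one surface is suspect.
service_flags = {"new_checkout_v2", "legacy_pricing", "ff_async_export"}
code_refs = {"new_checkout_v2", "old_search"}

dead_config = service_flags - code_refs  # defined in service, never read
dead_code = code_refs - service_flags    # referenced in code, no definition
live = service_flags & code_refs         # present on both surfaces

print(sorted(dead_config))  # ['ff_async_export', 'legacy_pricing']
print(sorted(dead_code))    # ['old_search']
```

The same difference/intersection logic extends to the telemetry and ticket surfaces once their rows are keyed the same way.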
Not all flags retire the same way. The agent assigns one of five types before deciding what to do:
| Type | Lifetime Expectation | Cleanup Default |
|---|---|---|
| Release toggle | Days to weeks during a rollout | Remove once 100% on for 30 days |
| Experiment | One experiment cycle (2-8 weeks) | Remove once analysis is shipped, winner picked |
| Permission / entitlement | Permanent | Migrate to authz system, then remove |
| Ops / kill switch | Permanent | Keep but document; review yearly |
| Config / parameter | Permanent | Migrate to config service if static, remove if dynamic decision is gone |
A flag's type is rarely tagged at creation — the agent infers from name patterns (`enable_`, `kill_`, `experiment_`, `tier_`), targeting rules (percentage rollout vs user-list vs segment), and evaluation patterns (steady-state vs spiking on deploy days).
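A sketch of the name-pattern half of that inference. The prefix table below is an assumption for illustration, not a vendor convention; real classification also weighs targeting rules and evaluation patterns:

```python
import re

# Hypothetical prefix -> type table; order matters only if prefixes overlap.
_NAME_PATTERNS = [
    (r"^kill_", "ops/kill switch"),
    (r"^(experiment_|exp_)", "experiment"),
    (r"^(tier_|plan_)", "permission/entitlement"),
    (r"^(enable_|release_|ff_)", "release toggle"),
]

def infer_flag_type(key: str) -> str:
    """Best-effort type from the flag key alone; falls back to unclassified."""
    for pattern, flag_type in _NAME_PATTERNS:
        if re.match(pattern, key):
            return flag_type
    return "unclassified"

print(infer_flag_type("kill_payments_gateway"))  # ops/kill switch
print(infer_flag_type("enable_new_checkout"))    # release toggle
```

Unclassified flags (16% in the sample audit below) get routed to their owner for manual tagging.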
The agent applies a layered staleness ruleset. A flag must trip at least two rules to be marked stale (single-rule failures produce a "watch list", not a removal).
Time rules:
R1. created_at older than 90 days
R2. last_modified older than 60 days (rules untouched)
R3. last_evaluation older than 30 days (no traffic)
R4. originating Jira ticket closed >180 days ago
State rules:
R5. served value is 100% (or 0%) for 30+ consecutive days
R6. all targeting rules collapse to a single variant (no branching)
R7. zero overrides, zero environment-specific differences
R8. variant served matches default for environment
Code rules:
R9. flag key not present in main branch (only in deleted branches / archived dirs)
R10. all code references are inside a single if branch with no else
R11. all references are tests (no production callsite)
R12. references exist only in disabled feature folders
Service rules:
R13. flag is archived in service but still referenced in code
R14. flag is in service but never linked to code (orphan)
R15. flag references a deleted segment / user list
R16. flag's prerequisites form a cycle or reference deleted flags
A removal candidate's score is the number of rules tripped plus a risk modifier (see Step 4). Anything ≥ 4 is a strong removal; 2-3 is staged with extra verification.
Each flag gets a removal-risk grade. Risk is independent of staleness — a stale flag can still be high-risk to remove if the wrong default ships.
| Grade | Criteria | Removal Approach |
|---|---|---|
| R0 — Trivial | Test-only references, dev-only flag, zero prod traffic in 90d | Single PR, batch with siblings |
| R1 — Low | Release toggle 100% on 60+ days, simple boolean, single owner | One PR per flag, standard review |
| R2 — Medium | Touches a paid feature, multiple owners, has variants beyond on/off, evaluated >1k/day | One PR per flag, two reviewers, deploy in own release window |
| R3 — High | Permission/entitlement, billing-adjacent, payment path, auth path | Migrate before remove. Owner sign-off required, integration tests, dark-launch the removal |
| R4 — Critical | Kill switch, regulatory, data residency, encryption toggle | Keep. Document and review yearly. Removal blocked unless replacement is in place. |
The agent never auto-generates a removal PR for R3 or R4 — those produce migration tickets instead.
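That guard can be thought of as a hard gate in the PR generator — a sketch mirroring the table above, with hypothetical action strings:

```python
# Grade -> default cleanup action. R3/R4 never yield an auto-generated
# removal PR; they produce migration tickets or stay documented.
GRADE_ACTIONS = {
    "R0": "batch removal PR",
    "R1": "removal PR, standard review",
    "R2": "removal PR, two reviewers, own release window",
    "R3": "migration ticket (no auto-PR)",
    "R4": "keep; document and review yearly (no auto-PR)",
}

def auto_pr_allowed(grade: str) -> bool:
    """Only R0-R2 may produce an automatic removal PR."""
    return grade in ("R0", "R1", "R2")
```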
Removals stall when nobody is on the hook. The agent assigns each flag exactly one owner using a fallback chain:
1. Service-side `tags` or `maintainer` field (LaunchDarkly tags, Unleash project)
2. Originating Jira/Linear ticket assignee
3. CODEOWNERS for the file containing the most references
4. `git log --follow` first author of the introducing commit
5. `git blame` on the line of the most recent flag check
6. Team channel mapping (#payments → @payments-leads)
The agent emits a per-owner queue so each engineer sees only their flags. Bulk emails to "engineering@" produce zero cleanup; per-owner tickets with a 2-week SLA produce 70%+ completion.
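The fallback chain is a first-non-empty fold over resolver functions. A sketch where each resolver (service tags, ticket assignee, CODEOWNERS, git history, channel map) is a callable returning an owner string or `None` — the resolver names here are hypothetical:

```python
from typing import Callable, Optional

Resolver = Callable[[str], Optional[str]]

def resolve_owner(flag_key: str, resolvers: list[Resolver]) -> str:
    """Walk the fallback chain; the first resolver that answers wins."""
    for resolver in resolvers:
        owner = resolver(flag_key)
        if owner:
            return owner
    return "UNOWNED"  # surfaces in the audit for manual triage

# Illustrative two-step chain: a static service-tag map, then a stub.
service_tags = {"new_checkout_v2": "@payments-team"}
chain = [service_tags.get, lambda key: None]  # real chain has six resolvers

print(resolve_owner("new_checkout_v2", chain))  # @payments-team
print(resolve_owner("mystery_flag", chain))     # UNOWNED
```

Returning a sentinel instead of raising keeps unowned flags visible in the per-owner queues rather than silently dropped.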
For each removable flag the agent emits a deterministic recipe by language and client. The pattern is always the same: replace `if (flag) { A } else { B }` with the winning branch.

Example — TypeScript / LaunchDarkly:
```tsx
// before
import { useFlags } from 'launchdarkly-react-client-sdk';
const { newCheckout } = useFlags();
return newCheckout ? <CheckoutV2 /> : <CheckoutV1 />;

// after (newCheckout was 100% on for 60 days)
return <CheckoutV2 />;
// then delete CheckoutV1.tsx, its tests, its CSS, and remove from routing
```
Example — Go / Unleash:
```go
// before
if unleash.IsEnabled("ff_async_export") {
    go runAsyncExport(ctx, req)
} else {
    runSyncExport(ctx, req)
}

// after
go runAsyncExport(ctx, req)
// then delete runSyncExport, delete its mock, prune the unleash strategy
```
Example — Python / Flagsmith:
```python
# before
if flagsmith.has_feature("legacy_pricing", default=False):
    price = legacy_price_engine(cart)
else:
    price = price_engine_v2(cart)

# after
price = price_engine_v2(cart)
# then drop legacy_price_engine module and its 14 fixture files
```
Example — Custom DB-backed flag:
```python
# before
if FeatureToggle.objects.get(key="multi_currency").enabled:
    currency = detect_user_currency(user)
else:
    currency = "USD"

# after
currency = detect_user_currency(user)
# then drop the FeatureToggle row; the migration to delete is M0142
```
The agent organizes PRs to balance reviewer load and blast radius. Titles follow the convention `chore(flags): remove ff_X (100% on since YYYY-MM-DD)`. Every PR includes a Rollback Plan section, which the agent generates from the flag definition:
```markdown
## Rollback Plan
If incident: revert this commit. The previous behavior was the
`disabled` branch which called `legacy_pricing_engine`. That code
is preserved in commit abc1234 of branch `archive/ff-legacy-pricing`
for 90 days post-merge. After 90 days, recover from git history
via `git log --all --source -- legacy_pricing_engine.py`.
```
Even after removal, the agent leaves a 30-60-90 day safety net:
| Day | Action |
|---|---|
| 0 | Merge removal PR. Tag the commit `flag-removed/ff_xxx`. Open a 30-day calendar reminder for the owner. |
| 7 | Verify error rate, latency, conversion metric vs baseline. If regression, the agent generates a re-introduction PR that restores the flag and pins it to the previous default. |
| 30 | Delete the flag definition in the service (was archived at PR merge). Drop the archive branch if no rollback called. |
| 60 | Review the removed-flags log; mark "permanent" in MEMORY. |
| 90 | Drop the flag from runbooks, dashboards, alerts. |
For R3 / R4 retentions, extend the windows: 14 / 60 / 120 / 180.
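Generating those calendar reminders is a direct mapping from the merge date to the window offsets above — a sketch:

```python
from datetime import date, timedelta

def safety_net_dates(merge_day: date, high_risk: bool = False) -> dict[int, date]:
    """Map each post-merge check-in window (in days) to a calendar date.

    Standard flags use 7/30/60/90; R3/R4 retentions use the extended
    14/60/120/180 windows.
    """
    windows = (14, 60, 120, 180) if high_risk else (7, 30, 60, 90)
    return {d: merge_day + timedelta(days=d) for d in windows}

print(safety_net_dates(date(2025, 5, 1)))
```

Day 0 (merge, tag, archive) happens inline at merge time, so it is not part of the reminder schedule.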
Each provider has its own retirement workflow. The agent uses the right API, the right resource hierarchy, and the right billing impact.
LaunchDarkly:
- `DELETE /api/v2/flags/{projKey}/{flagKey}` (archives by default; pass `?archived=true` to unarchive)
- Tag new flags `temporary`, `experiment`, or `kill-switch` to make Step 2 automatic going forward

Unleash:
- `DELETE /api/admin/features/{name}` (archives), then `DELETE /api/admin/archive/{name}` (hard delete)

Flagsmith:
- `DELETE /api/v1/projects/{id}/features/{id}/`

GrowthBook:
- `DELETE /api/v1/features/{id}` (requires admin role)

Split.io:
- `DELETE /internal/api/v2/splits/ws/{wsId}/{splitName}` (workspaces matter, easy to delete the wrong env)

Custom / DB-backed flags:
```sql
SELECT key, MAX(updated_at) AS last_updated
FROM feature_toggles
GROUP BY key
HAVING key NOT IN ($referenced_keys_from_grep)
    OR MAX(updated_at) < NOW() - INTERVAL '180 days';
```
Cleanup as a one-off audit dies. Cleanup as a recurring sprint sticks. The agent emits a four-week schedule:
```
WEEK 1 — INVENTORY & BASELINE
  Mon  Pull flag list from service API → CSV
  Tue  git grep code references → join to CSV
  Wed  Pull last 30d evaluation telemetry → join to CSV
  Thu  Classify by type, grade risk (R0-R4), tag owners, generate per-owner queue
  Fri  Publish dashboard: total flags, removable count, debt $$ estimate

WEEK 2 — TRIVIAL & LOW (R0/R1)
  Mon  Bulk-PR all R0 flags (test-only, dev-only)
  Tue  Open R1 PRs in batches of 10 per team
  Wed  Merge R0 batch (single reviewer), monitor CI
  Thu  Merge R1 batch on standard review cadence
  Fri  Service-side: archive the merged flags, refresh dashboard

WEEK 3 — MEDIUM (R2)
  Mon  Per-flag PRs for R2; pair with feature owner
  Tue  Schedule R2 deploys to low-traffic windows
  Wed  Deploy first half, watch error budget for 24h
  Thu  Deploy second half if green; revert plan exercised on any regression
  Fri  Service-side cleanup; mark watch-period start date

WEEK 4 — HIGH (R3) MIGRATION + REPORT
  Mon  R3 migration PRs land (NOT removals — replacements)
  Tue  Replacement code burns in for the 7-day rule
  Wed  Generate the 30/60/90 calendar reminders
  Thu  Update runbooks, ADRs, onboarding docs to match new state
  Fri  Post-mortem write-up: flags removed, $$ saved, debt remaining,
       and a date for the next quarter's audit
```
Cleanup without prevention runs the same audit again next quarter. The agent emits a prevention pack: a flag-creation template that requires an expiry (`expires_at NOT NULL`), and a CI check that warns when a flag passes its `expires_at`, or when the introducing PR has no linked removal ticket.

Inventory results:
```
Total flags                  1,212
 - Active                      743
 - Archived (still in code)    204
 - Code-only orphans           265

By type:
 - Release toggle            612 (50%)
 - Experiment                188 (15%)
 - Permission/entitlement    121 (10%)
 - Ops/kill                   94 (8%)
 - Unclassified              197 (16%)

Staleness (≥2 rules tripped): 687 candidates (57%)

By risk:
 - R0 trivial    142
 - R1 low        312
 - R2 medium     178
 - R3 high        49
 - R4 critical     6
```
Plan:
Estimated savings: 38% reduction in flag count, ~$22k/yr LD cost reduction (based on MAU savings from the 89 high-traffic experiments retired), and ~3,400 LOC removed.
Audit of `new_checkout_v2`:
```
Key:            new_checkout_v2
Service:        LaunchDarkly (project: web-app)
Created:        2025-01-12 (110 days ago)
Last modified:  2025-02-14 (rules frozen)
Rollout:        100% production for 84 days
Variants:       on / off
Code refs:      3 files, 7 callsites
  - src/checkout/route.tsx (4)
  - src/checkout/__tests__/route.test.tsx (2)
  - src/analytics/checkout.ts (1)
Telemetry:      184k evals/day, all → "on"
Owner (CODEOWNERS): @payments-team
Type:           Release toggle
Risk:           R1 (low)
```
Generated PR:
```
Title: chore(flags): remove new_checkout_v2 (100% on since 2025-02-14)

Summary
  Flag has served `on` to 100% of production for 84 days with zero
  toggle activity. Removing both the flag check and the dead v1
  checkout component.

Changes
  - src/checkout/route.tsx      : -28 +4
  - src/checkout/CheckoutV1.tsx : deleted (-512)
  - src/checkout/CheckoutV1.css : deleted (-180)
  - src/checkout/__tests__/...  : -1 file, -94 lines
  - src/analytics/checkout.ts   : -6 +1

Rollback Plan
  Revert this commit. v1 component preserved on branch
  archive/ff-new-checkout-v2 for 90 days. Re-enable flag in LD
  via the saved config in the linked ticket.

Service-side follow-up
  - Archive flag in LD on merge (Workflow scheduled)
  - Hard-delete flag 2025-08-01 (90 days post-merge)

Tickets
  PROJ-4421 (close on merge)
```
A team running a home-grown feature_toggles table:
```sql
-- Step 1: code-referenced keys (output of grep)
WITH code_refs(key) AS (VALUES
  ('multi_currency'), ('beta_dashboard'), ('legacy_pricing'),
  ('async_export'), ('new_search_v2')
)
SELECT ft.key,
       ft.enabled,
       ft.updated_at,
       (SELECT 1 FROM code_refs c WHERE c.key = ft.key) AS in_code
FROM feature_toggles ft
ORDER BY ft.updated_at;
```
The agent emits the migration plus the code PR in one bundle:
```
PR-1 (DB migration M0142_drop_dead_toggles.sql)
  DELETE FROM feature_toggles
  WHERE key IN ('legacy_pricing', 'beta_dashboard', 'old_v1');

PR-2 (code)
  - drop legacy_price_engine.py
  - drop beta_dashboard route
  - prune calls in 4 files
```
PR-2 merges first; PR-1 deploys in the next migration window.
The agent produces:
The agent maps each flag to one of: keep-and-migrate, remove-with-default, remove-as-dead-code. Migration target is your incumbent flag service. Output is two PR streams: code PRs (own repo) and import scripts (for the incumbent service).
The agent generates the immediate-revert PR and a postmortem template. Then it adds the missing detection: which staleness rule should have caught this, and patches the ruleset (e.g. add "no diurnal pattern in evaluations" as R17).
The agent estimates: vendor cost (MAU × $rate × removable_traffic_share), engineering time tax (avg 6 min per stale flag in PR review × refs/quarter), and incident risk (count of incidents in the last 12 months that mention a flag). One-page CFO-ready memo.
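A back-of-envelope version of that cost model, with every rate a placeholder assumption (the incident-risk term is left out of this sketch):

```python
def flag_debt_cost(mau: int, rate_per_mau: float, removable_share: float,
                   stale_flags: int, reviews_per_flag_per_quarter: int,
                   eng_cost_per_min: float = 2.0) -> dict[str, float]:
    """Annualized flag-debt cost: vendor spend plus PR-review time tax."""
    vendor = mau * rate_per_mau * removable_share * 12  # $/yr
    # 6 min average review tax per stale flag, per review, over 4 quarters
    review_minutes = stale_flags * 6 * reviews_per_flag_per_quarter * 4
    return {"vendor": vendor, "review_tax": review_minutes * eng_cost_per_min}

# Illustrative inputs: 500k MAU at $0.01/MAU-month, 30% of traffic behind
# removable flags, 687 stale flags each touched in review ~2x per quarter.
print(flag_debt_cost(500_000, 0.01, 0.30, 687, 2))
```

The point of the memo is the order of magnitude, not the third significant digit; the placeholders only need to come from your actual vendor invoice and review stats.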
No. The agent ships the team's queue independently. Cross-team coordination kills cleanups. Each team should own its flag retirement budget.