
ArgoCD Deployment Analyzer

v1.0.0

Analyze ArgoCD application sync status, detect configuration drift, review manifests for security and best practices, and diagnose sync failures.


ArgoCD Deployment Analyzer

Deep-dive analysis of ArgoCD-managed applications — detect sync drift, diagnose failed syncs, audit manifest security, review sync policies, and validate ArgoCD configurations against production best practices. Turns ArgoCD operational noise into actionable findings.

Use when: "analyze argocd apps", "why is my argocd app out of sync", "review argocd config", "audit gitops deployments", "diagnose sync failure", or when ArgoCD applications are degraded, drifting, or misconfigured.

Prerequisites

The agent checks for access to ArgoCD:

# CLI access
argocd version --client

# Logged in
argocd account get-user-info

# Or: kubectl access to ArgoCD namespace
kubectl get applications.argoproj.io -n argocd

# Or: ArgoCD API access
curl -s https://argocd.example.com/api/v1/applications \
  -H "Authorization: Bearer $ARGOCD_TOKEN" | jq '.items | length'

Usage

Provide one or more of:

  • Application name — specific ArgoCD app to analyze (e.g., production/api-server)
  • Project name — analyze all apps in an ArgoCD project
  • Scope — all to analyze every application
  • Focus area — sync, security, health, drift, config, or all

Example invocations:

Analyze why the payments-service ArgoCD app keeps going OutOfSync.

Security audit all ArgoCD applications in the production project.

Review our ArgoCD ApplicationSet configurations for best practices.

How It Works

Step 1: Application Inventory

Gather the full picture of all ArgoCD-managed applications:

# List all applications with status
argocd app list -o json | jq '[.[] | {
  name: .metadata.name,
  project: .spec.project,
  syncStatus: .status.sync.status,
  healthStatus: .status.health.status,
  repo: .spec.source.repoURL,
  path: .spec.source.path,
  targetRevision: .spec.source.targetRevision,
  destination: .spec.destination.server,
  namespace: .spec.destination.namespace,
  syncPolicy: .spec.syncPolicy
}]'

# Or via kubectl
kubectl get applications.argoproj.io -n argocd -o json | jq '[.items[] | {
  name: .metadata.name,
  sync: .status.sync.status,
  health: .status.health.status
}]'

Classify applications into categories:

  • Healthy + Synced — no action needed
  • Healthy + OutOfSync — drift detected, needs investigation
  • Degraded — health check failing
  • Progressing — sync in progress, check if stuck
  • Missing — target resources don't exist
  • Unknown — ArgoCD can't determine state
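The classification above can be sketched as a single jq pass over the inventory JSON. The sample apps below are hypothetical stand-ins for real `argocd app list -o json` output:

```shell
# Sample inventory standing in for `argocd app list -o json` output
cat <<'EOF' > /tmp/apps.json
[
  {"metadata":{"name":"api"},  "status":{"sync":{"status":"Synced"},   "health":{"status":"Healthy"}}},
  {"metadata":{"name":"web"},  "status":{"sync":{"status":"OutOfSync"},"health":{"status":"Healthy"}}},
  {"metadata":{"name":"jobs"}, "status":{"sync":{"status":"Synced"},   "health":{"status":"Degraded"}}}
]
EOF

# Bucket apps by their (health, sync) pair
jq 'group_by(.status.health.status + "/" + .status.sync.status)
  | map({bucket: (.[0].status.health.status + "/" + .[0].status.sync.status),
         count: length,
         apps: map(.metadata.name)})' /tmp/apps.json
```

Apps in the "Healthy/OutOfSync" bucket are the drift-investigation queue; "Degraded" buckets feed the health analysis in Step 4.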

Step 2: Sync Drift Analysis

For each OutOfSync application, identify what drifted and why:

# Get the diff between live and desired state
argocd app diff <app-name>

# Or diff against locally checked-out manifests
argocd app diff <app-name> --local /path/to/manifests --local-repo-root /path/to/repo

# Detailed sync status with resource-level breakdown
argocd app get <app-name> -o json | jq '{
  syncStatus: .status.sync.status,
  revision: .status.sync.revision,
  comparedTo: .status.sync.comparedTo,
  resources: [.status.resources[] | select(.status != "Synced") | {
    kind: .kind,
    name: .name,
    namespace: .namespace,
    status: .status,
    health: .health.status,
    message: .health.message
  }]
}'

# Check sync history for patterns
argocd app get <app-name> -o json | jq '[.status.history[] | {
  revision: .revision[:8],
  deployedAt: .deployedAt,
  source: .source.path
}]'

Common drift causes the agent checks:

  1. Manual kubectl edits — someone modified a resource directly, bypassing GitOps
  2. Mutating webhooks — admission controllers injecting sidecars, labels, or annotations
  3. Horizontal Pod Autoscaler — HPA changes replica count, conflicts with Git-declared replicas
  4. Controller-managed fields — Kubernetes controllers (e.g., EndpointSlice controller) update fields
  5. CRD defaults — CRD defaulting webhooks adding fields not in the Git source
  6. Helm value drift — values.yaml in Git doesn't match what was rendered
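Cause 1 (manual kubectl edits) can often be confirmed from the live object's managedFields: any field manager other than the ArgoCD application controller is a drift suspect. A sketch, with sample JSON standing in for `kubectl get deployment <name> -o json --show-managed-fields` output:

```shell
# Sample managedFields standing in for live kubectl output
cat <<'EOF' > /tmp/deploy.json
{"metadata":{"managedFields":[
  {"manager":"argocd-controller","operation":"Update","time":"2024-05-01T10:00:00Z"},
  {"manager":"kubectl-edit","operation":"Update","time":"2024-05-02T09:30:00Z"}
]}}
EOF

# Any manager besides argocd-controller points at who caused the drift
jq '[.metadata.managedFields[]
     | select(.manager != "argocd-controller")
     | {manager, operation, time}]' /tmp/deploy.json
```

Here `kubectl-edit` with a later timestamp than the ArgoCD controller's write is strong evidence of a bypassed-GitOps edit; an HPA or webhook manager would point at causes 2–3 instead.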

Step 3: Sync Failure Diagnosis

When sync operations fail, diagnose the root cause:

# Get sync operation result
argocd app get <app-name> -o json | jq '.status.operationState | {
  phase: .phase,
  message: .message,
  startedAt: .startedAt,
  finishedAt: .finishedAt,
  syncResult: .syncResult.resources | map(select(.status != "Synced"))
}'

# Check for resource-level errors
argocd app resources <app-name> --orphaned

# Check events on the target namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20

Failure categories the agent identifies:

| Category | Symptoms | Typical Fix |
| --- | --- | --- |
| RBAC | forbidden errors in sync | Fix ArgoCD service account permissions |
| Schema validation | validation failed | Fix manifest against CRD/API schema |
| Namespace missing | namespace not found | Create namespace or enable auto-create |
| Resource conflict | already exists | Check for duplicate resource management |
| Quota exceeded | exceeded quota | Request quota increase or reduce resource requests |
| Immutable field | field is immutable | Delete and recreate the resource |
| Dependency order | resource X not found | Add sync waves or sync ordering |
| Timeout | deadline exceeded | Increase sync timeout or fix health check |
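One way to bucket failures automatically is to pattern-match the operationState message against these symptoms. A rough sketch — the patterns are heuristics, not an exhaustive mapping:

```shell
# Map a sync failure message to a failure category (heuristic patterns)
classify_sync_error() {
  case "$1" in
    *forbidden*)           echo "RBAC" ;;
    *"validation failed"*) echo "Schema validation" ;;
    *"already exists"*)    echo "Resource conflict" ;;
    *"exceeded quota"*)    echo "Quota exceeded" ;;
    *immutable*)           echo "Immutable field" ;;
    *"deadline exceeded"*) echo "Timeout" ;;
    *"not found"*)         echo "Namespace missing / dependency order" ;;
    *)                     echo "Unclassified" ;;
  esac
}

classify_sync_error "deployments.apps is forbidden: User cannot create"  # prints "RBAC"
```

A real agent would feed `.status.operationState.message` from the jq query above into this function and attach the category's typical fix to the finding.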

Step 4: Health Check Analysis

Evaluate application health and identify degraded components:

# Health of each resource in the app
argocd app get <app-name> -o json | jq '[.status.resources[] | {
  kind: .kind,
  name: .name,
  health: .health.status,
  message: .health.message
}] | group_by(.health) | map({status: .[0].health, count: length, resources: map(.name)})'

# Pod-level issues for Degraded deployments
kubectl get pods -n <namespace> -l app=<app-label> -o json | jq '[.items[] | {
  name: .metadata.name,
  phase: .status.phase,
  ready: ([.status.conditions[] | select(.type=="Ready")] | .[0].status),
  restarts: ([.status.containerStatuses[].restartCount] | add),
  waiting: [.status.containerStatuses[] | select(.state.waiting) | .state.waiting.reason]
}]'
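A high restart count alone doesn't say why a pod is restarting; the container's lastState usually does. A sketch over sample pod JSON (a stand-in for the kubectl output above):

```shell
# Sample pod status standing in for `kubectl get pod <pod> -o json`
cat <<'EOF' > /tmp/pod.json
{"status":{"containerStatuses":[
  {"name":"app","restartCount":7,
   "lastState":{"terminated":{"reason":"OOMKilled","exitCode":137}}}
]}}
EOF

# Surface the last termination reason for any restarting container
jq '[.status.containerStatuses[] | select(.restartCount > 0) | {
  name,
  restarts: .restartCount,
  lastExit: (.lastState.terminated | {reason, exitCode})
}]' /tmp/pod.json
```

An OOMKilled/137 exit points at memory limits rather than the app itself, which changes the remediation from "fix the code" to "raise the limit in Git".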

Step 5: Configuration Audit

Review ArgoCD Application and Project configurations for security and best practices:

Sync policy analysis:

# Check for dangerous sync policies
argocd app list -o json | jq '[.[] | select(
  .spec.syncPolicy.automated.prune == true and
  .spec.syncPolicy.automated.selfHeal == true
) | {name: .metadata.name, warning: "auto-prune + self-heal enabled"}]'

Checks performed:

  • Auto-sync without prune protection — accidental resource deletion risk
  • Self-heal on production — could mask legitimate manual hotfixes
  • Missing sync windows — production should have maintenance windows
  • No retry policy — transient failures won't self-recover
  • Wildcard project destinations — * server or namespace defeats RBAC
  • No resource whitelist/blacklist — project can deploy any resource type
  • Plaintext secrets in Git — secrets not managed by Sealed Secrets / SOPS / ESO
  • Missing ignoreDifferences — known benign drift causing noise
  • No notification triggers — sync failures go unnoticed
  • Orphaned resources — resources in the namespace not managed by any app
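Several of these checks come down to fields on the Application's syncPolicy. A safer baseline might look like the following — the field names are standard ArgoCD Application spec, the values are illustrative, not prescriptive:

```yaml
spec:
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
      allowEmpty: false        # refuse to prune everything if the render comes back empty
    retry:
      limit: 3                 # self-recover from transient failures
      backoff:
        duration: 30s
        factor: 2
        maxDuration: 5m
    syncOptions:
      - CreateNamespace=true   # avoids the "namespace not found" failure class
```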

Step 6: Security Review

Audit manifests managed by ArgoCD applications for security issues:

# Extract rendered manifests
argocd app manifests <app-name> --source live > /tmp/live-manifests.yaml
argocd app manifests <app-name> --source git > /tmp/git-manifests.yaml

Security checks:

  • Containers running as root or with privileged: true
  • Missing SecurityContext, readOnlyRootFilesystem, runAsNonRoot
  • Missing resource limits (CPU/memory) — noisy neighbor risk
  • hostNetwork, hostPID, hostIPC enabled
  • ServiceAccount token auto-mounting when not needed
  • Missing NetworkPolicies
  • Images using :latest tag or no tag
  • Secrets mounted as environment variables instead of files
  • Missing PodDisruptionBudgets for critical services
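Checks like these can be expressed as jq predicates once a rendered manifest is in JSON form. A minimal sketch — the sample Deployment stands in for a manifest converted from the `argocd app manifests` output above:

```shell
# Sample rendered manifest (JSON form) standing in for real output
cat <<'EOF' > /tmp/manifest.json
{"kind":"Deployment","metadata":{"name":"api"},"spec":{"template":{"spec":{
  "containers":[{"name":"app","image":"api:latest",
    "securityContext":{"privileged":true}}]}}}}
EOF

# Per-container security findings: privileged mode, :latest tag, missing limits
jq '[.spec.template.spec.containers[] | {
  name,
  privileged: (.securityContext.privileged == true),
  latestTag: (.image | test(":latest$") or (contains(":") | not)),
  hasLimits: (.resources.limits != null)
}]' /tmp/manifest.json
```

Each `true`/`false` here maps directly onto a finding in the checklist above; a fuller audit would add predicates for runAsNonRoot, host namespaces, and token auto-mounting in the same shape.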

Step 7: ApplicationSet Analysis

If ApplicationSets are used, validate their generators and templates:

kubectl get applicationsets -n argocd -o json | jq '[.items[] | {
  name: .metadata.name,
  generators: [.spec.generators[] | keys[0]],
  template: .spec.template.spec.source.repoURL,
  syncPolicy: .spec.template.spec.syncPolicy
}]'

Checks:

  • Git generator with overly broad directory patterns
  • Missing preserveResourcesOnDeletion (deleting the AppSet deletes all apps)
  • Cluster generator without label selectors (deploys to ALL clusters)
  • Template overrides that bypass project restrictions
  • No goTemplate validation (template injection risk)
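An ApplicationSet that passes these checks might look like the following sketch — the names, labels, and repo URL are hypothetical:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: per-cluster-api        # hypothetical name
  namespace: argocd
spec:
  goTemplate: true
  syncPolicy:
    preserveResourcesOnDeletion: true   # deleting the AppSet keeps the apps' resources
  generators:
    - clusters:
        selector:
          matchLabels:
            env: production             # label selector: never "all clusters"
  template:
    metadata:
      name: '{{ .name }}-api'
    spec:
      project: production               # scoped project, not "default"
      source:
        repoURL: https://github.com/example/deploy.git
        path: 'clusters/{{ .name }}'
        targetRevision: main
      destination:
        server: '{{ .server }}'
        namespace: api
```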

Output

The agent produces a structured report:

  1. Dashboard summary — total apps, sync status distribution, health distribution
  2. Drift report — each OutOfSync app with specific resources and fields that drifted, with root cause
  3. Failure diagnosis — for each failed sync: root cause, specific error, and remediation steps
  4. Health issues — degraded resources with pod-level diagnostics
  5. Configuration findings — ranked by severity (Critical / High / Medium / Low) with fix recommendations
  6. Security findings — manifest-level security issues with remediation
  7. Recommended ignoreDifferences — for known benign drift patterns (HPA replicas, annotation mutations, etc.)
  8. Action items — prioritized list of changes to make, with example YAML patches

Common Remediation Patterns

HPA replica drift:

spec:
  ignoreDifferences:
    - group: apps
      kind: Deployment
      jsonPointers:
        - /spec/replicas

Mutating webhook annotations:

spec:
  ignoreDifferences:
    - group: ""
      kind: Service
      jqPathExpressions:
        - .metadata.annotations["webhook.example.com/injected"]

Sync wave ordering for dependencies: Use argocd.argoproj.io/sync-wave annotations: -1 for namespaces, 0 for ConfigMaps/Secrets, 1 for Deployments/Services.
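The wave annotation goes on each resource's metadata; lower waves sync (and must become healthy) first. An illustrative set:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    argocd.argoproj.io/sync-wave: "-1"   # namespaces first
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: payments-config
  namespace: payments
  annotations:
    argocd.argoproj.io/sync-wave: "0"    # config before workloads
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments
  namespace: payments
  annotations:
    argocd.argoproj.io/sync-wave: "1"    # workloads last
```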
