GitLab CI Optimizer

v1.0.0

License: MIT-0

Install

openclaw skills install gitlab-ci-optimizer

GitLab CI Optimizer

Analyze GitLab CI/CD pipeline configurations to find speed, cost, and reliability improvements. Examines .gitlab-ci.yml for caching gaps, missing parallelism, inefficient job ordering, bloated Docker images, redundant work, and misconfigured runners. Produces a concrete optimization plan with estimated time savings.

Use when: "speed up our CI", "pipeline takes too long", "optimize gitlab ci", "review our .gitlab-ci.yml", "reduce build costs", "fix flaky pipeline", or when a pipeline configuration needs improvement.

Analysis Steps

1. Parse Pipeline Structure

Read the .gitlab-ci.yml and any included files to build a complete picture:

# Find the main CI config
cat .gitlab-ci.yml

# Find all included CI files
grep -r "include:" .gitlab-ci.yml
find . -name "*.gitlab-ci.yml" -o -name ".gitlab-ci*.yml" | head -20
find . -path "*/.gitlab/ci/*.yml" | head -20

# Check for CI/CD variables defined in the file
grep -E "variables:" .gitlab-ci.yml

# List all jobs and their stages
grep -E "^[a-zA-Z_.][a-zA-Z0-9_.-]*:" .gitlab-ci.yml | grep -vE "^(stages|variables|include|default|workflow|image|services|cache|before_script|after_script):"

For each job, extract: name, stage, runner tags, Docker image, needs/dependencies, cache/artifacts config, rules/conditions, estimated duration, and whether it runs on every commit or only specific branches.
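As a reference point, these fields map onto job keys like the following (the job name and values here are illustrative, not from any real pipeline):

```yaml
test-frontend:                      # name
  stage: test                       # stage
  tags: [docker, linux]             # runner tags
  image: node:18-alpine             # Docker image
  needs: [build-frontend]           # DAG dependency
  cache:
    paths: [node_modules/]          # cache config
  artifacts:
    paths: [coverage/]              # artifacts config
  rules:                            # conditions: every commit vs. specific branches
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```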

2. Analyze Stage Dependencies

Map the execution flow to find bottlenecks:

Stage 1: build       [job-a: 3min] [job-b: 5min]
                      ↓              ↓
Stage 2: test        [job-c: 8min, needs: job-a] [job-d: 2min, needs: job-b]
                      ↓
Stage 3: deploy      [job-e: 1min, needs: job-c, job-d]

Critical path: job-b (5m) → job-d (2m) → job-e (1m) = 8 min
              OR: job-a (3m) → job-c (8m) → job-e (1m) = 12 min ← BOTTLENECK

Total wall time: 12 min (limited by the longest path)
Total compute time: 3 + 5 + 8 + 2 + 1 = 19 min (what you pay for)

Key questions:

  • Which job is the longest on the critical path? (optimize this first)
  • Are there jobs running sequentially that could run in parallel?
  • Are there jobs in the same stage that have no actual dependency on each other?

3. Evaluate Caching

Check for these common caching problems:

Missing cache entirely:

# BAD: Downloads all dependencies every run
install:
  script:
    - npm ci

# GOOD: Cache node_modules between runs
install:
  script:
    - npm ci
  cache:
    key:
      files:
        - package-lock.json
    paths:
      - node_modules/
    policy: pull-push

Cache key analysis:

| Key strategy | When to use | Invalidation |
|---|---|---|
| $CI_COMMIT_REF_SLUG | Branch-specific caches | New branch = cold start |
| files: [package-lock.json] | Dependency caches | Only when the lockfile changes |
| $CI_JOB_NAME | Job-specific caches | Never (manual clear) |
| prefix: $CI_COMMIT_REF_SLUG + files: | Best of both worlds | Branch or lockfile change |

Cache policy optimization: Jobs that only READ the cache should use policy: pull (saves upload time). Only the install job should use policy: pull-push.
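This split can be sketched with a YAML anchor sharing one cache definition (the anchor name is illustrative):

```yaml
.node-cache: &node_cache
  key:
    files: [package-lock.json]
  paths: [node_modules/]

install:
  script:
    - npm ci
  cache:
    <<: *node_cache
    policy: pull-push    # the only job that uploads the cache

test:
  script:
    - npm test
  cache:
    <<: *node_cache
    policy: pull         # read-only: skips the upload step
```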

Cache vs Artifacts decision:

  • Cache = best-effort, speeds up repeated runs, may not be available
  • Artifacts = guaranteed, passes files between jobs in the same pipeline
  • Rule: Use artifacts for build outputs that downstream jobs NEED. Use cache for dependencies that are expensive to re-download.

4. Optimize Docker Images

Problem: Using large base images

# BAD: 1.2 GB image, takes 45 seconds to pull
build:
  image: node:18

# BETTER: 180 MB image, pulls in 5 seconds
build:
  image: node:18-alpine

# BEST: Pre-built image with your dependencies baked in
build:
  image: registry.gitlab.com/my-org/ci-images/node:18

Create custom CI images when before_script takes >30s or you install system packages every run. Use GitLab's dependency proxy (${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine) to cache Docker Hub images and avoid rate limits.
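A minimal sketch of the proxy in use (assumes the dependency proxy is enabled for your group):

```yaml
build:
  # Pulls node:18-alpine through GitLab's pull-through cache instead of Docker Hub
  image: ${CI_DEPENDENCY_PROXY_GROUP_IMAGE_PREFIX}/node:18-alpine
  script:
    - npm run build
```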

5. Implement DAG Dependencies

Replace stage-based ordering with needs: for parallel execution:

# Without DAG: test-frontend waits for ALL build jobs (including build-backend)
# With DAG: test-frontend starts immediately after build-frontend finishes

test-frontend:
  stage: test
  needs: [build-frontend]     # Start as soon as build-frontend finishes

test-backend:
  stage: test
  needs: [build-backend]      # Start as soon as build-backend finishes

deploy:
  stage: deploy
  needs: [test-frontend, test-backend]  # Start when BOTH tests pass

Impact: With strict stage ordering, the example pipeline's wall time is max(3,5) + max(8,2) + 1 = 14 min; with needs: it drops to the 12 min critical path (about 14% faster), because job-d no longer waits for job-a's stage to finish. Compute time stays at 19 min: DAGs save wall time, not cost.

6. Apply Parallelism

Test splitting with parallel:

test:
  stage: test
  parallel: 4
  script:
    # Split test files across parallel jobs.
    # CI_NODE_INDEX is 1-based (1..CI_NODE_TOTAL), so compare modulo TOTAL on both sides
    - |
      TEST_FILES=$(find tests/ -name "*.test.js" | sort | awk -v t="$CI_NODE_TOTAL" -v i="$CI_NODE_INDEX" 'NR % t == i % t')
      npx jest $TEST_FILES
  artifacts:
    reports:
      junit: junit.xml

Use parallel:matrix for multi-environment testing (e.g., NODE_VERSION: ["16", "18", "20"] x DB: ["postgres", "mysql"]).
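For that example combination, a matrix sketch expands to six jobs (3 Node versions × 2 databases):

```yaml
test:
  stage: test
  parallel:
    matrix:
      - NODE_VERSION: ["16", "18", "20"]
        DB: ["postgres", "mysql"]
  image: node:${NODE_VERSION}-alpine   # matrix variables are usable in the job definition
  script:
    - echo "Testing on Node $NODE_VERSION against $DB"
    - npm test
```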

7. Reduce Redundant Work

Three key techniques:

  1. Path-based filtering — only run jobs when relevant files change:

     test-frontend:
       rules:
         - changes: ["frontend/**/*", "package-lock.json"]
         - when: never

  2. Auto-cancel outdated pipelines (GitLab 16.8+):

     workflow:
       auto_cancel:
         on_new_commit: interruptible

  3. Pass artifacts, don't rebuild — use artifacts: paths: [dist/] with expire_in: 1 hour and needs: in downstream jobs.
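The artifact-passing technique as a sketch (paths and script names are illustrative):

```yaml
build:
  stage: build
  script:
    - npm run build            # produces dist/
  artifacts:
    paths: [dist/]
    expire_in: 1 hour          # short-lived: only needed within this pipeline

deploy:
  stage: deploy
  needs: [build]               # fetches build's artifacts, starts as soon as build is done
  script:
    - ./deploy.sh dist/        # reuses dist/ instead of rebuilding
```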

8. Optimize Runner Configuration

Match runner size to job requirements:

| Job type | Recommended size | Why |
|---|---|---|
| Build (compiled) | large (4 CPU) | Compilation is CPU-bound |
| Unit tests | medium (2 CPU) | Moderate CPU, moderate RAM |
| Lint/format | small (1 CPU) | Trivial compute |
| Integration tests | large (4 CPU) | Runs services, needs RAM |
| Deploy | small (1 CPU) | Just runs scripts/API calls |
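On GitLab.com, size is selected with runner tags; self-managed runners use whatever tags they register with. A sketch using the GitLab.com SaaS tag names (an assumption, verify against your instance):

```yaml
lint:
  tags: [saas-linux-small-amd64]    # small: trivial compute
  script:
    - npm run lint

build:
  tags: [saas-linux-large-amd64]    # large: CPU-bound compilation
  script:
    - npm run build
```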

9. Common Anti-Patterns

| Anti-pattern | Problem | Fix |
|---|---|---|
| apt-get install in every job | 30-60s wasted per job | Bake into a custom Docker image |
| No cache: on dependency install | Downloads 500 MB+ every run | Add a cache with a lockfile key |
| One job per stage (deep stage chain) | Fully sequential execution | Put independent jobs in the same stage and use DAG |
| artifacts: paths: ["."] | Uploads the entire repo as an artifact | Only artifact what downstream needs |
| when: manual without allow_failure | Blocks the entire pipeline | Add allow_failure: true for optional manual jobs |
| No expire_in on artifacts | Storage grows forever | Set expire_in: 1 day for CI artifacts |
| only/except instead of rules | Confusing precedence, deprecated | Migrate to rules: syntax |
| retry: 2 on flaky tests | Masks real problems, slows the pipeline | Fix the flaky test, don't retry |
| GIT_STRATEGY: clone | Full clone every time | Use GIT_STRATEGY: fetch plus GIT_DEPTH: 20 |
| Monorepo without path filtering | Every change triggers all jobs | Use rules: changes: per component |

Output Format

# GitLab CI Optimization Report

## Pipeline Overview
- **Stages:** {list}  |  **Jobs:** {count}  |  **Wall Time:** {duration}
- **Critical Path:** {job-a -> job-b -> job-c}

## Findings (ranked by impact)
### 1. {Finding Title} — Impact: {High|Medium|Low}, saves ~{X} min/pipeline
- **Current:** {what it does now}
- **Recommended:** {what it should do}

## Estimated Savings
- Wall time: {X} min -> {Y} min ({Z}% reduction)
- Monthly cost: ${X} -> ${Y} (${Z} saved)
- Developer wait time saved: {hours}/day

Tips

  • Measure before optimizing — get baseline pipeline duration from GitLab's CI/CD analytics
  • Optimize the critical path first — speeding up non-critical-path jobs saves compute cost but not wall time
  • Use interruptible: true on all jobs except deploy — auto-cancels old pipelines when new commits arrive
  • Set GIT_DEPTH: 20 globally to avoid full clones (unless you need full git history)
  • Use rules:changes: in monorepos to skip unaffected jobs — this is the single biggest optimization for monorepos
  • Merge before_script into custom Docker images when the commands don't change between runs
  • Profile your scripts — add time prefix to commands to find which step is slow
  • Check artifacts:expire_in on all jobs — unlimited artifacts eat storage and slow uploads
  • Consider GitLab's parallel keyword to split test suites — 4x parallelism = ~3.5x speedup (Amdahl's law)
  • Use dependency proxy for Docker Hub images to avoid rate limiting and speed up pulls
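Several of these tips reduce to a few top-level lines; a sketch combining them (the deploy script name is illustrative):

```yaml
default:
  interruptible: true      # new commits auto-cancel superseded pipelines

variables:
  GIT_DEPTH: "20"          # shallow fetch instead of full history

deploy:
  stage: deploy
  interruptible: false     # never cancel a deploy mid-run
  script:
    - ./deploy.sh
```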
