name: ci-optimization-specialist
description: Optimizes GitHub Actions CI/CD workflows through test sharding, intelligent caching, and workflow parallelization. Use when CI execution time exceeds limits, costs are too high, or workflows need parallelization.
CI Optimization Specialist
Quick Start
This skill optimizes GitHub Actions workflows for:
- Test sharding: Parallel test execution across multiple runners
- Caching: pnpm store, Playwright browsers, Vite build cache
- Workflow optimization: Job dependencies and concurrency
When to Use
- CI execution time exceeds 10-15 minutes
- GitHub Actions costs too high
- Need faster developer feedback loops
- Tests not parallelized
Test Sharding Setup
Basic Pattern (Automatic Distribution)
Add matrix strategy to .github/workflows/ci.yml:
e2e-tests:
  name: 🧪 E2E Tests [Shard ${{ matrix.shard }}/3]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      shard: [1, 2, 3]
  steps:
    - name: Run Playwright tests
      run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
      env:
        CI: true
Expected improvement: 60-65% faster for 3 shards
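An optional variation, following the matrix layout used in Playwright's sharding guide, keeps the shard total in the matrix so the count only needs changing in one place. This is a sketch: the `shardIndex` and `shardTotal` key names are illustrative, and the setup steps are omitted.

```yaml
e2e-tests:
  name: E2E Tests [Shard ${{ matrix.shardIndex }}/${{ matrix.shardTotal }}]
  runs-on: ubuntu-latest
  timeout-minutes: 30
  strategy:
    fail-fast: false
    matrix:
      # Illustrative key names; any matrix keys work.
      shardIndex: [1, 2, 3]
      shardTotal: [3]
  steps:
    # Checkout, pnpm setup, and dependency install steps are omitted here.
    - name: Run Playwright tests
      run: pnpm exec playwright test --shard=${{ matrix.shardIndex }}/${{ matrix.shardTotal }}
      env:
        CI: true
```

Because `shardTotal` has a single value, the matrix still expands to exactly three jobs.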
Advanced Pattern (Manual Distribution)
For unbalanced test suites, manually distribute by duration:
matrix:
  include:
    - shard: 1
      pattern: 'ai-generation|project-management' # Heavy tests
    - shard: 2
      pattern: 'project-wizard|settings|publishing' # Medium tests
    - shard: 3
      pattern: 'world-building|versioning|mock-validation' # Light tests

# In step:
run: pnpm exec playwright test --grep "${{ matrix.pattern }}"
Critical Caching Patterns
pnpm Store Cache
ALWAYS cache pnpm store to avoid re-downloading packages:
- name: Get pnpm store directory
  id: pnpm-cache
  shell: bash
  run: echo "STORE_PATH=$(pnpm store path)" >> $GITHUB_OUTPUT

- name: Setup pnpm cache
  uses: actions/cache@v4
  with:
    path: ${{ steps.pnpm-cache.outputs.STORE_PATH }}
    key: ${{ runner.os }}-pnpm-store-${{ hashFiles('**/pnpm-lock.yaml') }}
    restore-keys: |
      ${{ runner.os }}-pnpm-store-
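As a possible alternative to the manual cache above, `actions/setup-node` can manage the pnpm store cache itself through its `cache` input. The sketch below assumes pnpm is installed with `pnpm/action-setup` before Node.js is set up; the pnpm and Node.js version numbers are placeholders, not values taken from this project.

```yaml
steps:
  - uses: actions/checkout@v4

  # pnpm must be installed before setup-node so the cache lookup can find it.
  - name: Install pnpm
    uses: pnpm/action-setup@v4
    with:
      version: 9 # Assumed version; omit if package.json sets "packageManager"

  - name: Setup Node.js with pnpm caching
    uses: actions/setup-node@v4
    with:
      node-version: 20 # Assumed version
      cache: pnpm

  - name: Install dependencies
    run: pnpm install --frozen-lockfile
```

Use one approach or the other; running both the manual `actions/cache` step and the built-in cache would store the same data twice.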
Playwright Browsers Cache
Cache 500MB+ browser binaries:
- name: Cache Playwright browsers
  uses: actions/cache@v4
  id: playwright-cache
  with:
    path: ~/.cache/ms-playwright
    key: ${{ runner.os }}-playwright-${{ hashFiles('**/pnpm-lock.yaml') }}

- name: Install Playwright browsers
  if: steps.playwright-cache.outputs.cache-hit != 'true'
  run: pnpm exec playwright install --with-deps chromium

- name: Install Playwright system dependencies
  if: steps.playwright-cache.outputs.cache-hit == 'true'
  run: pnpm exec playwright install-deps chromium
Vite Build Cache
For monorepos or frequent builds:
- name: Cache Vite build
  uses: actions/cache@v4
  with:
    path: |
      dist/
      node_modules/.vite/
    key: ${{ runner.os }}-vite-${{ hashFiles('src/**', 'vite.config.ts') }}
Workflow Optimization
Job Dependencies
Use needs to control execution flow:
jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - name: Build
        run: pnpm run build
      - name: Run unit tests
        run: pnpm test

  e2e-tests:
    needs: build-and-test # Wait for build to complete
    runs-on: ubuntu-latest
    strategy:
      matrix:
        shard: [1, 2, 3]
    steps:
      - name: Run E2E tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3
Concurrency Control
Prevent overlapping runs on the same branch by cancelling the in-progress run when a newer one starts:
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true
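If cancelling runs on the default branch is undesirable (for example, when every merged commit should get a complete CI result), `cancel-in-progress` can take an expression instead of a literal. A minimal sketch, assuming the default branch is named `main`:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Cancel superseded runs everywhere except the default branch (assumed to be main).
  cancel-in-progress: ${{ github.ref != 'refs/heads/main' }}
```

Runs on `main` then queue behind each other rather than being cancelled, while pull request branches keep the faster cancel-and-restart behaviour.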
Artifact Management
Per-Shard Artifacts
Upload test reports from each shard:
- name: Upload Playwright report
  if: always()
  uses: actions/upload-artifact@v4
  with:
    name: playwright-report-shard-${{ matrix.shard }}-${{ github.sha }}
    path: playwright-report/
    retention-days: 7
    compression-level: 6
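Per-shard HTML reports have to be opened one at a time. One option, sketched here along the lines of Playwright's sharding guide, is to have each shard emit a blob report and merge them in a follow-up job. This assumes switching the shards to `--reporter=blob` (which writes to `blob-report` by default); the job name `merge-reports` and the artifact names are illustrative.

```yaml
jobs:
  e2e-tests:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        shard: [1, 2, 3]
    steps:
      # Checkout, pnpm setup, and dependency install steps are omitted here.
      - name: Run Playwright tests
        run: pnpm exec playwright test --shard=${{ matrix.shard }}/3 --reporter=blob
      - name: Upload blob report
        if: ${{ !cancelled() }}
        uses: actions/upload-artifact@v4
        with:
          name: blob-report-${{ matrix.shard }}
          path: blob-report
          retention-days: 1

  merge-reports:
    # Run even when some shards failed, so failures still get a report.
    if: ${{ !cancelled() }}
    needs: e2e-tests
    runs-on: ubuntu-latest
    steps:
      # Checkout, pnpm setup, and dependency install steps are omitted here.
      - name: Download blob reports from all shards
        uses: actions/download-artifact@v4
        with:
          path: all-blob-reports
          pattern: blob-report-*
          merge-multiple: true
      - name: Merge into a single HTML report
        run: pnpm exec playwright merge-reports --reporter html ./all-blob-reports
      - name: Upload merged report
        uses: actions/upload-artifact@v4
        with:
          name: playwright-report-${{ github.sha }}
          path: playwright-report
          retention-days: 7
```

The short retention on the intermediate blob artifacts keeps storage low, since only the merged report needs to be kept for review.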
Artifact Cleanup
Set short retention for test reports to reduce storage costs:
retention-days: 7 # Default is 90 days
compression-level: 6 # Compress to reduce storage
Performance Monitoring
Expected Benchmarks
| Optimization | Before | After | Improvement |
|---|---|---|---|
| Test sharding (3 shards) | 27 min | 9-10 min | 60-65% |
| pnpm cache hit | 2-3 min | 10-15s | 85-90% |
| Playwright cache hit | 1-2 min | 5-10s | 90-95% |
| Vite build cache | 1-2 min | 5-10s | 90-95% |
Regression Detection
Set timeout thresholds as guardrails:
timeout-minutes: 30 # Fail if shard exceeds 30 minutes
Monitor shard execution times and rebalance if one shard consistently exceeds others by >2 minutes.
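To make shard durations easy to compare, each shard can write its own runtime to the job summary. The step below is a sketch that replaces the plain test step; it assumes the `shard` matrix key and the three-shard total used earlier.

```yaml
- name: Run Playwright tests (timed)
  shell: bash
  run: |
    start=$(date +%s)
    status=0
    # Capture the exit code so the duration is recorded even when tests fail.
    pnpm exec playwright test --shard=${{ matrix.shard }}/3 || status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "Shard ${{ matrix.shard }}/3: ${elapsed}s" >> "$GITHUB_STEP_SUMMARY"
    # Propagate the original test result to the job.
    exit "$status"
```

The summary lines appear on the workflow run page, one per shard, which makes a gap of more than two minutes visible without opening the logs.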
Optimization Workflow
Phase 1: Baseline
- Record current CI execution times
- Identify slowest jobs
- Measure cache hit rates (check Actions logs)
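Instead of searching the logs for cache results, a step can copy the `cache-hit` output of each `actions/cache` step into the job summary. In this sketch `playwright-cache` is the step id used in the Playwright cache example, while `pnpm-store-cache` is a hypothetical id that would have to be added to the "Setup pnpm cache" step.

```yaml
- name: Report cache status
  if: always()
  shell: bash
  run: |
    # cache-hit is 'true' only on an exact key match; a restore-keys fallback
    # or a complete miss is reported here as 'false'.
    {
      echo "pnpm store exact hit: ${{ steps.pnpm-store-cache.outputs.cache-hit || 'false' }}"
      echo "Playwright browsers exact hit: ${{ steps.playwright-cache.outputs.cache-hit || 'false' }}"
    } >> "$GITHUB_STEP_SUMMARY"
```

Collecting these lines over several runs gives a rough hit rate to use as the baseline.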
Phase 2: Implement Caching
- Add pnpm store cache (highest impact)
- Add Playwright browser cache
- Add build caches if applicable
- Verify cache keys work correctly
Phase 3: Implement Sharding
- Calculate optimal shard count (target 3-5 min per shard)
- Add matrix strategy to workflow
- Test locally: pnpm exec playwright test --shard=1/3
- Monitor shard balance in CI
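As a rough starting point for the shard count, divide the serial test time by the target time per shard and round up. This estimate assumes tests split evenly and ignores per-shard setup overhead (checkout, install, browser setup), so treat the result as a first guess to refine from measured runs. Using the 27-minute serial time from the benchmark table:

```latex
N_{\text{shards}} = \left\lceil \frac{T_{\text{serial}}}{T_{\text{target}}} \right\rceil,
\qquad
\left\lceil \frac{27\ \text{min}}{5\ \text{min}} \right\rceil = 6,
\qquad
\left\lceil \frac{27\ \text{min}}{9\ \text{min}} \right\rceil = 3
```

A 5-minute target therefore suggests six shards, while the three-shard setup shown earlier corresponds to a target of about 9 minutes per shard.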
Phase 4: Monitor & Adjust
- Track execution times over 5-10 runs
- Identify unbalanced shards (>2 min variance)
- Adjust shard distribution if needed
- Set up alerts for regressions
Common Issues
Shard imbalance (one shard takes 2x longer)
- Use manual distribution with --grep patterns
- Group heavy tests together, distribute across shards
Cache misses despite correct key
- Verify hashFiles glob patterns match actual files
- Check if lock file changes on every run (shouldn't happen)
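One way to catch a lock file that drifts during CI is to install with a frozen lock file and then fail the job if the file differs from what was committed. A minimal sketch, assuming `pnpm-lock.yaml` is committed at the repository root:

```yaml
- name: Install dependencies without modifying the lock file
  run: pnpm install --frozen-lockfile

- name: Fail if the lock file changed during the run
  run: git diff --exit-code -- pnpm-lock.yaml
```

If the second step fails, the `hashFiles('**/pnpm-lock.yaml')` portion of the cache key is changing between runs, which would explain the misses.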
Playwright install fails with cache hit
- Ensure system dependencies are installed separately: pnpm exec playwright install-deps
Tests fail in CI but pass locally
- Check environment variables (CI=true may affect behavior)
- Verify mock setup works in parallel execution
- Increase timeouts for slow operations
Success Criteria
- CI execution time < 15 minutes total
- Cache hit rate > 85% for dependencies
- Shard execution time variance < 2 minutes
- Zero timeout failures from slow tests
References
For detailed examples and templates:
- GitHub Actions Caching: https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
- Playwright Sharding: https://playwright.dev/docs/test-sharding
- pnpm in CI: https://pnpm.io/continuous-integration