[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1145

2026-03-04T22:23:21Z

github-actions[bot]
bot Mar 4, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and layered CI/CD setup with 15+ standard GitHub Actions workflows and 28+ agentic (AI-driven) workflows. Most standard checks run on every PR to main and PRs appear healthy overall — the most recent PR (feat/token-rate-limiting) had 22 checks run with only 1 integration test failure.

Standard Workflows on PRs:

| Workflow | File | Trigger |
| --- | --- | --- |
| Build Verification | `build.yml` | push + PR |
| Lint (ESLint) | `lint.yml` | push + PR |
| TypeScript Type Check | `test-integration.yml` | push + PR |
| Test Coverage | `test-coverage.yml` | push + PR |
| Integration Tests | `test-integration-suite.yml` | push + PR |
| Chroot Integration Tests | `test-chroot.yml` | push + PR |
| Examples Test | `test-examples.yml` | push + PR (path-filtered) |
| CodeQL Analysis | `codeql.yml` | push + PR + weekly |
| Container Security Scan | `container-scan.yml` | push + PR + weekly (path-filtered) |
| Dependency Audit | `dependency-audit.yml` | push + PR + weekly |
| PR Title Check | `pr-title.yml` | PR only |

Agentic Workflows on PRs:

Security Guard (Claude) — AI-powered security review
Build Test × 8 (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) — language-specific builds via Copilot agent
Smoke × 4 (Claude, Codex, Copilot, Chroot) — end-to-end smoke tests

Continuous Monitoring (daily/hourly):

Secret Digger (Claude, Codex, Copilot) — runs every hour
Dependency Security Monitor, Security Review, Doc Maintainer — daily
CLI Flag Consistency Checker, Test Coverage Improver — weekly

✅ Existing Quality Gates

Build — TypeScript compilation on Node 20 and 22 in a matrix
Linting — ESLint runs on src/ on every PR
Type checking — Strict TypeScript type check via tsconfig.check.json
Unit tests with coverage — Jest with Istanbul, PR comparison and regression detection
Integration tests — 26 test files covering domain blocking, DNS, protocol security, API proxy, chroot, container ops, credential hiding, etc.
Examples tests — 4 real awf invocations as smoke scripts
Static security analysis — CodeQL (JS/TS + Actions workflows)
Container CVE scanning — Trivy on both agent and squid containers
Dependency audit — npm audit --audit-level=high on main + docs packages
Conventional Commits enforcement — PR title semantic check with allowed scopes
AI security review — Claude-powered Security Guard reviews every PR for security regressions
Multi-language build testing — 8 language build tests ensure AWF works as a wrapper

🔍 Identified Gaps

🔴 High Priority

1. Test coverage thresholds are barely-passing minimums

The current coverage thresholds mirror the existing (low) coverage rather than enforcing a quality bar:

Statements: 38.39% (threshold: 38%) — passes by 0.39 points
Branches: 31.78% (threshold: 30%)
Functions: 37.03% (threshold: 35%)
Lines: 38.31% (threshold: 38%)

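The most critical file, docker-manager.ts (container lifecycle management), likely has coverage well below 50%. Thresholds set this close to a low baseline give false confidence: the gate only catches regressions below roughly 38%, and it passes while more than 60% of statements are never executed by unit tests.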
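The margins between measured coverage and the thresholds can be recomputed with a short shell sketch. The function name `coverage_margin` is hypothetical, and the figures are copied from the list above rather than read from a real coverage report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints how many percentage points a measured
# coverage value sits above its configured threshold.
coverage_margin() {
  awk -v actual="$1" -v threshold="$2" \
    'BEGIN { printf "%.2f\n", actual - threshold }'
}

# Figures copied from the coverage list in this report.
echo "statements: $(coverage_margin 38.39 38)"
echo "branches:   $(coverage_margin 31.78 30)"
echo "functions:  $(coverage_margin 37.03 35)"
echo "lines:      $(coverage_margin 38.31 38)"
```

Every margin comes out at about two points or less, which suggests the gate mainly detects regressions from the current baseline rather than enforcing a target.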

2. No shellcheck/static analysis for shell scripts

Six shell scripts are in containers/agent/ (entrypoint.sh, setup-iptables.sh, docker-stub.sh, get-claude-key.sh, pid-logger.sh, api-proxy-health-check.sh) plus scripts/ci/cleanup.sh. These scripts are security-critical (iptables rules, capability drops, credential handling) but receive zero automated static analysis. A shell quoting bug or logic error could silently weaken the firewall.
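To illustrate the class of bug in question, here is a minimal, self-contained sketch of an unquoted expansion, which shellcheck reports as SC2086. The helper `count_args` and the domain values are hypothetical and are not taken from the repository's scripts; the default `IFS` is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# A single configuration value that happens to contain a space.
allowed="example.com evil.test"

# Unquoted expansion is word-split, so one value becomes two arguments
# (shellcheck flags this line as SC2086).
unquoted=$(count_args $allowed)

# Quoted expansion keeps the value as a single argument.
quoted=$(count_args "$allowed")

echo "unquoted=$unquoted quoted=$quoted"
```

In a script that builds firewall rules, the same splitting could turn one intended rule argument into several. Running `shellcheck --severity=warning` over the scripts locally should surface this kind of issue before CI does.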

3. Container security scan misses source-only changes

container-scan.yml has a path filter: paths: containers/**. A TypeScript change in src/docker-manager.ts that alters how containers are launched (e.g., changing security options, capability sets, or network config) will not trigger a container rescan. The scan should also run when src/** changes, as the generated Docker Compose config determines runtime security posture.
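The effect of the filter can be approximated locally with shell patterns. This is only a sketch: `matches_filter` is a hypothetical helper, and the pattern `containers/*` stands in for `containers/**` because `*` in a shell `case` pattern also matches `/`. GitHub's own glob matching has more rules than this.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the file matches any of the patterns.
# Usage: matches_filter FILE PATTERN...
matches_filter() {
  local file="$1"
  shift
  local pattern
  for pattern in "$@"; do
    # In a case pattern, * also matches "/", approximating the ** glob.
    case "$file" in
      $pattern) return 0 ;;
    esac
  done
  return 1
}

# Current filter: a source-only change does not trigger the scan.
matches_filter "src/docker-manager.ts" 'containers/*' \
  && echo "scan runs" || echo "scan skipped"

# Broadened filter: the same change now triggers the scan.
matches_filter "src/docker-manager.ts" 'containers/*' 'src/*' \
  && echo "scan runs" || echo "scan skipped"
```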

4. Misleading workflow filename causes developer confusion

test-integration.yml actually contains the TypeScript Type Check workflow (not integration tests). The actual integration tests are in test-integration-suite.yml. This naming inversion will confuse contributors trying to understand CI status or reproduce failures locally.

🟡 Medium Priority

5. No PR-blocking secret scanning gate

The secret-digger-* workflows run every hour but are not blocking PR checks — a secret committed in a PR could be merged in the ~60-minute window between scans. A dedicated gitleaks or trufflehog check running as a required PR status check would close this window.

6. No Docker image size monitoring

There is no check to detect unexpected growth in container image sizes. A PR adding a large debugging tool or accidentally installing unnecessary packages could significantly increase image sizes without any CI signal.

7. Integration tests not confirmed as required status checks

The recent feat/token-rate-limiting PR had an Integration Tests failure yet the PR appears to have proceeded. If integration tests are not configured as required branch protection status checks in repository settings, they provide warnings but not true gates.

8. No performance/timing regression detection

Container startup time and proxy latency are user-facing metrics for this project. There are no benchmarks or timing assertions — a change that doubles startup time would pass all tests.

9. Missing coverage for `docker-manager.ts` critical paths

The unit test file docker-manager.test.ts exists, but given the overall 38% coverage, the complex container lifecycle methods (startContainers, runAgentCommand, performCleanup) are likely largely untested at the unit level. The integration tests cover these indirectly, but unit-level coverage of config generation logic would catch bugs before they reach containers.

10. `api-proxy` unit tests run in build (not path-filtered)

The api-proxy unit tests (npm test in containers/api-proxy/) run in build.yml on every PR regardless of whether containers/api-proxy/ was modified, unnecessarily increasing build time for unrelated changes.

🟢 Low Priority

11. No broken link checker for documentation

The docs site (docs-site/) and Markdown files in docs/ are not checked for broken links. As the project grows and URLs change (especially internal cross-references), stale links accumulate silently.

12. No CHANGELOG/release notes enforcement

PRs don't require a CHANGELOG entry or docs/ update. This creates gaps in release history and makes it harder to understand what changed between versions.

13. No artifact size tracking for `dist/`

The compiled output in dist/ is not size-tracked. A large regression in bundle size (e.g., from an accidental large dependency) would not be caught.
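One way such a check could look is sketched below. The helper names and the idea of a fixed byte budget are assumptions for illustration; the report does not specify how the size should be tracked.

```shell
#!/usr/bin/env bash
# Hypothetical helper: total size in bytes of all regular files under a directory.
dir_bytes() {
  find "$1" -type f -exec cat {} + | wc -c | tr -d ' '
}

# Hypothetical helper: succeeds when the directory is at or below the byte budget.
# Usage: within_size_budget DIR MAX_BYTES
within_size_budget() {
  [ "$(dir_bytes "$1")" -le "$2" ]
}

# Example CI usage (the 5 MiB budget is an illustrative value, not a measured one):
#   within_size_budget dist $((5 * 1024 * 1024)) \
#     || { echo "::error::dist/ exceeds the size budget"; exit 1; }
```

Summing file contents with `wc -c` measures apparent size, which keeps the result independent of filesystem block size.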

14. No concurrent cleanup race condition test

The cleanup lifecycle is documented as security-critical (prevents Docker network pool exhaustion), but there is no CI test verifying cleanup handles concurrent awf invocations or SIGKILL correctly.

📋 Actionable Recommendations

1. Raise test coverage thresholds incrementally 🔴 — Medium complexity, High impact

Set a realistic target (e.g., 60% statements, 50% branches) and raise the thresholds by 2-3 percentage points per sprint until it is reached. As a first step, update coverageThreshold in jest.config.js to values a few points above current coverage:

"coverageThreshold": {
  "global": {
    "statements": 42,
    "branches": 33,
    "functions": 40,
    "lines": 42
  }
}

The existing test-coverage-improver agentic workflow (weekly) can drive incremental increases.

2. Add shellcheck to CI 🔴 — Low complexity, High impact

Add a new workflow step or job to build.yml:

- name: Lint shell scripts with shellcheck
  uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38
  with:
    scandir: './containers'
    additional_files: 'scripts/ci/cleanup.sh'
    severity: warning

This catches quoting issues, command injection risks, and logic errors in setup-iptables.sh and entrypoint.sh — the most security-sensitive code in the repository.

3. Broaden path filter in container scan 🔴 — Low complexity, High impact

In container-scan.yml, add src/** and package.json to the paths filter so that source changes that affect container configuration also trigger a rescan:

paths:
  - 'containers/**'
  - 'src/**'
  - 'package.json'
  - '.github/workflows/container-scan.yml'

4. Rename misleading workflow file 🔴 — Low complexity, Low impact

Rename test-integration.yml → type-check.yml to reflect its actual content ("TypeScript Type Check"). Update CI Doctor's workflow list accordingly.

5. Add gitleaks as a blocking PR check 🟡 — Low complexity, High impact

Add a dedicated secret scanning step using gitleaks/gitleaks-action as a required status check, separate from the hourly agentic scans:

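- name: Run Gitleaks
  # NOTE: the pinned ref below has 39 hex characters; a full commit SHA has 40.
  # Verify it against the gitleaks-action releases before use.
  uses: gitleaks/gitleaks-action@ff98106e4c7b2bc89cf8b6c53f2a3c3f06c5f41
  env:
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}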

6. Verify and enforce required status checks 🟡 — Low complexity, High impact

In repository Settings → Branches → Branch protection rules for main, ensure these workflows are required status checks: Build Verification, Integration Tests, Chroot Integration Tests, TypeScript Type Check, Test Coverage, Lint, CodeQL, PR Title Check. If Integration Tests is not required, the recent failure on feat/token-rate-limiting would not have blocked merging.
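The comparison can be scripted roughly as follows. This sketch assumes classic branch protection on main, a token with permission to read it, and that the required check contexts use the same names as the workflows listed above (contexts are often job names instead, so the names may need adjusting). `missing_required_checks` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads configured check names from stdin (one per line)
# and prints each expected name that is not among them.
# Usage: ... | missing_required_checks NAME...
missing_required_checks() {
  local configured
  configured=$(cat)
  local name
  for name in "$@"; do
    printf '%s\n' "$configured" | grep -Fxq -- "$name" || echo "$name"
  done
}

# Example usage with the GitHub CLI (not executed here):
#   gh api "repos/{owner}/{repo}/branches/main/protection/required_status_checks" \
#     --jq '.contexts[]' \
#     | missing_required_checks "Build Verification" "Integration Tests" "Lint"
```

Any name printed by the function is expected but not currently required on the branch.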

7. Add Docker image size monitoring 🟡 — Medium complexity, Medium impact

Add a step to build.yml or container-scan.yml that records image sizes and fails if they exceed a threshold:

AGENT_SIZE=$(docker image inspect ghcr.io/github/gh-aw-firewall/agent:latest --format='{{.Size}}')
MAX_SIZE=$((500 * 1024 * 1024))  # 500 MiB
if [ "$AGENT_SIZE" -gt "$MAX_SIZE" ]; then
  echo "::error::Agent image size ${AGENT_SIZE} bytes exceeds maximum ${MAX_SIZE} bytes"
  exit 1
fi

8. Add path filter to api-proxy tests in build.yml 🟢 — Low complexity, Low impact

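Move the api-proxy unit test step out of build.yml into a separate workflow whose trigger uses paths: ['containers/api-proxy/**'], so it does not run on unrelated changes. GitHub Actions applies paths filters at the workflow trigger level, not per job, so a job inside build.yml cannot be filtered this way without an extra change-detection step.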

9. Add documentation link checker 🟢 — Low complexity, Low impact

Add lychee-action or markdown-link-check to the docs deploy workflow to verify links before deployment.

10. Add performance baseline tracking 🟡 — High complexity, Medium impact

Add a benchmark job using hyperfine that measures awf startup-to-first-output time and compares against a baseline stored as a repository artifact. Flag regressions >10% in the PR comment.
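The comparison step itself is small. The sketch below assumes the baseline and current timings are already available as whole milliseconds (for example, converted from hyperfine's JSON export); the function name and the sample numbers are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the current timing exceeds the baseline
# by more than the given tolerance in percent.
# Usage: is_regression BASELINE_MS CURRENT_MS TOLERANCE_PCT
is_regression() {
  local baseline_ms="$1"
  local current_ms="$2"
  local tolerance_pct="$3"
  # Integer arithmetic: current * 100 > baseline * (100 + tolerance)
  [ $((current_ms * 100)) -gt $((baseline_ms * (100 + tolerance_pct))) ]
}

# Illustrative values only: a 1000 ms baseline against a 1150 ms measurement.
if is_regression 1000 1150 10; then
  echo "startup time regressed by more than 10%"
fi
```

Using integer milliseconds avoids floating-point comparisons in the shell; a measurement exactly at the tolerance boundary is not flagged.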

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows | 56 (15 standard + 28+ agentic + 13 lock files) |
| Workflows triggering on PR | ~24 (11 standard + 13 agentic) |
| Integration test files | 26 files across domains, security, chroot, API proxy, container ops |
| Unit test files | 10 files in `src/` |
| Statement coverage | 38.39% (threshold: 38%) |
| Branch coverage | 31.78% (threshold: 30%) |
| Function coverage | 37.03% (threshold: 35%) |
| Line coverage | 38.31% (threshold: 38%) |
| Recent PR check pass rate | ~95% (1 Integration Tests failure out of ~22 checks on most recent PR) |
| Security scanning layers | CodeQL + Trivy + npm audit + hourly AI secret diggers + AI Security Guard |

Key Finding

The pipeline is comprehensive in breadth — covering security, builds, integration, docs, and multiple AI engines — but has meaningful gaps in depth: test coverage is low, shell scripts are unguarded, and some critical checks (container scan, integration tests as required checks) have configuration gaps. The most impactful quick wins are shellcheck for shell scripts and raising coverage thresholds incrementally.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 11, 2026, 10:23 PM UTC

2026-03-05T00:58:36Z

github-actions[bot]
bot Mar 5, 2026
Author

🔮 The ancient spirits stir; the smoke test agent has passed through, leaving a quiet omen in its wake.

🔮 The oracle has spoken through Smoke Codex
