[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1178

2026-03-08T22:20:23Z

github-actions[bot]
bot Mar 8, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, multi-layered CI/CD pipeline with 57 registered workflows combining static YAML-based checks and agentic AI-driven workflows. The pipeline covers build verification, type checking, linting, unit tests, integration tests, security scanning, and end-to-end smoke tests.

Workflow inventory (March 2026):

| Category | Count | Runs on PR? |
| --- | --- | --- |
| Static quality gates (build, lint, type-check, test) | 7 | ✅ Every PR |
| Security workflows (CodeQL, Trivy, dependency audit) | 3 | ✅ Every PR (Trivy container scan is path-filtered) |
| Integration tests (domain/network, protocol, container, chroot) | 3 | ✅ Every PR |
| Agentic AI workflows (smoke, build-test, security guard) | 15+ | ✅ Every PR or scheduled |
| Maintenance & monitoring (secret-digger, doc-maintainer, etc.) | 8+ | ⏰ Scheduled |
| Utility (release, deploy-docs, pr-title) | 5 | ✅ / On event |

Health summary (recent runs, March 2026): Scheduled maintenance workflows (Secret Digger, Agentic Maintenance) are running consistently with a ~95% success rate.

✅ Existing Quality Gates

The following checks currently run on every PR targeting main:

Build & Compilation

Build Verification (build.yml) — TypeScript compilation on Node 20 and 22 (matrix), plus npm run lint and API proxy unit tests (containers/api-proxy/)
TypeScript Type Check (test-integration.yml) — strict tsc --noEmit type checking
ESLint (lint.yml) — TypeScript/JavaScript linting with custom rules

Testing

Test Coverage (test-coverage.yml) — Unit tests with coverage enforcement (thresholds: 38% lines/statements, 30% branches, 35% functions); compares against base branch and comments on PR; fails on regression
Integration Tests Suite (test-integration-suite.yml) — 4 parallel jobs covering: domain/network filtering, protocol/security tests, container/ops tests, API proxy tests
Chroot Integration Tests (test-chroot.yml) — Language runtime tests (Python, Go, Java, .NET, Node) through chroot sandbox
Examples Test (test-examples.yml) — End-to-end execution of shell examples
Test Setup Action (test-action.yml) — Validates GitHub Action installer (latest, specific version, image pull, invalid version)

Security

CodeQL Analysis (codeql.yml) — SAST for JavaScript/TypeScript and GitHub Actions workflows (runs on PR + weekly schedule)
Dependency Vulnerability Audit (dependency-audit.yml) — npm audit for main and docs-site packages; SARIF uploaded to Security tab; blocks on high/critical
Container Security Scan (container-scan.yml) — Trivy scan on agent and squid images (SARIF to Security tab) — path-filtered to containers/**
Security Guard (security-guard.lock.yml) — AI-driven (Claude) security review on every PR; comments with findings

Process

PR Title Check (pr-title.yml) — Enforces Conventional Commits format with allowed scopes (cli, docker, squid, proxy, ci, deps)
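
As a rough illustration of what such a title check does, the sketch below validates a title against a Conventional Commits pattern with the six scopes listed above. The list of commit types and the function name `is_valid_pr_title` are assumptions for illustration; the actual pr-title.yml configuration may differ.

```shell
#!/bin/sh
# Hypothetical sketch: validate a PR title against Conventional Commits.
# The scopes come from the assessment; the type list is an assumption.
TYPES='feat|fix|docs|chore|ci|test|refactor|perf|build'
SCOPES='cli|docker|squid|proxy|ci|deps'

is_valid_pr_title() {
  # Accepts "type: subject" or "type(scope): subject"; the scope is optional.
  printf '%s\n' "$1" | grep -Eq "^(${TYPES})(\((${SCOPES})\))?!?: .+"
}

is_valid_pr_title "fix(squid): handle empty allowlist" && echo "accepted"
is_valid_pr_title "fix(kernel): unknown scope" || echo "rejected"
```

A title without a scope, such as "docs: update README", is also accepted by this pattern, which may or may not match the repository's policy.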

Agentic / Smoke (PR-triggered, may require reactions)

Smoke Claude/Codex/Copilot — AI agent smoke tests validating full AWF execution pipeline with real models
Build-Test (8 ecosystems) — Bun, C++, Deno, .NET, Go, Java, Node.js, Rust end-to-end build+test through AWF proxy

🔍 Identified Gaps

🔴 High Priority

1. Coverage thresholds are critically low for a security-critical tool

Current thresholds: 38% lines/statements, 30% branches, 35% functions. The two most important files have little or no unit test coverage:

cli.ts (entry point): 0% coverage (0/69 lines)
docker-manager.ts (core orchestration): 18% coverage (45/250 lines, 4% function coverage)

These are the files most likely to have regressions from PRs, and they are effectively untested by the unit test suite.

Risk: Breaking changes to CLI argument parsing, container lifecycle management, or cleanup logic can ship undetected by unit tests, relying entirely on integration tests (which are slower and less targeted).

2. Container Security Scan is path-filtered (skipped on most PRs)

container-scan.yml only triggers when containers/** or the workflow file itself changes:

paths:
  - 'containers/**'
  - '.github/workflows/container-scan.yml'

A PR that modifies src/docker-manager.ts to change how containers are configured or what capabilities they receive will not trigger a Trivy scan. Only the weekly scheduled run catches this.

Risk: Security-relevant changes to container configuration ship without a fresh container image scan.
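
The effect of the filter can be approximated locally. The sketch below mimics the two `paths:` entries with shell `case` patterns and shows which changed files would start the scan; the function name `triggers_container_scan` is hypothetical, and GitHub's own glob matching has more rules than this approximation.

```shell
#!/bin/sh
# Approximates the paths: filter of container-scan.yml with shell patterns.
# In a case pattern, * also matches "/", so containers/* behaves like containers/**.
triggers_container_scan() {
  case "$1" in
    containers/*|.github/workflows/container-scan.yml) return 0 ;;
    *) return 1 ;;
  esac
}

for changed in containers/agent/entrypoint.sh src/docker-manager.ts; do
  if triggers_container_scan "$changed"; then
    echo "$changed: scan runs"
  else
    echo "$changed: scan skipped"
  fi
done
```

With these two inputs, only the file under containers/ starts the scan; the change to src/docker-manager.ts is skipped.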

3. No shell script linting (ShellCheck)

There are 6 shell scripts in containers/agent/ (including setup-iptables.sh, entrypoint.sh) that are critical security components. None are checked by any automated linter. No shellcheck or shfmt step exists in any workflow.

Risk: Shell script bugs (quoting issues, unhandled errors, logic errors in iptables setup) are only caught by integration tests or manual review.
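
To make the quoting risk concrete, here is a small self-contained example of the kind of bug ShellCheck flags (rule SC2086, unquoted variable expansion). It is illustrative only and is not taken from setup-iptables.sh or entrypoint.sh; the variable and function names are made up.

```shell
#!/bin/sh
# Illustrative only: not taken from the repository's scripts.
count_args() { echo "$#"; }

allowed="example.com github.com"

# Unquoted expansion undergoes word splitting: two arguments are passed.
# ShellCheck reports this pattern as SC2086.
unquoted=$(count_args $allowed)

# Quoted expansion is passed as a single argument.
quoted=$(count_args "$allowed")

echo "unquoted: $unquoted argument(s), quoted: $quoted argument(s)"
```

In a firewall setup script, the same splitting could turn one intended rule argument into several, which is why static linting of these scripts is valuable.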

4. No minimum coverage gate that scales with the codebase

The coverage comparison check prevents regression, but the absolute thresholds (38% lines, 30% branches) are far below what is commonly targeted for security-critical infrastructure (typically 70-80% or more). A PR that adds 100 lines with 38% coverage technically passes all gates while leaving 62% of new logic untested.
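
One way to close this gap is a gate that evaluates only the lines a PR adds ("patch coverage"). The sketch below shows the arithmetic for the 100-line example; the function name `patch_coverage_ok` and the 80% target are assumptions, not part of the current workflows.

```shell
#!/bin/sh
# Hypothetical patch-coverage gate: judges only the lines a PR adds.
# Usage: patch_coverage_ok <covered_new_lines> <total_new_lines> <min_percent>
patch_coverage_ok() {
  covered=$1; total=$2; min=$3
  [ "$total" -gt 0 ] || return 0      # nothing added, nothing to check
  pct=$(( covered * 100 / total ))    # integer percent, rounded down
  [ "$pct" -ge "$min" ]
}

# The PR from the text: 100 new lines, 38 of them covered.
patch_coverage_ok 38 100 38 && echo "meets a 38% bar"
patch_coverage_ok 38 100 80 || echo "fails an 80% patch-coverage gate"
```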

🟡 Medium Priority

5. No code formatting check (Prettier)

The repository has ESLint but no Prettier or other formatter enforcement. TypeScript/JavaScript formatting is only enforced by ESLint stylistic rules (if any), and contributors may submit PRs with inconsistent formatting that creates noisy diffs.

Impact: Minor but affects review quality and diff readability over time.

6. `build.yml` combines lint and build — lint failure blocks build result

The Build Verification workflow runs npm run lint before npm run build. If lint fails, the build step is never executed, making it impossible to distinguish "lint failure" from "build failure" in the status check list. Lint already has its own dedicated workflow (lint.yml).

Impact: Duplicate lint runs; failure category ambiguity in PR status.
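
The ambiguity can be reproduced in a few lines of shell. In this sketch, `lint_step` and `build_step` are hypothetical stand-ins for npm run lint and npm run build; the first form chains them the way sequential workflow steps behave, and the second records each result independently.

```shell
#!/bin/sh
# Stand-ins for "npm run lint" and "npm run build"; both are hypothetical.
lint_step()  { return 1; }              # simulate a lint failure
build_step() { echo "build ran"; }

# Chained steps: the build is skipped once lint fails, so its result is unknown.
chained=$(lint_step && build_step) || true

# Independent steps: each one runs and reports its own status.
lint_status=0;  lint_step || lint_status=$?
build_status=0; independent=$(build_step) || build_status=$?

echo "chained build output: '${chained}'"
echo "lint=${lint_status} build=${build_status} output='${independent}'"
```

In the chained form there is no build output at all, so a reader of the status list cannot tell whether the build would have passed.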

7. No dist artifact size monitoring

The dist/ directory is compiled TypeScript output. There is no check that prevents unintentional bundle size growth (e.g., accidentally bundling large dependencies, adding debug code).

Impact: Could cause slow installs or unexpected behavior for users of the npm package.

8. Integration tests do not run when only `src/**` changes (smoke-chroot)

smoke-chroot.md is path-filtered (paths: [src/**, containers/**, ...]), which is appropriate. The main Integration Tests Suite (test-integration-suite.yml), however, runs on all PRs regardless of what changed, including documentation-only changes. Conversely, the chroot tests (which test the --chroot-dir flag) run only when a change matches their path filter, so changes to source files outside that filter do not trigger them.

Impact: Integration test runs are sometimes redundant; some source changes may miss targeted test coverage.

9. No license compliance check

The project uses MIT license and depends on many npm packages. There is no automated check that new dependencies don't introduce incompatible licenses (e.g., GPL/AGPL that could affect distribution).

Impact: A dependency with an incompatible license could be introduced unknowingly.

10. Agentic smoke tests and build-test workflows are non-blocking

The smoke tests (Claude/Codex/Copilot) and build-test agentic workflows run on PRs but are not enforced as required status checks. They are valuable for validation but their results are informational only unless configured as required in branch protection.

Impact: A PR that breaks the Claude smoke test can still be merged if the required static checks pass.

🟢 Low Priority

11. No mutation testing

Unit tests exist for logger.ts, squid-config.ts, and cli-workflow.ts (all at 100% coverage), but no mutation testing (e.g., Stryker) validates that the tests are actually detecting bugs rather than just executing code paths.

Impact: Test quality is unknown beyond line coverage metrics.

12. No documentation link validation

There are many cross-references between README.md, docs/, and AGENTS.md. No workflow checks that internal links are valid.

Impact: Documentation rot and broken links go undetected.

13. Weekly scheduled scans are not linked back to PRs

CodeQL, Trivy container scan, and dependency audit all run on a weekly/monthly schedule. When they find new CVEs, there is no automated mechanism to open an issue or PR. The dependency-security-monitor.md agentic workflow exists for this, but it's separate from the scheduled static scans.

Impact: CVE notifications from scheduled scans may not result in timely remediation.

📋 Actionable Recommendations

1. Add ShellCheck to CI (High Priority)

Issue: 6 critical shell scripts have no linting.
Solution: Add a shellcheck step to build.yml or a dedicated lint-scripts.yml:

- name: Run ShellCheck
  uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38
  with:
    scandir: './containers'

Complexity: Low | Impact: High — catches iptables setup bugs before runtime

2. Remove path filter from Container Scan (High Priority)

Issue: Container scan skips most PRs.
Solution: Remove paths: filter from container-scan.yml so it runs on every PR, or add src/** to the paths list since source changes can affect container behavior.
Complexity: Low | Impact: High — ensures every PR is scanned for container CVEs

3. Raise coverage thresholds incrementally (High Priority)

Issue: Thresholds are too low (38%) for a security-critical tool.
Solution: Set a roadmap: 50% by Q2, 65% by Q3, 80% by Q4. Raise the thresholds in 5-point increments toward each quarterly target as the test-coverage-improver agentic workflow generates new tests. Prioritize cli.ts and docker-manager.ts.
Complexity: Medium | Impact: High — forces test-writing alongside feature development

4. Add Prettier formatting check (Medium Priority)

Issue: No code formatting enforcement.
Solution: Add .prettierrc and a format-check step: npx prettier --check "src/**/*.ts". Can be added to lint.yml.
Complexity: Low | Impact: Medium — improves code consistency

5. Separate lint from build in `build.yml` (Medium Priority)

Issue: Lint blocks build step in Build Verification.
Solution: Remove npm run lint from build.yml since lint.yml already covers it. Reduces duplication and gives clearer failure signals.
Complexity: Low | Impact: Low-Medium — cleaner status checks

6. Add dist size tracking (Medium Priority)

Issue: No artifact size monitoring.
Solution: Add a step in build.yml that prints and compares dist/ size, failing if it grows by >20%:

DIST_SIZE=$(du -sk dist/ | cut -f1)
echo "dist size: ${DIST_SIZE}KB"

Complexity: Low | Impact: Medium — prevents accidental bundle bloat
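
The snippet above only prints the size. A comparison step could look like the following sketch, which uses integer arithmetic to fail when the size grows by more than a given percentage. The function name `dist_growth_ok` and the way the baseline is obtained are assumptions.

```shell
#!/bin/sh
# Hypothetical size gate: fails when dist/ grows by more than a given percent.
# Usage: dist_growth_ok <base_kb> <current_kb> <max_growth_percent>
dist_growth_ok() {
  base_kb=$1; current_kb=$2; max_pct=$3
  # Integer form of: current <= base * (1 + max_pct / 100)
  [ $(( current_kb * 100 )) -le $(( base_kb * (100 + max_pct) )) ]
}

# In CI the inputs would come from du on each checkout, for example:
#   BASE_KB=$(du -sk base/dist | cut -f1); DIST_SIZE=$(du -sk dist | cut -f1)
dist_growth_ok 1000 1150 20 && echo "1150KB vs 1000KB: within 20%"
dist_growth_ok 1000 1250 20 || echo "1250KB vs 1000KB: exceeds 20%"
```

Growth of exactly 20% passes in this sketch, since the recommendation is to fail only above that limit.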

7. Add license compliance check (Medium Priority)

Issue: No license scanning.
Solution: Add npx license-checker --onlyAllow "MIT;BSD-2-Clause;BSD-3-Clause;ISC;Apache-2.0;CC0-1.0" as a step in the dependency audit workflow.
Complexity: Low | Impact: Medium — prevents license compliance issues
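
Conceptually, the allowlist check compares each dependency's license identifier against the semicolon-separated list. The sketch below shows that comparison in plain shell; it is not how license-checker is implemented, and the function name `license_allowed` is hypothetical.

```shell
#!/bin/sh
# Sketch of the allowlist comparison; not how license-checker is implemented.
ALLOWED="MIT;BSD-2-Clause;BSD-3-Clause;ISC;Apache-2.0;CC0-1.0"

license_allowed() {
  # Wrap both sides in ";" so only whole identifiers match.
  case ";${ALLOWED};" in
    *";$1;"*) return 0 ;;
    *) return 1 ;;
  esac
}

for lic in MIT Apache-2.0 GPL-3.0; do
  if license_allowed "$lic"; then echo "$lic: allowed"; else echo "$lic: blocked"; fi
done
```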

8. Enforce smoke/build-test as required status checks (Medium Priority)

Issue: AI-driven tests are advisory, not blocking.
Solution: Configure the smoke-claude, smoke-copilot, smoke-codex results as required checks in branch protection settings (or accept the risk explicitly by documenting the policy).
Complexity: Low (config change only) | Impact: High — ensures AWF actually works with all 3 engines before merging

9. Add documentation link validation (Low Priority)

Issue: Broken internal links go undetected.
Solution: Use lychee or markdown-link-check in a scheduled or PR workflow targeting **/*.md files.
Complexity: Low | Impact: Low-Medium — improves documentation quality
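
For a sense of what such a check involves, the sketch below reports relative Markdown links whose target file does not exist. It is a simplified stand-in, assuming inline `[text](target)` links only; dedicated tools such as lychee or markdown-link-check also handle anchors, reference-style links, and remote URLs. The function name `broken_links` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: report relative Markdown links whose target file is missing.
# Usage: broken_links <markdown-file>
broken_links() {
  file=$1
  dir=$(dirname "$file")
  # Extract the target of each inline link, dropping any #anchor suffix.
  grep -oE '\]\([^)#]+' "$file" | sed 's/^](//' | while IFS= read -r target; do
    case "$target" in
      http://*|https://*|mailto:*) continue ;;   # external links are skipped
    esac
    [ -e "$dir/$target" ] || echo "$file: broken link -> $target"
  done
}
```

Running `broken_links README.md` from the repository root would print one line per missing target and nothing when all relative links resolve.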

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflows registered | 57 |
| Workflows running on every PR | ~15 |
| Agentic workflows (AI-driven) | ~27 |
| Unit test coverage (statements) | 38.39% |
| Unit test coverage (branches) | 31.78% |
| Coverage threshold (lines) | 38% ⚠️ Low |
| `cli.ts` coverage | 0% ❌ |
| `docker-manager.ts` coverage | 18% ❌ |
| Integration test files | 26 |
| Integration test count (approx) | ~265 |
| Shell scripts without linting | 6 |
| Container scan path-filter gap | Yes ⚠️ |
| Scheduled scan → auto-issue creation | Partial (agentic monitor only) |

Top 3 priorities:

🔴 Add ShellCheck for 6 critical container scripts
🔴 Remove path filter from container security scan
🔴 Raise coverage thresholds with a concrete quarterly roadmap

Generated by CI/CD Gaps Assessment workflow — run #22831070902 on 2026-03-08

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 15, 2026, 10:20 PM UTC
