📊 Current CI/CD Pipeline Status
The repository has a mature, multi-layered CI/CD pipeline with 57 registered workflows combining static YAML-based checks and agentic AI-driven workflows. The pipeline covers build verification, type checking, linting, unit tests, integration tests, security scanning, and end-to-end smoke tests.
Health summary (recent runs, March 2026): Scheduled maintenance workflows (Secret Digger, Agentic Maintenance) are running consistently with a ~95% success rate.
✅ Existing Quality Gates
The following checks currently run on every PR targeting main:
Build & Compilation
Build Verification (build.yml) — TypeScript compilation on Node 20 and 22 (matrix), plus npm run lint and API proxy unit tests (containers/api-proxy/)
TypeScript Type Check (test-integration.yml) — strict tsc --noEmit type checking
ESLint (lint.yml) — TypeScript/JavaScript linting with custom rules
Testing
Test Coverage (test-coverage.yml) — Unit tests with coverage enforcement (thresholds: 38% lines/statements, 30% branches, 35% functions); compares against base branch and comments on PR; fails on regression
Integration Tests Suite (test-integration-suite.yml) — 4 parallel jobs covering: domain/network filtering, protocol/security tests, container/ops tests, API proxy tests
Chroot Integration Tests (test-chroot.yml) — Language runtime tests (Python, Go, Java, .NET, Node) through chroot sandbox
Examples Test (test-examples.yml) — End-to-end execution of shell examples
Test Setup Action (test-action.yml) — Validates GitHub Action installer (latest, specific version, image pull, invalid version)
Security
CodeQL Analysis (codeql.yml) — SAST for JavaScript/TypeScript and GitHub Actions workflows (runs on PR + weekly schedule)
Dependency Vulnerability Audit (dependency-audit.yml) — npm audit for main and docs-site packages; SARIF uploaded to Security tab; blocks on high/critical
Container Security Scan (container-scan.yml) — Trivy scan on agent and squid images (SARIF to Security tab) — path-filtered to containers/**
Security Guard (security-guard.lock.yml) — AI-driven (Claude) security review on every PR; comments with findings
Process
PR Title Check (pr-title.yml) — Enforces Conventional Commits format with allowed scopes (cli, docker, squid, proxy, ci, deps)
Agentic / Smoke (PR-triggered, may require reactions)
Smoke Claude/Codex/Copilot — AI agent smoke tests validating full AWF execution pipeline with real models
🔍 Identified Gaps
🔴 High Priority
1. Coverage thresholds are critically low for a security-critical tool
Current thresholds: 38% lines, 30% branches, 35% functions. The two most important files have little or no coverage:
cli.ts (entry point): 0% coverage (0/69 lines)
docker-manager.ts (core orchestration): 18% coverage (45/250 lines, 4% function coverage)
These are the files most likely to have regressions from PRs, and they are effectively untested by the unit test suite.
Risk: Breaking changes to CLI argument parsing, container lifecycle management, or cleanup logic can ship undetected by unit tests, relying entirely on integration tests (which are slower and less targeted).
2. Container Security Scan is path-filtered (skipped on most PRs)
container-scan.yml only triggers when containers/** or the workflow file itself changes.
A PR that modifies src/docker-manager.ts to change how containers are configured or what capabilities they receive will not trigger a Trivy scan. Only the weekly scheduled run catches this.
Risk: Security-relevant changes to container configuration ship without a fresh container image scan.
3. No shell script linting (ShellCheck)
There are 6 shell scripts in containers/agent/ (including setup-iptables.sh, entrypoint.sh) that are critical security components. None are checked by any automated linter. No shellcheck or shfmt step exists in any workflow.
Risk: Shell script bugs (quoting issues, unhandled errors, logic errors in iptables setup) are only caught by integration tests or manual review.
4. No minimum coverage gate that scales with the codebase
The coverage comparison check prevents regression but the absolute thresholds (38%/30%) are far below industry best practices for security-critical infrastructure (typically 70-80%+). A PR that adds 100 lines with 38% coverage technically passes all gates while leaving 62% of new logic untested.
🟡 Medium Priority
5. No code formatting check (Prettier)
The repository has ESLint but no Prettier or other formatter enforcement. TypeScript/JavaScript formatting is only enforced by ESLint stylistic rules (if any), and contributors may submit PRs with inconsistent formatting that creates noisy diffs.
Impact: Minor but affects review quality and diff readability over time.
6. build.yml combines lint and build — lint failure blocks build result
The Build Verification workflow runs npm run lint before npm run build. If lint fails, the build step is never executed, making it impossible to distinguish "lint failure" from "build failure" in the status check list. Lint already has its own dedicated workflow (lint.yml).
Impact: Duplicate lint runs; failure category ambiguity in PR status.
7. No dist artifact size monitoring
The dist/ directory is compiled TypeScript output. There is no check that prevents unintentional bundle size growth (e.g., accidentally bundling large dependencies, adding debug code).
Impact: Could cause slow installs or unexpected behavior for users of the npm package.
8. Integration tests do not run when only src/** changes (smoke-chroot)
smoke-chroot.md has paths: [src/**, containers/**, ...] path filtering — this is correct. However, the main Integration Tests Suite (test-integration-suite.yml) runs on all PRs regardless of what changed, including pure documentation changes. Conversely, the chroot tests (which test the --chroot-dir flag) are not triggered by changes to non-container source files unless they match the path filter.
Impact: Integration test runs are sometimes redundant; some source changes may miss targeted test coverage.
9. No license compliance check
The project uses MIT license and depends on many npm packages. There is no automated check that new dependencies don't introduce incompatible licenses (e.g., GPL/AGPL that could affect distribution).
Impact: A dependency with an incompatible license could be introduced unknowingly.
10. Agentic smoke tests and build-test workflows are non-blocking
The smoke tests (Claude/Codex/Copilot) and build-test agentic workflows run on PRs but are not enforced as required status checks. They are valuable for validation but their results are informational only unless configured as required in branch protection.
Impact: A PR that breaks the Claude smoke test can still be merged if the required static checks pass.
🟢 Low Priority
11. No mutation testing
Unit tests exist for logger.ts, squid-config.ts, and cli-workflow.ts (all at 100% coverage), but no mutation testing (e.g., Stryker) validates that the tests are actually detecting bugs rather than just executing code paths.
Impact: Test quality is unknown beyond line coverage metrics.
12. No documentation link validation
There are many cross-references between README.md, docs/, and AGENTS.md. No workflow checks that internal links are valid.
Impact: Documentation rot and broken links go undetected.
13. Weekly scheduled scans are not linked back to PRs
CodeQL, Trivy container scan, and dependency audit all run on a weekly/monthly schedule. When they find new CVEs, there is no automated mechanism to open an issue or PR. The dependency-security-monitor.md agentic workflow exists for this, but it's separate from the scheduled static scans.
Impact: CVE notifications from scheduled scans may not result in timely remediation.
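One way the missing link could be sketched is a final step in a scheduled scan job that files an issue when the job fails. This is an illustration only: it assumes the scan step fails the job on findings, that the job is granted `issues: write`, and that the GitHub CLI (`gh`) is available on the runner. The issue title and body are placeholders, not taken from the repository.

```yaml
# Hypothetical job fragment for a scheduled scan workflow (e.g. container-scan.yml).
permissions:
  contents: read
  issues: write   # assumed: needed so the job can open an issue

steps:
  # ... existing scan steps go here ...

  - name: Open an issue when the scheduled scan fails
    # Only react to scheduled runs; PR runs already surface failures on the PR.
    if: failure() && github.event_name == 'schedule'
    env:
      GH_TOKEN: ${{ github.token }}
    run: |
      gh issue create \
        --repo "$GITHUB_REPOSITORY" \
        --title "Scheduled scan failed: $GITHUB_WORKFLOW" \
        --body "Run: $GITHUB_SERVER_URL/$GITHUB_REPOSITORY/actions/runs/$GITHUB_RUN_ID"
```

A real setup would probably also deduplicate issues across weekly runs, which this sketch does not attempt.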
📋 Actionable Recommendations
1. Add ShellCheck to CI (High Priority)
Issue: 6 critical shell scripts have no linting.
Solution: Add a shellcheck step to build.yml or a dedicated lint-scripts.yml:
    - name: Run ShellCheck
      uses: ludeeus/action-shellcheck@00cae500b08a931fb5698e11e79bfbd38e612a38
      with:
        scandir: './containers'
Complexity: Low | Impact: High (catches iptables setup bugs before runtime)
2. Remove path filter from Container Scan (High Priority)
Issue: Container scan skips most PRs.
Solution: Remove the paths: filter from container-scan.yml so it runs on every PR, or add src/** to the paths list, since source changes can affect container behavior.
Complexity: Low | Impact: High (ensures every PR is scanned for container CVEs)
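If the filter is widened rather than removed, the trigger might look roughly like the sketch below. The `containers/**` and `src/**` patterns come from the text above; the `branches` entry and the workflow file path are assumptions about how container-scan.yml is laid out.

```yaml
# Sketch of a widened trigger for container-scan.yml (layout is assumed).
on:
  pull_request:
    branches: [main]
    paths:
      - 'containers/**'                          # existing filter
      - 'src/**'                                 # added: source changes can alter container config
      - '.github/workflows/container-scan.yml'   # assumed path of the workflow file itself
```

Dropping the `paths:` key entirely would instead run the scan on every PR, at the cost of longer CI on documentation-only changes.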
3. Raise coverage thresholds incrementally (High Priority)
Issue: Thresholds are too low (38%) for a security-critical tool.
Solution: Set a roadmap: 50% by Q2, 65% by Q3, 80% by Q4. Increase thresholds by 5% each quarter as the test-coverage-improver agentic workflow generates new tests. Prioritize cli.ts and docker-manager.ts.
Complexity: Medium | Impact: High (forces test-writing alongside feature development)
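A roadmap target can be enforced as an explicit floor in the coverage workflow. The step below is a minimal sketch, assuming the test run writes an Istanbul-style `coverage/coverage-summary.json` (the `json-summary` reporter); that file and the `MIN_LINES` variable are assumptions, not something the repository is known to provide.

```yaml
# Hypothetical step for test-coverage.yml; runs after the unit tests.
- name: Enforce line-coverage floor
  env:
    MIN_LINES: '50'   # first roadmap target (50% by Q2); raise it each quarter
  run: |
    node -e '
      // Assumed input: Istanbul json-summary output from the test run.
      const summary = require("./coverage/coverage-summary.json");
      const pct = summary.total.lines.pct;
      const min = Number(process.env.MIN_LINES);
      console.log(`line coverage: ${pct}% (minimum: ${min}%)`);
      if (pct < min) process.exit(1);   // non-zero exit fails the step
    '
```

Keeping the target in one variable makes each quarterly increase a one-line, reviewable change.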
4. Add Prettier formatting check (Medium Priority)
Issue: No code formatting enforcement.
Solution: Add .prettierrc and a format-check step: npx prettier --check "src/**/*.ts". Can be added to lint.yml.
Complexity: Low | Impact: Medium (improves code consistency)
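As a workflow step, the command above could be wired in as follows. This sketch assumes Prettier has been added as a devDependency and that dependencies are installed earlier in the job.

```yaml
# Sketch of a formatting step for lint.yml.
- name: Check formatting with Prettier
  # --check reports unformatted files and exits non-zero without rewriting them.
  run: npx prettier --check "src/**/*.ts"
```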
5. Separate lint from build in build.yml (Medium Priority)
Issue: Lint blocks build step in Build Verification.
Solution: Remove npm run lint from build.yml since lint.yml already covers it. Reduces duplication and gives clearer failure signals.
Complexity: Low | Impact: Low-Medium (cleaner status checks)
6. Add dist size tracking (Medium Priority)
Issue: No artifact size monitoring.
Solution: Add a step in build.yml that prints and compares dist/ size, failing if it grows by more than 20%.
Complexity: Low | Impact: Medium (prevents accidental bundle bloat)
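A minimal sketch of such a size check is shown below. The 20% limit comes from the recommendation; the `BASELINE_KB` value is a made-up placeholder, and a real setup would more likely derive the baseline from a build of the base branch.

```yaml
# Hypothetical step for build.yml; runs after `npm run build`.
- name: Check dist size against baseline
  env:
    BASELINE_KB: '1024'         # placeholder baseline, not a measured value
    MAX_GROWTH_PERCENT: '20'    # limit from the recommendation above
  run: |
    # du -sk prints the directory size in kilobytes, followed by a tab and the path.
    current_kb=$(du -sk dist | cut -f1)
    limit_kb=$(( BASELINE_KB * (100 + MAX_GROWTH_PERCENT) / 100 ))
    echo "dist size: ${current_kb} KB (baseline ${BASELINE_KB} KB, limit ${limit_kb} KB)"
    if [ "$current_kb" -gt "$limit_kb" ]; then
      echo "::error::dist/ grew by more than ${MAX_GROWTH_PERCENT}% over the baseline"
      exit 1
    fi
```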
7. Add license compliance check (Low Priority)
Issue: No license scanning.
Solution: Add npx license-checker --onlyAllow "MIT;BSD-2-Clause;BSD-3-Clause;ISC;Apache-2.0;CC0-1.0" as a step in the dependency audit workflow.
Complexity: Low | Impact: Medium (prevents license compliance issues)
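Expressed as a step, the check could look like the sketch below. It assumes dependencies have already been installed in the job so that license-checker can inspect node_modules.

```yaml
# Sketch of a license step for dependency-audit.yml.
- name: Check dependency licenses
  # --onlyAllow fails the step if any package declares a license outside this list.
  run: npx license-checker --onlyAllow "MIT;BSD-2-Clause;BSD-3-Clause;ISC;Apache-2.0;CC0-1.0"
```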
8. Enforce smoke/build-test as required status checks (Medium Priority)
Issue: AI-driven tests are advisory, not blocking.
Solution: Configure the smoke-claude, smoke-copilot, smoke-codex results as required checks in branch protection settings (or accept the risk explicitly by documenting the policy).
Complexity: Low (config change only) | Impact: High (ensures AWF actually works with all 3 engines before merging)
9. Add documentation link validation (Low Priority)
Issue: Broken internal links go undetected.
Solution: Use lychee or markdown-link-check in a scheduled or PR workflow targeting **/*.md files.
Complexity: Low | Impact: Low-Medium (improves documentation quality)
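A rough sketch using markdown-link-check follows. Running the tool through `npx` and skipping `node_modules` are assumptions made for illustration; the repository may prefer lychee or a pinned install instead.

```yaml
# Hypothetical step for a docs link-check workflow.
- name: Check Markdown links
  run: |
    # Run the checker once per Markdown file; xargs exits non-zero if any file fails.
    find . -name '*.md' -not -path './node_modules/*' -print0 \
      | xargs -0 -n1 npx markdown-link-check
```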
📈 Metrics Summary
| Metric | Value |
| --- | --- |
| Total workflows registered | 57 |
| Workflows running on every PR | ~15 |
| Agentic workflows (AI-driven) | ~27 |
| Unit test coverage (statements) | 38.39% |
| Unit test coverage (branches) | 31.78% |
| Coverage threshold (lines) | 38% ⚠️ Low |
| cli.ts coverage | 0% ❌ |
| docker-manager.ts coverage | 18% ❌ |
| Integration test files | 26 |
| Integration test count (approx) | ~265 |
| Shell scripts without linting | 6 |
| Container scan path-filter gap | Yes ⚠️ |
| Scheduled scan → auto-issue creation | Partial (agentic monitor only) |
Top 3 priorities:
🔴 Add ShellCheck for 6 critical container scripts
🔴 Remove path filter from container security scan
🔴 Raise coverage thresholds with a concrete quarterly roadmap
Generated by CI/CD Gaps Assessment workflow — run #22831070902 on 2026-03-08