[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1198

2026-03-09T22:21:14Z

github-actions[bot]
bot Mar 9, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and comprehensive CI/CD pipeline with 57 active GitHub Actions workflows across static analysis, unit testing, integration testing, security scanning, and AI-powered agentic workflows.

Overall health snapshot (most recent run, 2026-03-09):

| Workflow | Status |
| --- | --- |
| Build Verification (Node 20 & 22) | ✅ Passing |
| Lint (ESLint) | ✅ Passing |
| TypeScript Type Check | ✅ Passing |
| Test Coverage | ✅ Passing |
| Dependency Vulnerability Audit | ✅ Passing |
| Container Security Scan (Trivy) | ✅ Passing |
| CodeQL | ✅ Passing |
| PR Title Check | ✅ Passing |
| Test Setup Action | ✅ Passing |
| Security Guard (Claude AI review) | ❌ Failing |
| Build Test Bun / Deno / Go / Java / Node / Rust | ❌ All Failing |
| Smoke Claude / Smoke Codex / Smoke Copilot | ❌ All Failing |

The 3 smoke workflows and 6+ agentic build-test workflows are all currently failing, representing significant blind spots in end-to-end validation.

✅ Existing Quality Gates

On every PR to `main`

Build Verification — TypeScript compilation on Node 20 and 22, ESLint, dist/ artifact verification, api-proxy unit tests
Lint — Dedicated ESLint job (separate from build)
TypeScript Type Check — tsc --noEmit strict mode via tsconfig.check.json
Integration Tests — Four parallel jobs: Domain & Network, Protocol & Security, Container & Ops, API Proxy (Jest, ~120+ test cases)
Chroot Integration Tests — Four parallel jobs: Languages, Package Managers, /proc filesystem, Edge Cases
Examples Test — Runs basic-curl.sh, using-domains-file.sh, debugging.sh, blocked-domains.sh end-to-end
Test Setup Action — Validates the action.yml setup action with latest and specific versions
Test Coverage — Jest unit coverage with PR delta comparison comment; fails on regression
CodeQL — Static analysis for JS/TS and Actions workflow files
Container Security Scan — Trivy scans agent and squid containers for CRITICAL/HIGH CVEs
Dependency Vulnerability Audit — npm audit --audit-level=high for main and docs-site packages
PR Title Check — Conventional Commits format enforcement with allowed scopes
Security Guard (Claude) — AI-powered security review identifying changes that weaken firewall posture
Agentic Build Tests — 8 language-specific workflows (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) that actually build real projects through the firewall
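
As an illustration of what the PR Title Check gate enforces, here is a minimal sketch of a Conventional Commits title validator. The type and scope lists are hypothetical placeholders, since the workflow's actual allowed values are not listed in this assessment, and `checkPrTitle` is an illustrative name rather than a function from the repository.

```typescript
// Hypothetical allowed values; the real workflow's lists are not shown in this report.
const ALLOWED_TYPES: string[] = ["feat", "fix", "docs", "chore", "refactor", "test", "ci"];
const ALLOWED_SCOPES: string[] = ["cli", "docker", "squid", "proxy", "ci", "deps"];

// Matches "type(scope)!: description"; the scope and the "!" marker are optional.
const TITLE_PATTERN = /^([a-z]+)(?:\(([^)]+)\))?(!)?: (\S.*)$/;

// Returns null when the title is acceptable, otherwise a short reason.
function checkPrTitle(title: string): string | null {
  const match = TITLE_PATTERN.exec(title);
  if (match === null) {
    return "title does not follow 'type(scope): description'";
  }
  const type = match[1];
  const scope: string | undefined = match[2];
  if (ALLOWED_TYPES.indexOf(type) === -1) {
    return "unknown type '" + type + "'";
  }
  if (scope !== undefined && ALLOWED_SCOPES.indexOf(scope) === -1) {
    return "scope '" + scope + "' is not allowed";
  }
  return null;
}
```

Under these assumptions, `checkPrTitle("fix: handle empty domain list")` returns `null`, while a title such as `Update README` is rejected because it has no type prefix.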

Scheduled / Event-Driven

Secret Digger (Claude, Codex, Copilot) — runs hourly to detect committed secrets
Daily Security Review and Threat Modeling
Weekly Dependency Security Monitor and CLI Flag Consistency Checker
Weekly Test Coverage Improver (opens PRs to add missing tests)

🔍 Identified Gaps

🔴 High Priority

1. All agentic build-test workflows are currently failing

All 6+ build-test workflows (Bun, Deno, Go, Java, Node.js, Rust) show failure in the latest CI run. These workflows validate that AWF works correctly as a build environment for real language ecosystems — a core product use case. Having them all fail means this validation layer is entirely absent.

Impact: Real regressions to cross-language build support may be merged undetected.

2. Security Guard AI agent is failing

The Claude-based Security Guard that reviews PRs for security-impacting changes is currently failing. For a security tool like AWF, this is a critical quality gate.

Impact: Security-impacting PRs proceed without the AI-assisted security review.

3. No docs-site build verification on PRs

The deploy-docs.yml workflow only triggers on push to main, not on pull_request. Documentation build failures (broken MDX, bad imports, broken Astro config) are only caught after merging.

Impact: PRs that break the documentation site are merged silently.

4. Coverage thresholds are too low for a security-critical tool

Global thresholds are set at: Branches 30%, Functions 35%, Lines/Statements 38%. For a security firewall, these are dangerously low minimums. Critical files like host-iptables.ts and squid-config.ts may have insufficient coverage without triggering any CI failure, and containers/agent/entrypoint.sh, being a shell script, is not measured by Jest coverage at all.

Impact: Security-critical code paths can lack test coverage without any CI signal.

🟡 Medium Priority

5. No shellcheck for shell scripts

There are several security-critical shell scripts (containers/agent/setup-iptables.sh, containers/agent/entrypoint.sh, containers/squid/entrypoint.sh, scripts/ci/cleanup.sh) but no automated shellcheck linting. Shell script bugs (incorrect quoting, unsafe variable expansion, unintended glob expansion) are common security vulnerabilities.

Impact: Shell script bugs in iptables setup scripts could silently misconfigure the firewall.

6. All smoke tests are failing

The three end-to-end smoke tests (Smoke Claude, Smoke Codex, Smoke Copilot) that validate the full agent-through-firewall workflow are all currently failing. These represent the highest-fidelity validation of the product.

Impact: End-to-end regressions may go undetected.

7. No per-file or per-directory coverage enforcement

Only global coverage thresholds are configured in jest.config.js. Security-sensitive modules (e.g., src/host-iptables.ts, src/squid-config.ts, src/domain-patterns.ts) could drop to 0% coverage without any CI failure, as long as other modules compensate.

Impact: High-risk security code can lose test coverage without a warning.
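
Jest's `coverageThreshold` option accepts path or glob keys alongside `global`, which is one way to close this gap. The sketch below shows the shape such a configuration might take. The global numbers are the ones reported in this assessment; the per-file percentages are illustrative placeholders, not measured or recommended values.

```typescript
// Shape of one Jest coverage threshold entry (all values are percentages).
interface Threshold {
  branches: number;
  functions: number;
  lines: number;
  statements: number;
}

// "global" mirrors the thresholds reported in this assessment.
// The per-file numbers below are assumptions chosen only for illustration.
const coverageThreshold: Record<string, Threshold> = {
  global: { branches: 30, functions: 35, lines: 38, statements: 38 },
  "./src/host-iptables.ts": { branches: 70, functions: 80, lines: 80, statements: 80 },
  "./src/squid-config.ts": { branches: 70, functions: 80, lines: 80, statements: 80 },
  "./src/domain-patterns.ts": { branches: 70, functions: 80, lines: 80, statements: 80 },
};

// In jest.config.js this object would be exported,
// for example: module.exports = { coverageThreshold };
const jestConfigSketch = { coverageThreshold };
```

Per Jest's documentation, files matched by a path entry are checked against their own threshold and their data is subtracted from the global calculation, so a drop in one of these files would fail CI even if other modules compensate.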

8. Integration test coverage is excluded from coverage reports

npm run test:coverage runs only unit tests (from jest.config.js). Integration tests run separately via jest.integration.config.js and their coverage is never merged into the coverage report. The actual coverage of the system under real conditions is unknown.

Impact: Coverage metrics are misleading — the real coverage including integration tests could be higher or lower; no one knows.
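
To make the effect of merging concrete, the following sketch sums statement hit counts from two runs and recomputes the statement percentage. It is a deliberately simplified stand-in for what Istanbul-based tooling does: the `CoverageMap` type here is hypothetical and keeps only statement counts, and it assumes both runs instrumented identical sources so that statement ids line up.

```typescript
// Simplified stand-in for Istanbul data: file path -> (statement id -> hit count).
type StatementHits = Record<string, number>;
type CoverageMap = Record<string, StatementHits>;

// Sum hit counts per statement across two runs.
function mergeCoverage(a: CoverageMap, b: CoverageMap): CoverageMap {
  const merged: CoverageMap = {};
  [a, b].forEach((run) => {
    Object.keys(run).forEach((file) => {
      const target = merged[file] || (merged[file] = {});
      Object.keys(run[file]).forEach((id) => {
        target[id] = (target[id] || 0) + run[file][id];
      });
    });
  });
  return merged;
}

// Percentage of statements hit at least once.
function statementPct(map: CoverageMap): number {
  let total = 0;
  let covered = 0;
  Object.keys(map).forEach((file) => {
    Object.keys(map[file]).forEach((id) => {
      total += 1;
      if (map[file][id] > 0) {
        covered += 1;
      }
    });
  });
  return total === 0 ? 100 : (covered / total) * 100;
}

// Made-up hit counts for a single file, one map per test suite.
const unitRun: CoverageMap = { "src/squid-config.ts": { "0": 3, "1": 0, "2": 0, "3": 1 } };
const integrationRun: CoverageMap = { "src/squid-config.ts": { "0": 1, "1": 2, "2": 0, "3": 0 } };
const combinedRun = mergeCoverage(unitRun, integrationRun);
```

With this made-up data each suite alone covers 50% of the statements, while the merged map covers 75%, which is the number a unit-only report never shows.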

9. No performance regression testing

There is no measurement of container startup time, memory usage, or throughput under load. AWF's core UX depends on startup latency (time from awf invocation to agent command execution).

Impact: Performance regressions (e.g., slow container startup) can be merged without detection.
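
A startup check does not need a full benchmarking framework. The sketch below times an action several times, takes the median, and compares it to a budget. All names and the budget value are hypothetical; in CI the `action` callback would launch `awf` with a trivial agent command, which is left out here so the example stays self-contained.

```typescript
// Time a synchronous action in milliseconds using the wall clock.
function timeMs(action: () => void): number {
  const start = Date.now();
  action();
  return Date.now() - start;
}

// The median is less sensitive than the mean to a single slow outlier run.
function median(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("no samples");
  }
  const sorted = samples.slice().sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface BenchmarkResult {
  medianMs: number;
  budgetMs: number;
  withinBudget: boolean;
}

// Run the action `runs` times and compare the median duration to the budget.
function benchmark(action: () => void, runs: number, budgetMs: number): BenchmarkResult {
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    samples.push(timeMs(action));
  }
  const medianMs = median(samples);
  return { medianMs, budgetMs, withinBudget: medianMs <= budgetMs };
}
```

A CI step could call `benchmark` with a small number of runs and exit non-zero when `withinBudget` is false. The budget itself would have to be calibrated against the runner hardware, since shared CI machines add noise.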

🟢 Low Priority

10. No artifact size monitoring

The compiled dist/ output size is not tracked or enforced. Large dependency additions or accidental bundling of test files could inflate the package size.

Impact: Package distribution size can grow silently.

11. No license compliance check

There is no automated check of third-party dependency licenses (e.g., FOSSA, license-checker). New dependencies with incompatible licenses (GPL, AGPL) could be introduced.

Impact: License compliance issues may only be discovered late in the release cycle.

12. No mutation testing

Jest tests are run but their effectiveness is never validated. A mutation testing tool (e.g., Stryker) could verify that the tests actually fail when a bug is deliberately introduced into the code. This is especially important for security invariant tests.

Impact: Tests that look comprehensive may not actually catch regressions.
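
The idea behind mutation testing can be shown by hand. The sketch below uses a hypothetical domain allowlist check (not the actual code in `src/domain-patterns.ts`) and a mutant of the kind a tool such as Stryker might generate, then runs a weak and a stronger test suite against both.

```typescript
type DomainCheck = (host: string, allowed: string) => boolean;

// Suffix test written out by hand to avoid depending on newer string helpers.
function hasSuffix(text: string, suffix: string): boolean {
  return suffix.length <= text.length && text.slice(text.length - suffix.length) === suffix;
}

// Hypothetical original: allow the domain itself and its subdomains.
const originalCheck: DomainCheck = (host, allowed) =>
  host === allowed || hasSuffix(host, "." + allowed);

// Mutant: the "." separator is dropped, so look-alike hosts slip through.
const mutantCheck: DomainCheck = (host, allowed) =>
  host === allowed || hasSuffix(host, allowed);

// Weak suite: only checks hosts that should be allowed.
function weakSuite(check: DomainCheck): boolean {
  return check("github.com", "github.com") && check("api.github.com", "github.com");
}

// Stronger suite: also asserts that a look-alike host is rejected.
function strongSuite(check: DomainCheck): boolean {
  return weakSuite(check) && !check("evilgithub.com", "github.com");
}
```

Here `weakSuite` passes for both the original and the mutant, so the mutant survives and the suite is shown to be blind to that bug. `strongSuite` passes for the original and fails for the mutant, which is the outcome a mutation testing run looks for.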

📋 Actionable Recommendations

| # | Gap | Recommendation | Complexity | Impact |
| --- | --- | --- | --- | --- |
| 1 | Build-test workflows failing | Investigate and fix root cause (likely token or network issue); add a health-check job that alerts maintainers when these workflows fail | Low | High |
| 2 | Security Guard failing | Debug the Claude agent failure; check if it's a token, MCP, or prompt issue | Low | High |
| 3 | No docs build on PRs | Add `pull_request` trigger to `deploy-docs.yml` with a build-only job (skip deploy step) | Low | High |
| 4 | Low coverage thresholds | Incrementally raise thresholds to 60%+ over the next few releases; add per-file thresholds for `src/host-iptables.ts` and `src/squid-config.ts` | Medium | High |
| 5 | No shellcheck | Add a `shellcheck` step to `build.yml` or a dedicated `lint-scripts.yml` workflow targeting `containers/**/*.sh` and `scripts/**/*.sh` | Low | Medium |
| 6 | Smoke tests failing | Investigate failures; ensure required secrets are provisioned; add a fallback mock-mode smoke test that doesn't require live AI tokens | Medium | High |
| 7 | No per-file coverage | Add `coverageThreshold` per-path entries in `jest.config.js` for the 3-5 most security-critical files | Low | Medium |
| 8 | Integration coverage excluded | Add `--coverage` to the `test:integration` script and merge its output with the unit coverage in `test-coverage.yml`, for example with `nyc merge` followed by `nyc report`; Jest has no built-in flag for merging coverage from separate runs | Medium | Medium |
| 9 | No performance testing | Add a lightweight startup benchmark that measures time-to-first-command using `time` in the Examples Test; alert if it exceeds a threshold | Low | Medium |
| 10 | No artifact size check | Add a step in `build.yml` to check `du -sh dist/` and fail if size exceeds a threshold | Low | Low |
| 11 | No license check | Add `npx license-checker --failOn GPL` step to `dependency-audit.yml` | Low | Low |
| 12 | No mutation testing | Add Stryker mutation testing for `src/domain-patterns.ts` and `src/squid-config.ts` as a scheduled weekly job | High | Medium |

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total GitHub Actions workflows | 57 |
| Workflows triggered on PRs | ~16 |
| Core quality workflows (build/lint/test/security) | 13 |
| Agentic AI workflows | ~27 |
| Unit test files | 9 (`src/*.test.ts`) |
| Integration test files | 26 (`tests/integration/*.test.ts`) |
| Current unit coverage thresholds | Branches 30%, Functions 35%, Lines/Statements 38% |
| Workflows currently failing | 10+ (build-test × 6, smoke × 3, security-guard × 1) |
| Recent core workflow pass rate | ~100% (Build, Lint, Type Check, Coverage, CodeQL, etc.) |

The core static analysis and testing pipeline is healthy. The primary risks are: (a) the currently broken agentic build-test and smoke workflows that validate real-world usage, (b) low coverage thresholds for a security-critical codebase, and (c) missing shellcheck for security-sensitive shell scripts that configure iptables rules.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 16, 2026, 10:21 PM UTC

2026-03-10T00:54:51Z

github-actions[bot]
bot Mar 10, 2026
Author

🔮 The ancient spirits stir, and the smoke test agent has passed this way. The runes confirm the checks were witnessed under a steady flame.

🔮 The oracle has spoken through Smoke Codex
