[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1174

2026-03-07T22:19:41Z

github-actions[bot]
bot Mar 7, 2026

📊 Current CI/CD Pipeline Status

The repository has a comprehensive and multi-layered CI/CD pipeline combining traditional GitHub Actions workflows with agentic (AI-powered) workflows. Overall the pipeline is healthy and well-structured, with a rich set of quality gates already in place.

Pipeline summary:

26 workflow files total (traditional + agentic)
14 workflows trigger on pull requests
Recent PR workflow success rate: ~95% (one Integration Tests failure observed in recent PRs)
All agentic workflows compile successfully

✅ Existing Quality Gates

Traditional Workflows (run on every PR)

| Workflow | What it checks | File |
| --- | --- | --- |
| Build Verification | TypeScript compilation, ESLint, API proxy unit tests, Node 20/22 matrix | `build.yml` |
| Lint | ESLint on all TS source files | `lint.yml` |
| TypeScript Type Check | `tsc --noEmit` strict type checking | `test-integration.yml` |
| PR Title Check | Conventional Commits format enforcement | `pr-title.yml` |
| Test Coverage | Jest unit tests with coverage, PR coverage delta comment | `test-coverage.yml` |
| Integration Tests | Domain/network, protocol/security, container ops, API proxy (4 parallel jobs) | `test-integration-suite.yml` |
| Chroot Integration Tests | Chroot mode language support (Node, Python, Go, Java, .NET) | `test-chroot.yml` |
| CodeQL | SAST scanning for JS/TS and GitHub Actions | `codeql.yml` |
| Dependency Vulnerability Audit | `npm audit` → SARIF upload to Security tab | `dependency-audit.yml` |
| Container Security Scan | Trivy image scan for CRITICAL/HIGH CVEs (path-filtered) | `container-scan.yml` |
| Examples Test | End-to-end smoke tests on all example scripts | `test-examples.yml` |
| Test Setup Action | Validates the `action.yml` setup action | `test-action.yml` |

Agentic Workflows (run on PRs)

| Workflow | What it does |
| --- | --- |
| Security Guard (Claude) | AI-powered security review: checks iptables, Squid config, capability changes |
| Smoke Claude / Codex / Copilot | End-to-end smoke tests of each AI engine through the firewall |
| Smoke Chroot | Smoke test of chroot mode (path-filtered) |
| Build Test × 8 (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) | Tests the firewall wrapping builds for each language ecosystem |

Scheduled / Maintenance

Secret Digger (hourly, all three engines) — scans for accidentally committed secrets
Dependency Security Monitor (daily) — extended dependency vulnerability monitoring
Security Review (daily) — broader security posture review
CLI Flag Consistency Checker (weekly) — checks docs match implementation
Test Coverage Improver (weekly) — opens PRs to improve test coverage
Doc Maintainer (daily) — keeps documentation up to date
CI Doctor — monitors all 27 listed workflows for health

🔍 Identified Gaps

🔴 High Priority

1. Critically Low Unit Test Coverage Thresholds

Current coverage: Statements 38%, Branches 31%, Functions 37%, Lines 38%

The thresholds enforced in jest.config.js are so close to actual coverage that they function as a floor, not a quality gate:

coverageThreshold: { global: { branches: 30, functions: 35, lines: 38, statements: 38 } }

Key files with poor coverage:

cli.ts — 0% coverage (the main entry point)
docker-manager.ts — 18% statements, 4% functions (the largest and most critical file)
host-iptables.ts — 55% branch coverage (security-critical iptables rules)

This means large portions of critical security code are not unit-tested at all.

2. Integration Tests Have Recent Failures on PRs

The Integration Tests workflow had a failure in the most recent PR run. These tests require Docker and live network I/O, making them susceptible to flakiness (Docker network pool exhaustion, Squid startup timing). There is no automated retry or flake detection mechanism.

3. Container Security Scan is Path-Filtered — May Miss Source Code Changes

container-scan.yml only triggers when containers/** changes:

paths:
  - 'containers/**'
  - '.github/workflows/container-scan.yml'

Source code changes in src/ that affect container behavior (e.g., new capabilities, new mounts) are never scanned by Trivy on PR.

4. No Shell Script Linting (ShellCheck)

The repository contains multiple shell scripts that are security-critical:

containers/agent/setup-iptables.sh — configures NAT rules
containers/agent/entrypoint.sh — drops capabilities, runs user commands
containers/squid/entrypoint.sh — fixes permissions
scripts/ci/*.sh — cleanup and test scripts

None of these are linted by ShellCheck in CI. Shell script bugs in security-critical paths can introduce vulnerabilities.

🟡 Medium Priority

5. No Dockerfile Linting (Hadolint)

containers/agent/Dockerfile and containers/squid/Dockerfile are not linted. Hadolint would catch security anti-patterns (e.g., apt-get without --no-install-recommends, missing USER directives, ADD instead of COPY).

6. No Performance / Regression Benchmarks

There are no benchmarks measuring container startup time, proxy overhead, or iptables rule setup time. Performance regressions from refactoring are invisible.

7. Smoke Tests Require Emoji Reactions to Run Fully

Smoke Claude/Codex/Copilot run on every PR, but their full functionality (the AI engine itself) is only triggered by specific emoji reactions (❤️ heart, 🎉 hooray, 👀 eyes). Regular PRs only get a lightweight "smoke" pass, not a full agent execution validation.

8. No API Contract Tests for the API Proxy

containers/api-proxy/server.js has unit tests but no contract tests validating the HTTP request/response format against actual Copilot API endpoints. Breaking changes to the proxy protocol would only surface in integration tests.

9. Coverage Delta Enforcement is Non-Blocking

The test coverage workflow computes and comments on coverage delta (PR branch vs. base), but there is no enforcement that coverage cannot decrease. A PR that drops overall coverage passes all checks.

10. No Dead Code Detection

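TypeScript unused exports, variables, and parameters are not checked. tsc --noEmit catches type errors but, unless noUnusedLocals and noUnusedParameters are enabled, not unused code. Tools like ts-prune, which reports unused exports, are absent; ESLint's @typescript-eslint/no-unused-vars covers unused variables and parameters but does not flag unused exports.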

🟢 Low Priority

11. No License Compliance Scanning

No FOSSA, license-checker, or equivalent tool verifies that dependencies use compatible open-source licenses. This is relevant since the project is distributed as a CLI tool.

12. No dist/ Artifact Size Monitoring

The compiled dist/ output size is not tracked across PRs. Accidental inclusion of large dependencies or source maps would go undetected.

13. No Documentation Validation on PR

doc-maintainer runs daily on a schedule. There is no PR check that validates documentation links, checks for broken references in README.md, or ensures AGENTS.md stays in sync with implementation.

14. SBOM Generation Only at Release Time

Software Bill of Materials (SBOM) is generated and attested during the release workflow but not generated or diffed on PRs, making it hard to spot unexpected new dependencies entering the supply chain before merge.

15. No Spelling / Grammar Check on Docs

Documentation files (.md) are excluded from the Lint workflow via paths-ignore: ['**/*.md'], and there is no spell-checking in place for documentation quality.

📋 Actionable Recommendations

1. Raise Coverage Thresholds Incrementally (High | Low effort)

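Raise thresholds in jest.config.js by 5–10 percentage points every quarter to drive coverage improvement, and set separate hard floors for docker-manager.ts and cli.ts. Jest fails the run when coverage falls below a threshold, so each value should be raised only after the tests that reach it have merged; the figures below are targets rather than a drop-in change (cli.ts is currently at 0%):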

coverageThreshold: {
  global: { branches: 40, functions: 45, lines: 50, statements: 50 },
  './src/cli.ts': { statements: 50, functions: 50, lines: 50 },
  './src/docker-manager.ts': { statements: 30, functions: 20, lines: 30 },
}

Impact: Prevents the coverage floor from staying permanently low.

2. Add ShellCheck Linting (High | Low effort)

Add a new workflow or job to build.yml to run ShellCheck on all shell scripts:

- name: Lint shell scripts
  run: |
    sudo apt-get install -y shellcheck
    shellcheck containers/agent/setup-iptables.sh containers/agent/entrypoint.sh \
               containers/squid/entrypoint.sh scripts/ci/*.sh

Impact: Catches syntax errors and security anti-patterns in security-critical scripts.
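
The step above names each script explicitly, so a script added later would stay unlinted until the list is updated. As a minimal sketch of one alternative, the helper below discovers candidates by `.sh` extension or shell shebang. The function name `list_shell_scripts` and the excluded directories are assumptions for illustration, not something taken from the repository.

```shell
# Hypothetical helper: print every file under <root> that looks like a shell script.
# A file qualifies if its name ends in .sh or its first line is a sh/bash/dash/ksh shebang.
list_shell_scripts() {
  local root="$1"
  # Skip dependency and VCS directories (assumed layout).
  find "$root" -type f ! -path '*/node_modules/*' ! -path '*/.git/*' |
    while IFS= read -r f; do
      case "$f" in
        *.sh) echo "$f"; continue ;;
      esac
      # Extensionless scripts: inspect the shebang on the first line.
      if head -n 1 "$f" 2>/dev/null | grep -Eq '^#!.*[/ ](ba|da|k)?sh( |$)'; then
        echo "$f"
      fi
    done | sort
}
```

In a workflow step this could be combined with the linter as `list_shell_scripts . | xargs shellcheck`, which assumes no script path contains whitespace.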

3. Remove Path Filter on Container Security Scan (High | Trivial effort)

Change container-scan.yml to trigger on all PRs to main, or at minimum also include src/** and package.json in the path filter, so Trivy scans run whenever container configuration could be indirectly affected.
Impact: Prevents container CVEs introduced via source changes from slipping through.

4. Add Integration Test Retry Logic (Medium | Low effort)

Wrap flaky Docker operations in a retry, for example with the nick-fields/retry action or a short retry loop around the integration test run command, so that transient failures do not fail the PR. Keep continue-on-error: false (the default) so that a failure which persists across retries still fails the job.
Impact: Reduces developer frustration from spurious CI failures.
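
A short retry loop of the kind mentioned above might look like the following sketch. The `retry` function and its argument order are hypothetical, and the `npm run test:integration` command in the trailing comment is an assumed script name, not one confirmed by the repository.

```shell
# Hypothetical helper: run a command, retrying on failure.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      # Surface the final failure as a GitHub Actions error annotation.
      echo "::error::Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    # A warning annotation keeps each flaky attempt visible in the run summary.
    echo "::warning::Attempt ${attempt}/${max_attempts} failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example invocation (assumed script name):
# retry 3 10 npm run test:integration
```

Retries mask flakiness rather than detect it, so the warning annotations are emitted on every failed attempt to leave a trace that can be counted later.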

5. Enforce Coverage Non-Regression (Medium | Low effort)

Add a step to test-coverage.yml that fails if the PR's total coverage drops below the base branch's coverage:

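# Compare as decimals: [ -lt ] only accepts integers, and coverage values such as 38.39 are fractional.
if awk -v pr="$PR_COVERAGE" -v base="$BASE_COVERAGE" 'BEGIN { exit !(pr < base) }'; then
  echo "::error::Coverage dropped from $BASE_COVERAGE% to $PR_COVERAGE%"
  exit 1
fi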

Impact: Prevents test coverage erosion over time.

6. Add Hadolint Dockerfile Linting (Medium | Low effort)

Add to container-scan.yml or a new container-lint.yml:

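# One step per Dockerfile so that both images named in the gap are linted.
- uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf # v3.1.0
  with:
    dockerfile: containers/agent/Dockerfile
- uses: hadolint/hadolint-action@54c9adbab1582c2ef04b2016b760714a4bfde3cf # v3.1.0
  with:
    dockerfile: containers/squid/Dockerfile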

Impact: Catches Dockerfile best-practice violations and security issues.

7. Track Artifact Size (Low | Low effort)

Add a step to build.yml that measures and reports the dist/ size, and optionally fails if it grows beyond a threshold (e.g., 2MB):

du -sh dist/ && find dist/ -name '*.js' | xargs wc -c | tail -1

Impact: Prevents accidental large dependency inclusions.
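
The one-liner above only reports sizes. To also fail the build past a budget, a step could use something like the sketch below. The function name `check_dist_size` is hypothetical, and reading the 2MB example as 2 × 1024 × 1024 bytes is an assumption. Unlike the one-liner, it counts every file in the directory, so source maps are included.

```shell
# Hypothetical helper: fail when a directory exceeds a byte budget.
# Usage: check_dist_size <dir> <max_bytes>
check_dist_size() {
  local dir="$1" max_bytes="$2"
  local total
  # Concatenate all regular files and count the bytes.
  total=$(find "$dir" -type f -exec cat {} + | wc -c)
  total=$((total + 0))  # strip any padding that some wc implementations add
  echo "Size of ${dir}: ${total} bytes (limit: ${max_bytes})"
  if [ "$total" -gt "$max_bytes" ]; then
    echo "::error::${dir} is ${total} bytes, over the ${max_bytes}-byte budget" >&2
    return 1
  fi
}

# Example invocation with a 2 MiB budget:
# check_dist_size dist 2097152
```

A fixed budget is the simplest option; comparing against the base branch's size would catch smaller regressions but needs a second build.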

8. Add License Scanning (Low | Low effort)

Add an npx license-checker --onlyAllow "MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC" step to dependency-audit.yml.
Impact: Ensures copyleft or incompatible licenses aren't introduced transitively.

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files | 26 (14 traditional YAML, 12 agentic Markdown + lock files) |
| Workflows triggering on PRs | 14 |
| Recent PR success rate | ~95% (Integration Tests had 1 failure) |
| Unit test coverage (statements) | 38.39% (threshold: 38%) |
| Unit test coverage (branches) | 31.78% (threshold: 30%) |
| Critical file coverage (`cli.ts`) | 0% |
| Critical file coverage (`docker-manager.ts`) | 18% statements / 4% functions |
| Security scanning tools | CodeQL, Trivy, `npm audit`, Security Guard AI, Secret Digger |
| Agentic build-test matrix | 8 language ecosystems |
| Missing linters | ShellCheck, Hadolint, license-checker |

Summary Assessment

The AWF repository has a strong and mature CI/CD foundation — especially notable are the AI-powered Security Guard review, multi-engine smoke tests, CodeQL scanning, and Trivy container scanning. The primary risks are concentrated in two areas:

Test coverage is very low on the most critical files (cli.ts at 0%, docker-manager.ts at 18%) and the enforced thresholds are too permissive to drive improvement.
Shell script and Dockerfile linting is absent, creating a gap in automated quality gates for security-critical container code.

Addressing the High Priority items (ShellCheck, coverage thresholds, container scan path filter) would close the most significant gaps with minimal effort.

AI generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Mar 14, 2026, 10:19 PM UTC

2026-03-08T00:58:51Z

github-actions[bot]
bot Mar 8, 2026
Author

🔮 The ancient spirits stir, and the oracle has witnessed the smoke test’s passage through the veil. A quiet mark is left here, that the seeker may know the agent was here.

🔮 The oracle has spoken through Smoke Codex
