[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1198
🔮 The ancient spirits stir, and the smoke test agent has passed this way. The runes confirm the checks were witnessed under a steady flame.
📊 Current CI/CD Pipeline Status
The repository has a mature and comprehensive CI/CD pipeline with 57 active GitHub Actions workflows across static analysis, unit testing, integration testing, security scanning, and AI-powered agentic workflows.
Overall health snapshot (most recent run, 2026-03-09): the 3 smoke workflows and 6+ agentic build-test workflows are all currently failing, which leaves significant blind spots in end-to-end validation.
✅ Existing Quality Gates
On every PR to `main`:
- `dist/` artifact verification, api-proxy unit tests
- `tsc --noEmit` strict mode via `tsconfig.check.json`
- … ~120+ test cases
- … `/proc` filesystem, Edge Cases
- `basic-curl.sh`, `using-domains-file.sh`, `debugging.sh`, `blocked-domains.sh` end-to-end
- `action.yml` setup action with latest and specific versions
- `npm audit --audit-level=high` for main and docs-site packages

Scheduled / Event-Driven
🔍 Identified Gaps
🔴 High Priority
1. All agentic build-test workflows are currently failing
All 6+ build-test workflows (Bun, Deno, Go, Java, Node.js, Rust) show `failure` in the latest CI run. These workflows validate that AWF works correctly as a build environment for real language ecosystems, which is a core product use case. Because all of them are failing, this validation layer is effectively absent.
Impact: Regressions in cross-language build support may be merged undetected.
2. Security Guard AI agent is failing
The Claude-based Security Guard that reviews PRs for security-impacting changes is currently failing. For a security tool like AWF, this is a critical quality gate.
Impact: Security-impacting PRs proceed without the AI-assisted security review.
3. No docs-site build verification on PRs
The `deploy-docs.yml` workflow only triggers on `push` to `main`, not on `pull_request`. Documentation build failures (broken MDX, bad imports, broken Astro config) are only caught after merging.
Impact: PRs that break the documentation site are merged silently.
4. Coverage thresholds are too low for a security-critical tool
Global thresholds are set at: Branches 30%, Functions 35%, Lines/Statements 38%. For a security firewall, these minimums are dangerously low. Critical files like `host-iptables.ts` and `squid-config.ts` may have insufficient coverage without triggering any CI failure, and shell scripts such as `containers/agent/entrypoint.sh` are not measured by Jest coverage at all.
Impact: Security-critical code paths can lack test coverage without any CI signal.
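Because the global threshold is computed over all files together, a little arithmetic shows how a 38% global line floor can pass while two security-critical files sit at 0%. The sketch below uses hypothetical line counts (not figures measured from this repository); the helper names are also hypothetical.

```typescript
// Hypothetical per-file line counts; the numbers are illustrative, not measured.
interface LineCounts {
  covered: number;
  total: number;
}

// Global line coverage is sum(covered) / sum(total) across all files,
// so large, well-tested files can offset completely untested ones.
export function aggregatePct(files: Record<string, LineCounts>): number {
  let covered = 0;
  let total = 0;
  for (const f of Object.values(files)) {
    covered += f.covered;
    total += f.total;
  }
  return total === 0 ? 100 : (100 * covered) / total;
}

// Files listed in `floors` whose own line coverage falls below their floor.
// A file missing from the report is treated as 0% covered.
export function belowFloor(
  files: Record<string, LineCounts>,
  floors: Record<string, number>,
): string[] {
  return Object.entries(floors)
    .filter(([file, floor]) => {
      const f = files[file];
      const pct = !f ? 0 : f.total === 0 ? 100 : (100 * f.covered) / f.total;
      return pct < floor;
    })
    .map(([file]) => file);
}

// Hypothetical example: two security-critical files untested, everything else well tested.
export const example: Record<string, LineCounts> = {
  "src/host-iptables.ts": { covered: 0, total: 200 },
  "src/squid-config.ts": { covered: 0, total: 150 },
  "<other files>": { covered: 500, total: 700 },
};
```

With these numbers, `aggregatePct(example)` is 500 / 1050 ≈ 47.6%, comfortably above the 38% global floor, while `belowFloor(example, { "src/host-iptables.ts": 80, "src/squid-config.ts": 80 })` flags both files. The 80% floors are placeholders.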
🟡 Medium Priority
5. No shellcheck for shell scripts
There are several security-critical shell scripts (`containers/agent/setup-iptables.sh`, `containers/agent/entrypoint.sh`, `containers/squid/entrypoint.sh`, `scripts/ci/cleanup.sh`) but no automated `shellcheck` linting. Shell script bugs such as incorrect quoting, unsafe variable expansion, and unintended glob expansion are a common source of security vulnerabilities.
Impact: Bugs in the iptables setup scripts could silently misconfigure the firewall.
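A CI step could run `shellcheck` over every script under `containers/` and `scripts/`. The sketch below, written as a small Node script, collects `*.sh` files recursively (the equivalent of the `containers/**/*.sh` and `scripts/**/*.sh` globs) and passes them to `shellcheck --severity=warning`. It assumes the `shellcheck` binary is installed on the runner; the function names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";
import { spawnSync } from "node:child_process";

// Recursively collect *.sh files under `root` (equivalent to a `root/**/*.sh` glob).
// A missing root yields an empty list rather than an error.
export function findShellScripts(root: string): string[] {
  if (!fs.existsSync(root)) return [];
  const found: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      found.push(...findShellScripts(full));
    } else if (entry.isFile() && entry.name.endsWith(".sh")) {
      found.push(full);
    }
  }
  return found.sort();
}

// Run shellcheck on all scripts under the given roots and return the exit code to use.
// Assumes `shellcheck` is on PATH; a spawn failure (e.g. binary missing) maps to exit code 1.
export function runShellcheck(roots: string[]): number {
  const files = roots.flatMap(findShellScripts);
  if (files.length === 0) return 0;
  const result = spawnSync("shellcheck", ["--severity=warning", ...files], { stdio: "inherit" });
  return result.error ? 1 : result.status ?? 1;
}
```

In a workflow step this would be invoked as `process.exit(runShellcheck(["containers", "scripts"]))`, so any warning-level finding fails the job.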
6. All smoke tests are failing
The three end-to-end smoke tests (Smoke Claude, Smoke Codex, Smoke Copilot) that validate the full agent-through-firewall workflow are all currently failing. These represent the highest-fidelity validation of the product.
Impact: End-to-end regressions may go undetected.
7. No per-file or per-directory coverage enforcement
Only global coverage thresholds are configured in `jest.config.js`. Security-sensitive modules (e.g., `src/host-iptables.ts`, `src/squid-config.ts`, `src/domain-patterns.ts`) could drop to 0% coverage without any CI failure, as long as other modules compensate.
Impact: High-risk security code can lose test coverage without a warning.
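Jest's `coverageThreshold` option accepts file paths and globs as keys alongside `global`. According to the Jest documentation, files matched by a path-specific entry are subtracted from the global aggregate and checked against their own thresholds. The following minimal sketch of such a configuration is written as a TypeScript object with locally declared types, which is an assumption made so it does not depend on Jest's type packages. The per-file numbers are hypothetical placeholders, not values derived from measured coverage.

```typescript
// Locally declared shape of Jest's `coverageThreshold` option (assumption: declared
// here instead of importing Jest's own types, to keep the sketch self-contained).
interface Thresholds {
  branches?: number;
  functions?: number;
  lines?: number;
  statements?: number;
}
type CoverageThreshold = { global: Thresholds } & Record<string, Thresholds>;

export const coverageThreshold: CoverageThreshold = {
  // Current global floors, as reported in this assessment.
  global: { branches: 30, functions: 35, lines: 38, statements: 38 },
  // Hypothetical per-file floors for security-critical modules.
  "./src/host-iptables.ts": { branches: 80, functions: 90, lines: 90, statements: 90 },
  "./src/squid-config.ts": { branches: 80, functions: 90, lines: 90, statements: 90 },
  "./src/domain-patterns.ts": { branches: 85, functions: 90, lines: 90, statements: 90 },
};

// Guard: every per-file floor should be at least as strict as the global one.
// A metric omitted from an entry is treated as 0.
export function perFileStricterThanGlobal(t: CoverageThreshold): boolean {
  const metrics = ["branches", "functions", "lines", "statements"] as const;
  return Object.entries(t)
    .filter(([key]) => key !== "global")
    .every(([, th]) => metrics.every((m) => (th[m] ?? 0) >= (t.global[m] ?? 0)));
}
```

In `jest.config.js` the same object would be assigned to the `coverageThreshold` key. The guard function only catches a per-file floor that was accidentally set below the global one.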
8. Integration test coverage is excluded from coverage reports
`npm run test:coverage` runs only unit tests (from `jest.config.js`). Integration tests run separately via `jest.integration.config.js`, and their coverage is never merged into the coverage report, so the actual coverage of the system under real conditions is unknown.
Impact: Coverage metrics are misleading. The real coverage including integration tests could be higher or lower than reported, and no one knows which.
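Merging requires both runs to emit Istanbul-format JSON. Jest's `json` coverage reporter writes `coverage-final.json`. In practice a tool such as `istanbul-lib-coverage` (`createCoverageMap().merge()`) or `nyc merge` would do the merging. The simplified sketch below only illustrates the idea and assumes both reports instrumented the same build, so statement, function, and branch IDs line up. It keeps only the hit counters (real reports also carry `statementMap`, `fnMap`, and `branchMap`). For a file present in both reports, a statement counts as covered if either run hit it. Note also that Jest generally instruments only code loaded in its own test processes, so code running inside containers or spawned CLI processes is unlikely to appear in either report.

```typescript
type Counters = Record<string, number>;

// Simplified Istanbul file coverage: only the hit counters are modelled here.
interface FileCoverage {
  path: string;
  s: Counters; // statement id -> hit count
  f: Counters; // function id -> hit count
  b: Record<string, number[]>; // branch id -> hit count per branch arm
}

type CoverageReport = Record<string, FileCoverage>; // file path -> coverage

function sumCounters(a: Counters, b: Counters): Counters {
  const out: Counters = { ...a };
  for (const [id, hits] of Object.entries(b)) out[id] = (out[id] ?? 0) + hits;
  return out;
}

// Merge two reports by summing hit counters per file (assumes identical instrumentation).
export function mergeReports(a: CoverageReport, b: CoverageReport): CoverageReport {
  const merged: CoverageReport = { ...a };
  for (const [file, cov] of Object.entries(b)) {
    const prev = merged[file];
    if (!prev) {
      merged[file] = cov; // file only seen by one run
      continue;
    }
    const branches: Record<string, number[]> = { ...prev.b };
    for (const [id, arms] of Object.entries(cov.b)) {
      const base = branches[id] ?? [];
      branches[id] = arms.map((hits, i) => (base[i] ?? 0) + hits);
    }
    merged[file] = { ...prev, s: sumCounters(prev.s, cov.s), f: sumCounters(prev.f, cov.f), b: branches };
  }
  return merged;
}

// Percentage of statements hit at least once, across all files in the report.
export function statementPct(report: CoverageReport): number {
  let total = 0;
  let covered = 0;
  for (const cov of Object.values(report)) {
    for (const hits of Object.values(cov.s)) {
      total++;
      if (hits > 0) covered++;
    }
  }
  return total === 0 ? 100 : (100 * covered) / total;
}
```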
9. No performance regression testing
There is no measurement of container startup time, memory usage, or throughput under load. AWF's core UX depends on startup latency (the time from `awf` invocation to agent command execution).
Impact: Performance regressions (e.g., slow container startup) can be merged without detection.
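A lightweight regression check could time a command several times in CI and compare the median against a budget, since the median is less sensitive to one noisy run than the mean. The sketch below times any command with `spawnSync` and `performance.now()`. The command, run count, and budget are assumptions for the project to choose; for AWF the command would be an `awf` invocation wrapping a trivial agent command, and its exact arguments are deliberately left out here.

```typescript
import { spawnSync } from "node:child_process";
import { performance } from "node:perf_hooks";

export interface TimingResult {
  samplesMs: number[];
  medianMs: number;
  withinBudget: boolean;
}

// Run `cmd args` `runs` times, measuring wall-clock time of each run.
// Throws if the command fails, so a broken invocation is never reported as "fast".
export function timeCommand(cmd: string, args: string[], runs: number, budgetMs: number): TimingResult {
  if (runs < 1) throw new RangeError("runs must be >= 1");
  const samplesMs: number[] = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    const result = spawnSync(cmd, args, { stdio: "ignore" });
    const elapsed = performance.now() - start;
    if (result.error || result.status !== 0) {
      throw new Error(`${cmd} exited with status ${result.status}`);
    }
    samplesMs.push(elapsed);
  }
  const sorted = [...samplesMs].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  const medianMs = sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  return { samplesMs, medianMs, withinBudget: medianMs <= budgetMs };
}
```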
🟢 Low Priority
10. No artifact size monitoring
The compiled `dist/` output size is not tracked or enforced. Large dependency additions or accidental bundling of test files could inflate the package size.
Impact: Package distribution size can grow silently.
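A size gate only needs a recursive byte count and a budget. The sketch below sums the apparent size of regular files under a directory such as `dist/` and compares it against a budget. The budget value and helper names are hypothetical. Unlike `du`, which reports disk usage in blocks, this counts file bytes, so the two numbers will differ slightly.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Sum the apparent size (bytes) of all regular files under `dir`, recursively.
export function dirSizeBytes(dir: string): number {
  let total = 0;
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) total += dirSizeBytes(full);
    else if (entry.isFile()) total += fs.statSync(full).size;
  }
  return total;
}

export interface SizeCheck {
  bytes: number;
  budgetBytes: number;
  ok: boolean;
}

// Compare the size of `dir` against a budget chosen by the project.
export function checkSizeBudget(dir: string, budgetBytes: number): SizeCheck {
  const bytes = dirSizeBytes(dir);
  return { bytes, budgetBytes, ok: bytes <= budgetBytes };
}
```

A CI step might call `checkSizeBudget("dist", 2 * 1024 * 1024)` (a hypothetical 2 MiB budget) and exit non-zero when `ok` is false.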
11. No license compliance check
There is no automated check of third-party dependency licenses (e.g., FOSSA, `license-checker`). New dependencies with incompatible licenses (GPL, AGPL) could be introduced.
Impact: License compliance issues are discovered late in the release cycle.
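To illustrate what such a check does, the dependency-free sketch below reads the `license` field of each package directly under `node_modules` (including `@scope/name` packages) and flags SPDX identifiers on a denylist. It assumes a flat npm layout. Nested `node_modules` directories and the legacy `licenses` array are not handled, so a dedicated tool like `license-checker` remains the practical choice. The denylist is a project decision; GPL and AGPL simply mirror the examples above, and the function names are hypothetical.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

export interface LicenseHit {
  name: string;
  license: string;
}

// Read the `license` field: either an SPDX string or a legacy `{ type }` object.
function licenseOf(pkg: { license?: unknown }): string {
  const l = pkg.license;
  if (typeof l === "string") return l;
  if (l && typeof l === "object" && typeof (l as { type?: unknown }).type === "string") {
    return (l as { type: string }).type;
  }
  return "UNKNOWN";
}

// True if any identifier in a (possibly compound) SPDX expression is denied.
// "GPL" matches "GPL" and "GPL-3.0-only" but not "LGPL-2.1"; dual licenses such as
// "(MIT OR GPL-3.0)" are flagged conservatively.
function isDenied(license: string, deny: string[]): boolean {
  return license
    .split(/[\s()]+/)
    .filter((token) => token !== "" && !["OR", "AND", "WITH"].includes(token))
    .some((token) => deny.some((d) => token === d || token.startsWith(`${d}-`)));
}

// Package directories directly under node_modules, including @scope/name.
function packageDirs(nodeModules: string): string[] {
  if (!fs.existsSync(nodeModules)) return [];
  const dirs: string[] = [];
  for (const entry of fs.readdirSync(nodeModules, { withFileTypes: true })) {
    if (!entry.isDirectory() || entry.name.startsWith(".")) continue;
    if (entry.name.startsWith("@")) {
      for (const scoped of fs.readdirSync(path.join(nodeModules, entry.name), { withFileTypes: true })) {
        if (scoped.isDirectory()) dirs.push(path.join(entry.name, scoped.name));
      }
    } else {
      dirs.push(entry.name);
    }
  }
  return dirs;
}

export function findDeniedLicenses(nodeModules: string, deny: string[]): LicenseHit[] {
  const hits: LicenseHit[] = [];
  for (const dir of packageDirs(nodeModules)) {
    const manifest = path.join(nodeModules, dir, "package.json");
    if (!fs.existsSync(manifest)) continue;
    const pkg = JSON.parse(fs.readFileSync(manifest, "utf8")) as { name?: string; license?: unknown };
    const license = licenseOf(pkg);
    if (isDenied(license, deny)) hits.push({ name: pkg.name ?? dir, license });
  }
  // Plain code-unit ordering keeps the output deterministic across locales.
  return hits.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
}
```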
12. No mutation testing
Jest tests are run, but their effectiveness is never validated. A mutation testing tool (e.g., Stryker) could verify that the tests actually fail when bugs are introduced into the code. This is especially important for security invariant tests.
Impact: Tests that look comprehensive may not actually catch regressions.
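To make the idea concrete, the sketch below uses a hypothetical allowlist matcher (not AWF's actual `src/domain-patterns.ts` logic) and a mutant of the kind a tool such as Stryker can generate automatically: replacing the string literal `"."` with an empty string turns a subdomain check into a plain suffix check, so `evilgithub.com` would match `github.com`. A happy-path suite lets this mutant survive, and one security-invariant case kills it.

```typescript
// Hypothetical matcher: a host is allowed if it equals an allowed domain
// or is a subdomain of one.
export function isAllowed(host: string, allowed: string[]): boolean {
  return allowed.some((d) => host === d || host.endsWith("." + d));
}

// Mutant: the "." literal replaced by "", so any suffix match is accepted.
export function isAllowedMutant(host: string, allowed: string[]): boolean {
  return allowed.some((d) => host === d || host.endsWith("" + d));
}

type Matcher = (host: string, allowed: string[]) => boolean;
interface Case {
  host: string;
  expected: boolean;
}

// Happy-path suite: exact match, subdomain, unrelated domain.
export const weakSuite: Case[] = [
  { host: "github.com", expected: true },
  { host: "api.github.com", expected: true },
  { host: "example.org", expected: false },
];

// Adds the security invariant that look-alike hosts must be blocked.
export const strongSuite: Case[] = [...weakSuite, { host: "evilgithub.com", expected: false }];

export function passes(m: Matcher, suite: Case[]): boolean {
  return suite.every((c) => m(c.host, ["github.com"]) === c.expected);
}

// A mutant "survives" a suite if the suite still passes when run against it.
export function mutantSurvives(suite: Case[]): boolean {
  return passes(isAllowedMutant, suite);
}
```

Here `mutantSurvives(weakSuite)` is `true` and `mutantSurvives(strongSuite)` is `false`, while the original matcher passes both suites. A surviving mutant like this is exactly the signal that a security invariant is untested.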
📋 Actionable Recommendations
- Add a `pull_request` trigger to `deploy-docs.yml` with a build-only job (skip the deploy step)
- … `src/host-iptables.ts` and `src/squid-config.ts`
- Add a `shellcheck` step to `build.yml`, or a dedicated `lint-scripts.yml` workflow, targeting `containers/**/*.sh` and `scripts/**/*.sh`
- Add per-path `coverageThreshold` entries in `jest.config.js` for the 3-5 most security-critical files
- Add `--coverage` to the `test:integration` script and merge the result with unit coverage in `test-coverage.yml`. Jest has no built-in coverage-merge flag, so use a tool such as `nyc merge` or `istanbul-lib-coverage`
- … `time` in the Examples Test; alert if it exceeds a threshold
- … `build.yml` to check `du -sh dist/` and fail if the size exceeds a threshold
- Add an `npx license-checker --failOn GPL` step to `dependency-audit.yml`
- … `src/domain-patterns.ts` and `src/squid-config.ts` as a scheduled weekly job

📈 Metrics Summary
Test suites: unit (`src/*.test.ts`), integration (`tests/integration/*.test.ts`)

The core static analysis and testing pipeline is healthy. The primary risks are: (a) the currently broken agentic build-test and smoke workflows that validate real-world usage, (b) low coverage thresholds for a security-critical codebase, and (c) missing shellcheck for security-sensitive shell scripts that configure iptables rules.