[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1145
Replies: 1 comment
-
|
🔮 The ancient spirits stir; the smoke test agent has passed through, leaving a quiet omen in its wake.
|
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
📊 Current CI/CD Pipeline Status
The repository has a mature and layered CI/CD setup with 15+ standard GitHub Actions workflows and 28+ agentic (AI-driven) workflows. Most standard checks run on every PR to
mainand PRs appear healthy overall — the most recent PR (feat/token-rate-limiting) had 22 checks run with only 1 integration test failure.Standard Workflows on PRs:
build.ymllint.ymltest-integration.ymltest-coverage.ymltest-integration-suite.ymltest-chroot.ymltest-examples.ymlcodeql.ymlcontainer-scan.ymldependency-audit.ymlpr-title.ymlAgentic Workflows on PRs:
Continuous Monitoring (daily/hourly):
✅ Existing Quality Gates
src/on every PRtsconfig.check.jsonawfinvocations as smoke scriptsnpm audit --audit-level=highon main + docs packages🔍 Identified Gaps
🔴 High Priority
1. Test coverage thresholds are barely-passing minimums
The current coverage thresholds mirror the existing (low) coverage rather than enforcing a quality bar:
The most critical file,
docker-manager.ts(container lifecycle management), likely has coverage well below 50%. Low-threshold gates give false confidence — a PR could delete half the tests and still pass.2. No shellcheck/static analysis for shell scripts
Six shell scripts are in
containers/agent/(entrypoint.sh,setup-iptables.sh,docker-stub.sh,get-claude-key.sh,pid-logger.sh,api-proxy-health-check.sh) plusscripts/ci/cleanup.sh. These scripts are security-critical (iptables rules, capability drops, credential handling) but receive zero automated static analysis. A shell quoting bug or logic error could silently weaken the firewall.3. Container security scan misses source-only changes
container-scan.ymlhas a path filter:paths: containers/**. A TypeScript change insrc/docker-manager.tsthat alters how containers are launched (e.g., changing security options, capability sets, or network config) will not trigger a container rescan. The scan should also run whensrc/**changes, as the generated Docker Compose config determines runtime security posture.4. Misleading workflow filename causes developer confusion
test-integration.ymlactually contains the TypeScript Type Check workflow (not integration tests). The actual integration tests are intest-integration-suite.yml. This naming inversion will confuse contributors trying to understand CI status or reproduce failures locally.🟡 Medium Priority
5. No PR-blocking secret scanning gate
The
secret-digger-*workflows run every hour but are not blocking PR checks — a secret committed in a PR could be merged in the ~60-minute window between scans. A dedicatedgitleaksortrufflehogcheck running as a required PR status check would close this window.6. No Docker image size monitoring
There is no check to detect unexpected growth in container image sizes. A PR adding a large debugging tool or accidentally installing unnecessary packages could significantly increase image sizes without any CI signal.
7. Integration tests not confirmed as required status checks
The recent
feat/token-rate-limitingPR had an Integration Tests failure yet the PR appears to have proceeded. If integration tests are not configured as required branch protection status checks in repository settings, they provide warnings but not true gates.8. No performance/timing regression detection
Container startup time and proxy latency are user-facing metrics for this project. There are no benchmarks or timing assertions — a change that doubles startup time would pass all tests.
9. Missing coverage for
docker-manager.tscritical pathsThe unit test file
docker-manager.test.tsexists, but given the overall 38% coverage, the complex container lifecycle methods (startContainers,runAgentCommand,performCleanup) are likely largely untested at the unit level. The integration tests cover these indirectly, but unit-level coverage of config generation logic would catch bugs before they reach containers.10.
api-proxyunit tests run in build (not path-filtered)containers/api-proxy/npm testruns inbuild.ymlon every PR regardless of whethercontainers/api-proxy/was modified, unnecessarily increasing build time for unrelated changes.🟢 Low Priority
11. No broken link checker for documentation
The docs site (
docs-site/) and Markdown files indocs/are not checked for broken links. As the project grows and URLs change (especially internal cross-references), stale links accumulate silently.12. No CHANGELOG/release notes enforcement
PRs don't require a
CHANGELOGentry ordocs/update. This creates gaps in release history and makes it harder to understand what changed between versions.13. No artifact size tracking for
dist/The compiled output in
dist/is not size-tracked. A large regression in bundle size (e.g., from an accidental large dependency) would not be caught.14. No concurrent cleanup race condition test
The cleanup lifecycle is documented as security-critical (prevents Docker network pool exhaustion), but there is no CI test verifying cleanup handles concurrent awf invocations or SIGKILL correctly.
📋 Actionable Recommendations
1. Raise test coverage thresholds incrementally 🔴 — Medium complexity, High impact
Set a realistic target (e.g., 60% statements, 50% branches) and increase thresholds by 2-3% per sprint. Update
jest.config.jscoverageThresholdto reflect the actual target:The existing
test-coverage-improveragentic workflow (weekly) can drive incremental increases.2. Add shellcheck to CI 🔴 — Low complexity, High impact
Add a new workflow step or job to
build.yml:This catches quoting issues, command injection risks, and logic errors in
setup-iptables.shandentrypoint.sh— the most security-sensitive code in the repository.3. Remove path filter from container scan 🔴 — Low complexity, High impact
In
container-scan.yml, addsrc/**andpackage.jsonto thepathsfilter so that source changes that affect container configuration also trigger a rescan:4. Rename misleading workflow file 🔴 — Low complexity, Low impact
Rename
test-integration.yml→type-check.ymlto reflect its actual content ("TypeScript Type Check"). Update CI Doctor's workflow list accordingly.5. Add gitleaks as a blocking PR check 🟡 — Low complexity, High impact
Add a dedicated secret scanning step using
gitleaks/gitleaks-actionas a required status check, separate from the hourly agentic scans:6. Verify and enforce required status checks 🟡 — Low complexity, High impact
In repository Settings → Branches → Branch protection rules for
main, ensure these workflows are required status checks: Build Verification, Integration Tests, Chroot Integration Tests, TypeScript Type Check, Test Coverage, Lint, CodeQL, PR Title Check. If Integration Tests is not required, the recent failure onfeat/token-rate-limitingwould not have blocked merging.7. Add Docker image size monitoring 🟡 — Medium complexity, Medium impact
Add a step to
build.ymlorcontainer-scan.ymlthat records image sizes and fails if they exceed a threshold:8. Add path filter to api-proxy tests in build.yml 🟢 — Low complexity, Low impact
Move the
api-proxyunit test step to a separate job withpaths: ['containers/api-proxy/**']to avoid running it on unrelated changes.9. Add documentation link checker 🟢 — Low complexity, Low impact
Add
lychee-actionormarkdown-link-checkto the docs deploy workflow to verify links before deployment.10. Add performance baseline tracking 🟡 — High complexity, Medium impact
Add a benchmark job using
hyperfinethat measuresawfstartup-to-first-output time and compares against a baseline stored as a repository artifact. Flag regressions >10% in the PR comment.📈 Metrics Summary
src/Key Finding
The pipeline is comprehensive in breadth — covering security, builds, integration, docs, and multiple AI engines — but has meaningful gaps in depth: test coverage is low, shell scripts are unguarded, and some critical checks (container scan, integration tests as required checks) have configuration gaps. The most impactful quick wins are shellcheck for shell scripts and raising coverage thresholds incrementally.
Beta Was this translation helpful? Give feedback.
All reactions