This document provides comprehensive guidelines for AI coding agents working on the ADT Studio codebase. It enforces architectural consistency, security best practices, and frontend development standards.
- Core Principles
- Architecture Overview
- Code Organization
- Security Requirements
- Frontend Development
- Backend Development
- Type Safety & Validation
- Testing Requirements
- Common Patterns
- Anti-Patterns to Avoid
- Checklist Before Submitting
These principles are non-negotiable and must guide every decision:
- **Book Level Storage:** All book data must be isolated to a single directory that can be zipped and shared. Never store book-specific data outside the book's directory.
- **Entity Level Versioning:** NEVER overwrite entities. Always create new versions with incremented version numbers. Users must be able to roll back.
- **LLM Level Caching:** Cache at the LLM call level only. Hash all ordered inputs to create cache keys. Pipeline reruns should be fast if parameters are unchanged.
- **Maximum Transparency:** All LLM calls, prompts, and responses must be inspectable by users. No black boxes.
- **Minimize Dependencies:** If you can avoid adding a new dependency, do so. Flat files > database when sufficient. In-memory queues > external queue services.
- **Pure JS/TS Over Native:** Always prefer pure JavaScript/TypeScript or WASM-based libraries over native C/C++ bindings. Native bindings break cross-platform builds, complicate CI, and conflict with desktop packaging. If a native binding is the only option, document why.
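The LLM-level caching principle can be sketched as follows. This is an illustrative example only: the `LLMCallInputs` shape and `cacheKey` function are hypothetical, not the real `@adt/llm` API. The key idea is that every input that affects the completion is serialized in a fixed order before hashing.

```typescript
import { createHash } from "node:crypto"

// Hypothetical input shape; the real client tracks more parameters.
interface LLMCallInputs {
  model: string
  messages: { role: string; content: string }[]
  temperature: number
}

export function cacheKey(inputs: LLMCallInputs): string {
  // Serialize an explicitly ordered tuple, not the raw object, so the
  // key does not depend on property insertion order. Message order is
  // preserved by the array itself.
  const ordered = [inputs.model, inputs.temperature, inputs.messages]
  return createHash("sha256").update(JSON.stringify(ordered)).digest("hex")
}
```

Identical inputs always hash to the same key, which is what makes pipeline reruns cheap when parameters are unchanged.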
adt/
├── packages/ # Shared libraries (MUST be reused)
│ ├── types/ # Zod schemas - ALL types defined here
│ ├── pipeline/ # Extraction & generation - pure functions
│ ├── llm/ # LLM client, prompts, caching, cost tracking
│ ├── pdf/ # PDF extraction only
│ └── output/ # Bundle packaging only
│
├── apps/ # Application tier
│ ├── api/ # Hono HTTP server
│ ├── studio/ # React SPA (Vite)
│ └── desktop/ # Tauri v2 desktop wrapper (sidecar architecture)
│
├── templates/ # Layout templates
├── config/ # Global configuration
└── docs/ # Architecture documentation
┌─────────────────────────────────────────────────────────┐
│ apps/studio (React) │
│ apps/desktop (TBD) │
└─────────────────────────┬───────────────────────────────┘
│ HTTP only
▼
┌─────────────────────────────────────────────────────────┐
│ apps/api (Hono) │
└─────────────────────────┬───────────────────────────────┘
│ Direct imports
▼
┌─────────────────────────────────────────────────────────┐
│ packages/pipeline │ packages/llm │ packages/output │
└─────────────────────────┬───────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────┐
│ packages/types │ packages/pdf │
└─────────────────────────────────────────────────────────┘
RULE: Frontend apps MUST NOT import directly from packages. All data flows through the API.
Exception: @adt/types may be imported by the studio app for the shared PIPELINE definition and derived constants (stage/step names, ordering). No business logic — only type-level and constant data.
| Type of Code | Location | Notes |
|---|---|---|
| Zod schemas, TypeScript interfaces | `packages/types/src/` | Export from index.ts |
| LLM prompts, calls, caching | `packages/llm/src/` | Use existing client |
| PDF extraction logic | `packages/pdf/src/` | Pure functions |
| Pipeline step implementations | `packages/pipeline/src/` | Pure functions, one file per step |
| Pipeline definition (stages/steps/DAG) | `packages/types/src/pipeline.ts` | Single source of truth |
| Bundle/export logic | `packages/output/src/` | Archive creation |
| API endpoints | `apps/api/src/index.ts` | Hono routes |
| React components | `apps/studio/src/components/` | Reuse existing |
| React pages | `apps/studio/src/pages/` | One per route |
| API client methods | `apps/studio/src/api/client.ts` | Single file |
| Utility functions | Within relevant package | Not a utils folder |
kebab-case.ts # All source files
kebab-case.test.ts # Test files (co-located)
ComponentName.tsx # React components (PascalCase)
// 1. Node built-ins
import { readFile } from "fs/promises"
import path from "path"
// 2. External dependencies
import { z } from "zod"
import { Hono } from "hono"
// 3. Internal packages (workspace)
import { PipelineConfig } from "@adt/types"
import { createLLMClient } from "@adt/llm"
// 4. Relative imports (current package)
import { localHelper } from "./helpers.js"

NEVER:
- Log API keys to console or files
- Include API keys in error messages
- Commit API keys to git, or store them in web localStorage without considering encryption
- Send API keys in URL parameters
- Expose API keys in client-side bundle
ALWAYS:
// Correct: Header-based authentication
const key = c.req.header("X-OpenAI-Key")
// Correct: Environment variable (desktop sidecar)
const key = process.env["OPENAI_API_KEY"]
// Correct: Validate before use
function requireOpenAIKey(c: Context): string {
const key = getOpenAIKey(c)
if (!key) {
throw new HTTPException(401, {
message: "OpenAI API key required. Set it in Settings."
})
}
return key
}

ALL user input MUST be validated with Zod:
// CORRECT: Validate with Zod schema
const CreateJobSchema = z.object({
name: z.string().min(1).max(255),
pdfPath: z.string(),
config: PipelineConfig.optional()
})
app.post("/jobs", async (c) => {
const body = await c.req.json()
const result = CreateJobSchema.safeParse(body)
if (!result.success) {
throw new HTTPException(400, {
message: `Validation error: ${result.error.message}`
})
}
// Use result.data - guaranteed to be valid
const job = await createJob(result.data)
return c.json(job)
})

// NEVER: Direct path concatenation
const filePath = `${baseDir}/${userInput}` // VULNERABLE
// ALWAYS: Validate and normalize paths
import path from "path"
function getSafePath(baseDir: string, userPath: string): string {
const normalized = path.normalize(userPath)
const resolved = path.resolve(baseDir, normalized)
// Ensure resolved path is within baseDir ("/base" must not match "/base-evil")
const base = path.resolve(baseDir)
if (resolved !== base && !resolved.startsWith(base + path.sep)) {
throw new Error("Path traversal attempt detected")
}
return resolved
}

// NEVER: String concatenation in SQL
db.prepare(`SELECT * FROM entities WHERE id = '${id}'`) // VULNERABLE
// ALWAYS: Parameterized queries
db.prepare("SELECT * FROM entities WHERE id = ?").get(id)

// NEVER: Render raw HTML from user input
<div dangerouslySetInnerHTML={{ __html: userContent }} /> // VULNERABLE
// ALWAYS: Sanitize if HTML rendering is required
import DOMPurify from "dompurify"
<div dangerouslySetInnerHTML={{ __html: DOMPurify.sanitize(userContent) }} />
// PREFER: Text content (React auto-escapes)
<div>{userContent}</div> // Safe by default

// Only for development or controlled environments
app.use("*", cors({
origin: ["http://localhost:5173"], // Explicit origins
credentials: true
}))
// NEVER: Allow all origins in production
app.use("*", cors({ origin: "*" })) // DANGEROUS

// Standard component template using TanStack
import { useSuspenseQuery, useMutation, useQueryClient } from "@tanstack/react-query"
import { useNavigate, useParams, Link } from "@tanstack/react-router"
import { api } from "../api/client"
import type { Job } from "@adt/types"
interface ComponentNameProps {
onSave?: (job: Job) => void
}
export default function ComponentName({ onSave }: ComponentNameProps) {
// 1. Router hooks
const navigate = useNavigate()
const { id } = useParams({ strict: false })
const queryClient = useQueryClient()
// 2. Data fetching via TanStack Query
const { data } = useSuspenseQuery({
queryKey: ["job", id],
queryFn: () => api.getJob(id!),
// Note: `enabled` is not supported with useSuspenseQuery; suspense
// queries always run, so this route must guarantee `id` is present.
})
// 3. Mutations
const updateMutation = useMutation({
mutationFn: (job: Job) => api.updateJob(job.id, job),
onSuccess: (_, job) => {
queryClient.invalidateQueries({ queryKey: ["job", job.id] })
onSave?.(job)
},
})
// 4. Event handlers
const handleSubmit = (e: React.FormEvent) => {
e.preventDefault()
if (!data) return
updateMutation.mutate(data)
}
// 5. Main render (loading/error handled by TanStack Query + ErrorBoundary)
return (
<form onSubmit={handleSubmit} className="space-y-4">
{updateMutation.error && (
<div className="p-4 text-red-700 bg-red-100 rounded-lg">
{updateMutation.error.message}
</div>
)}
{/* Component JSX */}
</form>
)
}

DO:
- Use TanStack Query for all server state (fetching, caching, mutations)
- Use local `useState` for UI-only state (modals, form inputs, toggles)
- Use TanStack Query's `refetchInterval` for real-time polling
- Use TanStack Query's optimistic updates via `onMutate`

DON'T:

- Add Redux, Zustand, or other state management libraries
- Create global state stores
- Use raw `useEffect` for data fetching — use TanStack Query instead
- Use `fetch()` directly — go through the API client + Query
// CORRECT: Polling with TanStack Query
const { data: jobs } = useQuery({
queryKey: ["jobs"],
queryFn: () => api.getJobs(),
refetchInterval: 5000, // Auto-poll every 5 seconds
})
// CORRECT: Optimistic update with TanStack Query
const queryClient = useQueryClient()
const deleteMutation = useMutation({
mutationFn: (id: string) => api.deleteJob(id),
onMutate: async (id) => {
await queryClient.cancelQueries({ queryKey: ["jobs"] })
const previous = queryClient.getQueryData<Job[]>(["jobs"])
queryClient.setQueryData<Job[]>(["jobs"], (old) =>
old?.filter((j) => j.id !== id)
)
return { previous }
},
onError: (_err, _id, context) => {
queryClient.setQueryData(["jobs"], context?.previous) // Rollback
},
onSettled: () => {
queryClient.invalidateQueries({ queryKey: ["jobs"] })
},
})

All screens must follow these layout principles:
- **Grid symmetry:** When using multi-column grids, cards in the same row MUST stretch to equal heights (`items-stretch`, the flexbox/grid default). Never leave one card short beside a tall one.
- **No orphan whitespace:** Every region of the viewport should be intentionally used. If a card has less content than its neighbor, either:
  - Merge the smaller content into the larger card as a section
  - Use a single-column layout instead
  - Redistribute content so columns are roughly balanced
- **Consistent spacing:** Use one spacing scale throughout a page — don't mix `gap-4` and `gap-6` on the same level. Standard gaps: `gap-4` between cards, `gap-6` for page-level sections, `p-4` inside cards, `p-6` for page padding.
- **Full-width by default:** Page content should use the full available width. Only constrain width (`max-w-*`) for text-heavy forms or reading content. Dashboard-style pages, detail pages with data panels, and grids should go edge-to-edge.
- **Balanced columns:** In a 2-column layout, prefer `grid-cols-2` (50/50) unless content clearly demands asymmetry. In a 3-column layout, use `grid-cols-3` (33/33/33). Avoid odd splits like 1/3 + 2/3 unless one column is a sidebar.
- **Card consistency:** Cards at the same hierarchy level should use the same padding, border radius, and header style. Don't mix `CardHeader` sizes or omit borders on some cards.
- **No scrolling when content fits:** If content can fit on screen by using available width, lay it out that way instead of stacking vertically and scrolling. Horizontal space is cheaper than vertical scroll.
Tailwind's JIT compiler scans source files for complete class name strings at build time. Dynamic class generation will silently fail — the classes won't be included in the CSS output.
// WRONG: Dynamic template literals — Tailwind JIT can't detect these
const cls = `bg-${color}-600` // Not scanned
const hover = `hover:${bgClass}` // Not scanned
const group = `group-hover/rail:${cls}` // Not scanned
// CORRECT: Complete literal strings
const cls = "bg-blue-600" // Scanned
const hover = HOVER_MAP["bg-blue-600"] // Value is literal in the map
const group = cn("group-hover/rail:inline", flag && "inline") // Both literals

For dynamic stage-colored hover states, use either:
- Static lookup map: `{ "bg-blue-600": "hover:bg-blue-600" }` — each value is a complete literal
- CSS custom properties: `style={{ '--clr': hex }}` + `className="text-[var(--clr)] hover:bg-[var(--clr)]"` — arbitrary value syntax with static class names
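The static-lookup-map approach can be sketched like this. The map contents and the `hoverClassFor` helper are illustrative (the real map lives wherever stage colors are configured); the point is that every value is a complete class literal the Tailwind scanner can see.

```typescript
// Every VALUE is a complete, scannable class literal. Building
// `hover:${bg}` at runtime would produce classes Tailwind never emits.
const HOVER_BY_BG: Record<string, string> = {
  "bg-blue-600": "hover:bg-blue-600",
  "bg-green-600": "hover:bg-green-600",
  "bg-red-600": "hover:bg-red-600",
}

export function hoverClassFor(bg: string): string {
  // Fall back to a neutral literal rather than interpolating.
  return HOVER_BY_BG[bg] ?? "hover:bg-gray-600"
}
```

Because the literals exist in source, they survive the JIT scan even though the choice between them happens at runtime.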
ALWAYS use Tailwind utility classes:
// CORRECT: Tailwind utilities
<div className="flex items-center justify-between p-4 bg-white rounded-lg shadow">
<h2 className="text-lg font-semibold text-gray-900">Title</h2>
<button className="px-4 py-2 text-white bg-blue-600 rounded hover:bg-blue-700">
Action
</button>
</div>
// CORRECT: Conditional classes with clsx
import clsx from "clsx"
<div className={clsx(
"p-4 rounded-lg",
isActive && "bg-blue-100 border-blue-500",
isError && "bg-red-100 border-red-500",
!isActive && !isError && "bg-gray-100"
)}>

NEVER:
- Create CSS modules
- Use styled-components or CSS-in-JS
- Add inline styles (except for dynamic values)
- Create custom CSS files
All API calls go through apps/studio/src/api/client.ts:
// CORRECT: Use the api client
import { api } from "../api/client"
const jobs = await api.getJobs()
const job = await api.createJob({ name, pdfPath, config })
await api.deleteJob(id)
// Adding a new endpoint? Add it to client.ts:
export const api = {
// ... existing methods
newEndpoint: async (data: NewType): Promise<ResponseType> => {
return request<ResponseType>("/new-endpoint", {
method: "POST",
body: JSON.stringify(data)
})
}
}

NEVER:
- Call fetch() directly in components
- Create separate API modules per feature
- Duplicate request logic
Before creating a new component:
- Check if a similar component exists in `apps/studio/src/components/`
- Check if the component can be composed from existing components
- If creating new, ensure it's generic enough for reuse
Existing components to reuse:
- `Layout.tsx` — Main app layout with navigation
- `SettingsModal.tsx` — Modal for settings/configuration
// PREFER: Composition over new components
<div className="card"> {/* Use utility classes, not new component */}
<CardHeader />
<CardBody />
</div>
// AVOID: Creating near-duplicate components
// Bad: JobCard.tsx, BookCard.tsx, TemplateCard.tsx (with 90% same code)
// Good: Card.tsx with props for customization

// CORRECT: Consistent error handling pattern
const [error, setError] = useState<string | null>(null)
const handleAction = async () => {
try {
setError(null)
await api.action()
} catch (err) {
const message = err instanceof Error ? err.message : "An error occurred"
setError(message)
// Log for debugging but don't expose internals to user
console.error("Action failed:", err)
}
}
// Display errors consistently
{error && (
<div className="p-4 text-red-700 bg-red-100 rounded-lg">
{error}
</div>
)}

// CORRECT: TanStack Router navigation (type-safe)
import { useNavigate, Link } from "@tanstack/react-router"
function Component() {
const navigate = useNavigate()
// Programmatic navigation (type-safe)
const handleClick = () => {
navigate({ to: "/jobs/$id", params: { id } })
}
// Declarative navigation (type-safe)
return <Link to="/jobs/$id" params={{ id }}>View Job</Link>
}

// CORRECT: TanStack Form with Zod validation
import { useForm } from "@tanstack/react-form"
import { zodValidator } from "@tanstack/zod-form-adapter"
import { CreateJobSchema } from "@adt/types"
function CreateJobForm() {
const form = useForm({
defaultValues: { name: "", pdfPath: "" },
validatorAdapter: zodValidator(),
validators: { onChange: CreateJobSchema },
onSubmit: async ({ value }) => {
await api.createJob(value)
},
})
return (
<form onSubmit={(e) => { e.preventDefault(); form.handleSubmit() }}>
<form.Field name="name" children={(field) => (
<input
value={field.state.value}
onChange={(e) => field.handleChange(e.target.value)}
className="border rounded px-3 py-2"
/>
)} />
</form>
)
}

// CORRECT: TanStack Table — headless, bring your own UI
import { useReactTable, getCoreRowModel, getSortedRowModel, getFilteredRowModel, getPaginationRowModel, flexRender } from "@tanstack/react-table"
const table = useReactTable({
data: jobs,
columns,
getCoreRowModel: getCoreRowModel(),
getSortedRowModel: getSortedRowModel(),
getFilteredRowModel: getFilteredRowModel(),
getPaginationRowModel: getPaginationRowModel(),
})

// Standard endpoint pattern
app.post("/resource", async (c) => {
// 1. Authentication
const apiKey = requireOpenAIKey(c)
// 2. Input validation
const body = await c.req.json()
const result = RequestSchema.safeParse(body)
if (!result.success) {
throw new HTTPException(400, {
message: `Validation error: ${result.error.message}`
})
}
// 3. Business logic (delegate to service/package)
const resource = await createResource(result.data, apiKey)
// 4. Response
return c.json(resource, 201)
})

// Use HTTPException for API errors
import { HTTPException } from "hono/http-exception"
// 400 - Bad Request (validation errors)
throw new HTTPException(400, { message: "Invalid input" })
// 401 - Unauthorized
throw new HTTPException(401, { message: "API key required" })
// 404 - Not Found
throw new HTTPException(404, { message: "Job not found" })
// 500 - Internal Error (let unexpected errors propagate)
// Don't catch and re-throw as 500 unless adding context

// ALWAYS use the storage module with locking
import { withLock, loadJobs, saveJobs } from "./storage"
// CORRECT: Atomic read-modify-write
await withLock(async () => {
const jobs = await loadJobs()
jobs.push(newJob)
await saveJobs(jobs)
})
// NEVER: Read and write without lock
const jobs = await loadJobs() // Another process could modify here
jobs.push(newJob)
await saveJobs(jobs) // Could overwrite other changes

// Use node-sqlite3-wasm — pure WASM, no native bindings
import { Database } from "node-sqlite3-wasm"
const db = new Database(dbPath)
// CORRECT: Parameterized query
const stmt = db.prepare(`
SELECT * FROM versions
WHERE resource_type = ? AND resource_id = ?
ORDER BY created_at DESC
`)
const versions = stmt.all(resourceType, resourceId)
// CORRECT: Transactions for multiple operations
db.exec("BEGIN")
try {
const insert = db.prepare("INSERT INTO items (id, data) VALUES (?, ?)")
for (const item of items) {
insert.run(item.id, JSON.stringify(item.data))
}
db.exec("COMMIT")
} catch (err) {
db.exec("ROLLBACK")
throw err
}
// IMPORTANT: Always close the database when done to prevent memory leaks
db.close()

The pipeline is organized as a two-level DAG defined in `packages/types/src/pipeline.ts`:
- Stages — High-level groupings visible in the UI (Extract, Storyboard, Quizzes, Captions, Glossary, Text & Speech, Package). Stages have inter-stage dependencies.
- Steps — Atomic processing operations within a stage (e.g., `image-filtering`, `page-sectioning`). Steps have intra-stage dependencies and can run in parallel when dependencies are met.
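The "run in parallel when dependencies are met" behavior can be sketched as a toy wave-based scheduler. The real runner lives in `packages/pipeline/src/dag.ts`; the `StepDef` shape and `runDag` function here are illustrative only.

```typescript
// Hypothetical step definition; the real one carries typed inputs/outputs.
interface StepDef {
  name: string
  deps: string[]
  run: () => Promise<void>
}

// Execute steps in dependency order, running each ready wave in parallel.
export async function runDag(steps: StepDef[]): Promise<string[]> {
  const done = new Set<string>()
  const order: string[] = []
  let remaining = [...steps]
  while (remaining.length > 0) {
    // Every step whose dependencies are all satisfied can run concurrently.
    const ready = remaining.filter((s) => s.deps.every((d) => done.has(d)))
    if (ready.length === 0) throw new Error("Cycle or missing dependency in DAG")
    await Promise.all(ready.map((s) => s.run()))
    for (const s of ready) {
      done.add(s.name)
      order.push(s.name)
    }
    remaining = remaining.filter((s) => !done.has(s.name))
  }
  return order
}
```

A cycle (or a dependency on an undefined step) surfaces as an explicit error rather than a hang, which is the behavior you want from any DAG executor.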
The PIPELINE constant is the single source of truth. All ordering, groupings, labels, and dependency graphs are derived from it. Never hardcode step/stage ordering elsewhere.
// Derived lookups available from @adt/types:
import { PIPELINE, STAGE_ORDER, STEP_TO_STAGE, STAGE_BY_NAME, ALL_STEP_NAMES } from "@adt/types"
import type { StepName, StageName } from "@adt/types"

Key files:
- `packages/types/src/pipeline.ts` — Pipeline definition and derived lookups
- `packages/pipeline/src/dag.ts` — Generic DAG runner
- `packages/pipeline/src/pipeline-dag.ts` — Pipeline-specific DAG executor
- `apps/api/src/services/step-runner.ts` — API-side stage runners
- `apps/studio/src/components/pipeline/StageRunCard.tsx` — UI card (sub-steps derived from PIPELINE)
- `apps/studio/src/components/pipeline/stages/` — Per-stage view components
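"Derived, never hardcoded" can be sketched like this. The `PIPELINE` shape below is a simplified stand-in for the real constant in `@adt/types` (only two stages, invented groupings); the pattern is that `STAGE_ORDER` and `STEP_TO_STAGE` are computed from the one definition rather than written out by hand.

```typescript
// Illustrative shape only; the real PIPELINE lives in @adt/types.
const PIPELINE = [
  { stage: "extract", steps: ["extract", "metadata", "page-sectioning"] },
  { stage: "storyboard", steps: ["image-filtering", "storyboard"] },
] as const

// Derived lookups: change PIPELINE and these stay correct automatically.
export const STAGE_ORDER = PIPELINE.map((s) => s.stage)

export const STEP_TO_STAGE: Record<string, string> = Object.fromEntries(
  PIPELINE.flatMap((s) => s.steps.map((step) => [step, s.stage] as [string, string]))
)
```

Any code that needs ordering or step-to-stage mapping imports these lookups, so there is exactly one place where the topology can drift.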
Stage runs are queued per-book — if a run is already active, new runs wait and execute sequentially. Key patterns:
- Backend: `stage-service.ts` manages a `BookRunState` per book with an active job and a queue. Jobs drain automatically on completion/failure.
- Frontend: All run handlers call `queueRun(options)` from `useBookRun()` — never `api.runStages` directly. This function does an optimistic cache update (mark stage "queued", clear downstream), then chains the API call through a promise chain to preserve click ordering.
- Data clearing: Happens via a `beforeRun` callback when the job starts executing, not when enqueued. This prevents clearing data for a stage that hasn't started yet.
- SSE continuity: The SSE stream is always-on (opens on book mount, closes on unmount). A `queue-next` event signals when a queued run begins executing, triggering a full refetch.
- Query invalidation: Use `invalidateQueries` — never `removeQueries`. `removeQueries` deletes cached data, causing completed stages to flash to "unrun" while the refetch is in flight. `invalidateQueries` keeps stale data visible during the refetch, preventing visual glitches.
// CORRECT: Use queueRun from context (handles optimistic update + API call + invalidation)
const { queueRun } = useBookRun()
queueRun({ fromStage: "storyboard", toStage: "storyboard", apiKey })
// WRONG: Calling API directly from a handler
await api.runStages(label, apiKey, { fromStage, toStage })

// Pipeline functions MUST be pure
// - No side effects
// - Same input = same output
// - All dependencies passed as parameters
// CORRECT: Pure pipeline function
export async function classifyText(
text: string,
options: ClassifyOptions,
llmClient: LLMClient
): Promise<Classification> {
const prompt = buildClassificationPrompt(text, options)
const result = await llmClient.complete(prompt)
return parseClassification(result)
}
// WRONG: Side effects, hidden dependencies
export async function classifyText(text: string) {
const options = globalConfig.classification // Hidden dependency
console.log("Classifying:", text) // Side effect
const result = await globalLLMClient.complete(...) // Hidden dependency
saveToCache(result) // Side effect
return result
}

ALL data structures MUST have Zod schemas in packages/types:
// packages/types/src/job.ts
import { z } from "zod"
export const JobStatus = z.enum(["pending", "processing", "completed", "failed"])
export type JobStatus = z.infer<typeof JobStatus>
export const Job = z.object({
id: z.string().uuid(),
name: z.string().min(1).max(255),
status: JobStatus,
pdfPath: z.string(),
outputDir: z.string(),
config: PipelineConfig,
createdAt: z.string().datetime(),
updatedAt: z.string().datetime()
})
export type Job = z.infer<typeof Job>
// Export from index.ts
export { Job, JobStatus } from "./job.js"

// API input validation
const result = Schema.safeParse(input)
if (!result.success) {
// Handle validation error
throw new HTTPException(400, {
message: result.error.issues.map(i => i.message).join(", ")
})
}
// Use result.data (typed correctly)
// Configuration with defaults
const config = PipelineConfig.parse(userConfig) // Applies defaults
// Type guards
const parsed = Job.safeParse(data)
if (parsed.success) {
// parsed.data is a valid Job (safeParse does not narrow `data` itself)
}

// CORRECT: Infer types from schemas
export const Job = z.object({ ... })
export type Job = z.infer<typeof Job>
// WRONG: Duplicate type definitions
export interface Job { ... } // Don't duplicate!
export const JobSchema = z.object({ ... })

packages/types/src/config.ts # Source
packages/types/src/config.test.ts # Test (co-located)
import { describe, it, expect, beforeEach, vi } from "vitest"
import { functionToTest } from "./module.js"
describe("functionToTest", () => {
beforeEach(() => {
vi.clearAllMocks()
})
it("should handle valid input", () => {
const result = functionToTest(validInput)
expect(result).toEqual(expectedOutput)
})
it("should throw on invalid input", () => {
expect(() => functionToTest(invalidInput)).toThrow("Expected error message")
})
it("should apply defaults", () => {
const result = functionToTest({})
expect(result.optionalField).toBe("default")
})
})

MUST test:
- Zod schema validation (valid/invalid inputs, defaults)
- Pure pipeline functions (input -> output)
- API endpoint request/response validation
- Error handling paths
- Edge cases (empty arrays, null values, etc.)
SHOULD test:
- React component rendering
- User interactions
- API client methods
Coverage targets:
- `packages/*`: 80% minimum
- `apps/api`: 70% minimum
- `apps/studio`: 50% minimum (UI testing is harder)
// Mock LLM calls for tests
vi.mock("@adt/llm", () => ({
createLLMClient: () => ({
complete: vi.fn().mockResolvedValue("mocked response")
})
}))
// Mock file system
vi.mock("fs/promises", () => ({
readFile: vi.fn().mockResolvedValue("file contents"),
writeFile: vi.fn().mockResolvedValue(undefined)
}))

// ALWAYS create new versions, NEVER overwrite
interface VersionedEntity {
id: string // Unique entity ID
version: number // Incrementing version
data: unknown // Entity-specific data
createdAt: string // ISO timestamp
createdBy?: string // User or "system"
inputVersions?: Record<string, number> // Dependencies
}
// Creating a new version
async function saveNewVersion(
db: Database,
entityId: string,
data: unknown,
createdBy?: string
): Promise<VersionedEntity> {
const current = await getLatestVersion(db, entityId)
const newVersion = (current?.version ?? 0) + 1
const entity: VersionedEntity = {
id: entityId,
version: newVersion,
data,
createdAt: new Date().toISOString(),
createdBy
}
await insertVersion(db, entity)
return entity
}

// All LLM calls go through the cached client
import { createCachedLLMClient } from "@adt/llm"
const client = createCachedLLMClient({
apiKey,
cacheDir: path.join(bookDir, ".cache")
})
// Cache key is hash of: model + prompt + all parameters
const result = await client.complete({
model: "gpt-4o",
messages: [...],
temperature: 0 // Must be deterministic for caching
})

Pipeline progress uses a ProgressEvent discriminated union streamed via SSE. Events are emitted per-step (not per-stage):
// ProgressEvent types (defined in @adt/types):
// - step-start: { step: StepName }
// - step-progress: { step: StepName, page, totalPages }
// - step-complete: { step: StepName }
// - step-skip: { step: StepName }
// - step-error: { step: StepName, error }
// The DAG runner emits these automatically as steps execute.
// The UI maps step events to their parent stage via STEP_TO_STAGE.

Always-on SSE: The SSE connection (`GET /api/books/:label/stages/status`) opens when the book layout mounts and stays open until unmount. There is no toggle or reconnection logic — EventSource handles reconnection automatically. On every open event (initial connection or reconnection), a full step-status refetch runs to catch any events missed during the gap.
SSE patches the TanStack Query cache: SSE events directly update the step-status query data via setQueryData, keeping the cache in sync without local state machines:
- `step-start` → mark step and stage as `"running"`
- `step-progress` → update a local progress ref (page X/Y, cosmetic) and ensure step is marked `"running"` (handles missed `step-start` on reconnect)
- `step-complete` / `step-skip` → mark step as `"done"`, recompute parent stage (done if all steps done)
- `step-error` → mark stage as `"error"`, set error message
- `queue-next` → `invalidateQueries` (new run started, full refetch)
- `complete` → `invalidateQueries` (run finished, reconcile with DB)
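The cache-patching idea is easiest to see as a pure update function. This is a simplified sketch: the `StepStatus` and `StepEvent` shapes are reduced from the real `@adt/types` definitions, and `applyStepEvent` is a hypothetical helper. In the app it would be applied via `queryClient.setQueryData(key, (prev) => applyStepEvent(prev, evt))` so the cached snapshot is replaced, never mutated.

```typescript
// Simplified shapes; the real ones cover more event types and stage state.
type State = "idle" | "queued" | "running" | "done" | "error"

interface StepStatus {
  steps: Record<string, State>
  error: string | null
}

type StepEvent =
  | { type: "step-start"; step: string }
  | { type: "step-complete"; step: string }
  | { type: "step-error"; step: string; error: string }

// Pure update: returns a new snapshot, leaving the previous cache value intact
// (required for TanStack Query's structural sharing and React rendering).
export function applyStepEvent(prev: StepStatus, evt: StepEvent): StepStatus {
  const steps = { ...prev.steps }
  if (evt.type === "step-start") steps[evt.step] = "running"
  else if (evt.type === "step-complete") steps[evt.step] = "done"
  else steps[evt.step] = "error"
  const error = evt.type === "step-error" ? evt.error : prev.error
  return { steps, error }
}
```

Keeping the update pure also makes it trivial to unit-test the SSE handling without a browser or an EventSource.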
Cancel in-flight fetches on SSE events: SSE progress events cancel any pending step-status query fetch via cancelQueries before updating the cache. This prevents a race where a stale fetch response (initiated by the open handler or window focus) arrives after an SSE event has already updated the cache, overwriting the more-current SSE state with an older server snapshot. This is a general TanStack Query pattern: when you have a push-based update channel (SSE/WebSocket) alongside pull-based queries, cancel in-flight pulls before applying pushes.
Stage/step status comes from a single source of truth: the GET /books/:label/step-status endpoint, cached via TanStack Query and patched live by SSE events. The backend computes stage and step states by merging three sources (highest priority first):
1. `StageService.getRunningSteps()` in-memory set — which individual steps are currently executing (added on `step-start`, removed on `step-complete`/`step-skip`/`step-error`)
2. `step_completions` DB table — persistent record of which steps have completed (survives page refresh)
3. `StageService.getStageStates()` in-memory state — which stages are currently running, queued, or errored (based on the active job's from→to range)
For stages, precedence is:

1. `queued`/`error` run states win (explicit run intent/failure should remain visible)
2. then DB completion (`"done"`) for fully-complete stages
3. then run-derived `"running"`/`"idle"`
This prevents completed stages from showing as "running" just because the active run range includes them, while still showing reruns ("queued") and failures ("error") clearly.
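The precedence rules above reduce to a small pure function. This is a sketch, not the backend's actual code: `mergeStageState` and its two inputs (the stage's run-derived state from `StageService`, and whether `step_completions` shows the stage fully done) are simplified stand-ins.

```typescript
type StageState = "idle" | "queued" | "running" | "done" | "error"

// Merge a stage's run-derived state with its DB completion record,
// applying the documented precedence.
export function mergeStageState(runState: StageState, dbDone: boolean): StageState {
  // 1. Explicit run intent/failure always stays visible.
  if (runState === "queued" || runState === "error") return runState
  // 2. A fully-completed stage shows done, even if the active run range includes it.
  if (dbDone) return "done"
  // 3. Otherwise fall back to the run-derived running/idle state.
  return runState
}
```

Note how a stage that is both DB-complete and inside the active run range resolves to `"done"`, which is exactly the glitch the precedence order exists to prevent.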
The merged response:
{
"stages": { "extract": "done", "storyboard": "running", "quizzes": "queued", ... },
"steps": { "extract": "done", "metadata": "done", "page-sectioning": "running", ... },
"error": null
}

The frontend reads this via the useBookRun() hook. Stage views need only:
const { stageState, queueRun } = useBookRun()
const state = stageState("storyboard") // "idle" | "queued" | "running" | "done" | "error"
const showRunCard = state !== "done"

Key rules:
- Recording: `step-runner.ts` wraps the progress emitter to call `storage.markStepComplete(step)` on every `step-complete`/`step-skip` event. This is the only place completions are recorded.
- Clearing: `makeBeforeRun` in `stages.ts` clears `step_completions` for the target stage and all downstream stages (via `getStageClearOrder`).
- Schema migrations: The `step_completions` table was added in schema v7. Migrations backfill from existing `node_data` so previously-processed books don't appear incomplete.
- Sub-step progress: Page X/Y progress during running steps is stored in a `useRef<Map>` with a tick counter for reactivity, avoiding full re-renders on every progress event.
// Detect desktop vs Web environment
export function isDesktop(): boolean {
// Tauri
if (typeof window !== "undefined" && "__TAURI_INTERNALS__" in window) return true
// Electron
if (typeof window !== "undefined" && "electronAPI" in window) return true
return false
}
// Use for platform-specific behavior
const apiBase = isDesktop() ? "http://localhost:3000/api" : "/api"

// WRONG: Duplicating logic
// In file1.ts
const validateJob = (job) => { ... }
// In file2.ts
const checkJob = (job) => { ... } // Same logic, different name!
// CORRECT: Single source of truth
// In packages/types/src/job.ts
export const Job = z.object({ ... })
// Use Job.parse() or Job.safeParse() everywhere

Pipeline topology is especially prone to duplication. Stage ordering, step groupings, step-to-stage mappings, and dependency graphs must all be derived from the PIPELINE constant in @adt/types. Never hardcode these in the API, UI, or CLI.
// WRONG: Direct package import in frontend
import { runPipeline } from "@adt/pipeline" // NO!
// CORRECT: Always go through API
import { api } from "../api/client"
await api.createJob({ ... })

// WRONG: Global mutable state
let currentJob: Job | null = null
export function setCurrentJob(job: Job) { currentJob = job }
// CORRECT: Component-local state or pass as parameters
const [currentJob, setCurrentJob] = useState<Job | null>(null)

// WRONG: Hardcoded configuration
const MODEL = "gpt-4o"
const MAX_TOKENS = 4096
// CORRECT: Use configuration
import { PipelineConfig } from "@adt/types"
const config = PipelineConfig.parse(userConfig)
const model = config.defaultModel

// WRONG: Silent catch
try {
await riskyOperation()
} catch {
// Silently ignored!
}
// CORRECT: Handle or rethrow
try {
await riskyOperation()
} catch (err) {
console.error("Operation failed:", err)
throw err // Or handle appropriately
}

// WRONG: Over-engineering
class JobManagerFactory {
createJobManager(config: Config): JobManager { ... }
}
class JobManager {
constructor(private repository: JobRepository) {}
async create(data: JobData): Promise<Job> { ... }
}
// CORRECT: Simple functions
export async function createJob(data: JobData): Promise<Job> {
const job = { id: crypto.randomUUID(), ...data }
await saveJob(job)
return job
}

Before adding ANY new dependency:
- Check if functionality exists in Node.js built-ins
- Check if existing dependencies provide the functionality
- Justify the addition with clear benefits
- Prefer smaller, focused packages over large frameworks
- TypeScript strict mode passes (`pnpm typecheck`)
- No `any` types (use `unknown` if truly unknown)
- All new types have Zod schemas in `packages/types`
- No console.log in production code (use proper logging)
- No commented-out code
- No TODO comments without linked issues
- All user input validated with Zod
- No API keys logged or exposed
- No hardcoded secrets or credentials
- Path traversal prevention for file operations
- Parameterized queries for all SQL
- No `dangerouslySetInnerHTML` without sanitization
- Code placed in correct package/app
- No direct package imports in frontend
- Reused existing components/utilities
- Pure functions for pipeline logic
- Entity versioning (no overwrites)
- LLM calls go through cached client
- Tests written for new functionality
- Tests pass (`pnpm test`)
- Coverage maintained or improved
- Used Tailwind utilities only (no custom CSS)
- Error states handled and displayed
- Loading states for async operations
- API calls through `api/client.ts` + TanStack Query
- No new state management libraries
- Pure JS/TS dependencies only — no native C/C++ bindings
- Complex logic has explanatory comments
- Public APIs have JSDoc comments
- README updated if new features added
# Install dependencies
pnpm install
# Run development servers
pnpm dev
# Type checking
pnpm typecheck
# Run tests
pnpm test
# Run tests with coverage
pnpm test:coverage
# Build all packages
pnpm build
# Lint
pnpm lint

| Purpose | Location |
|---|---|
| Pipeline definition (stages/steps) | packages/types/src/pipeline.ts |
| API routes | apps/api/src/routes/ |
| API client | apps/studio/src/api/client.ts |
| Type schemas | packages/types/src/ |
| Pipeline step implementations | packages/pipeline/src/ |
| DAG runner | packages/pipeline/src/dag.ts |
| API stage runners | apps/api/src/services/step-runner.ts |
| LLM client | packages/llm/src/client.ts |
| Stage view components | apps/studio/src/components/pipeline/stages/ |
| Stage run service (queue, SSE) | apps/api/src/services/stage-service.ts |
| Book storage (DB schema, migrations) | packages/storage/src/db.ts |
| Book storage interface | packages/storage/src/storage.ts |
| Unified book run hook + context | apps/studio/src/hooks/use-book-run.ts |
| Book layout (BookRunProvider) | apps/studio/src/routes/books.$label.tsx |
| Stage config (colors, icons, labels) | apps/studio/src/components/pipeline/stage-config.ts |
| Stage sidebar | apps/studio/src/components/pipeline/StageSidebar.tsx |
| Global config | config/ |
| Templates | templates/ |
// Types
import { Job, PipelineConfig, BundleConfig } from "@adt/types"
// API client (frontend)
import { api } from "../api/client"
// LLM (backend)
import { createLLMClient, createCostTracker } from "@adt/llm"
// Validation
import { z } from "zod"
// Routing (frontend)
import { useNavigate, useParams, Link } from "@tanstack/react-router"
// Data fetching (frontend)
import { useQuery, useMutation, useQueryClient } from "@tanstack/react-query"
// Forms (frontend)
import { useForm } from "@tanstack/react-form"
// Tables (frontend)
import { useReactTable, getCoreRowModel } from "@tanstack/react-table"
// HTTP errors (backend)
import { HTTPException } from "hono/http-exception"

| Version | Date | Changes |
|---|---|---|
| 0.3.0 | 2026-02-21 | Per-step run tracking in StageService, cancel-on-SSE pattern to prevent stale fetch overwrites |
| 0.2.0 | 2026-02-20 | Step completion tracking, DB-as-source-of-truth for stage status, query invalidation guidance |
| 0.1.0 | 2025-02-04 | Initial comprehensive guidelines |