An experimental .NET project that explores whether ICD-11 clinical coding can be treated as a small, local-first RAG system instead of a heavyweight ML platform.
The interesting part of Nexicd is not just "LLM in healthcare". The experiment is whether a relatively small, understandable system can combine structured medical taxonomy data, vector retrieval, and model-based reasoning — without a large orchestration stack or custom training pipeline. ICD-11 is a good place to test that. It's exactly the kind of structured, high-stakes domain where RAG is worth exploring seriously — and where the gap between a naive prompt and a grounded retrieval pipeline is easy to measure. Nexicd ingests WHO ICD-11 MMS data, embeds it into a local SQLite vector store, and runs a four-stage pipeline to turn clinical text into candidate codes.
The demo flow is intentionally simple: build a local ICD-11 vector store, enter a short clinical note, and inspect how the pipeline narrows the problem down to a validated ICD-11 code.
This project started as an experiment in practical retrieval-augmented coding.
The problem was straightforward: clinical conversations are noisy, ICD-11 is large and hierarchical, and naive prompting alone is too brittle for consistent code selection. I wanted to explore whether a lightweight RAG architecture could narrow the search space first, then let the model reason over a smaller, more structured set of candidates.
The project was also a way to learn more about three things:
- using WHO ICD-11 MMS data as a searchable local knowledge base
- building a multi-stage LLM pipeline in plain .NET without heavy framework lock-in
- treating a local SQLite vector store as a developer-friendly retrieval layer for experiments
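Part of why a plain SQLite file can serve as the retrieval layer is scale: with tens of thousands of ICD-11 entities, a brute-force similarity scan is fast enough that no approximate-nearest-neighbor index is required. A rough sketch of that idea, assuming the Microsoft.Data.Sqlite package and a hypothetical `entities(code, title, embedding BLOB)` table — Nexicd's actual schema and retrieval code may differ:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Sqlite;

// Brute-force cosine-similarity search over embeddings stored in SQLite.
// Table layout is an assumption: entities(code TEXT, title TEXT, embedding BLOB).
public static class VectorSearch
{
    public static IEnumerable<(string Code, string Title, double Score)> TopK(
        string dbPath, float[] query, int k)
    {
        var results = new List<(string, string, double)>();
        using var conn = new SqliteConnection($"Data Source={dbPath}");
        conn.Open();
        using var cmd = conn.CreateCommand();
        cmd.CommandText = "SELECT code, title, embedding FROM entities";
        using var reader = cmd.ExecuteReader();
        while (reader.Read())
        {
            // Embeddings are stored as raw little-endian float32 bytes.
            var blob = (byte[])reader["embedding"];
            var vec = new float[blob.Length / sizeof(float)];
            Buffer.BlockCopy(blob, 0, vec, 0, blob.Length);
            results.Add((reader.GetString(0), reader.GetString(1), Cosine(query, vec)));
        }
        return results.OrderByDescending(r => r.Item3).Take(k);
    }

    public static double Cosine(float[] a, float[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.Sqrt(na) * Math.Sqrt(nb) + 1e-12);
    }
}
```

Scanning every row and sorting keeps the code trivial; at this corpus size the full scan is the simpler tradeoff over an ANN index.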
Those goals translated into a few deliberate design decisions:

- Use WHO ICD-11 MMS as the source of truth instead of asking a model to memorize the taxonomy.
- Keep the retrieval layer local by writing embeddings into a SQLite database.
- Separate ingestion from query-time coding so the developer loop stays fast after the index is built.
- Treat the pipeline as a sequence of explicit stages: normalize, extract, retrieve, select, validate.
- Prefer understandable engineering tradeoffs over "AI magic".
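Making the stages explicit can be as simple as one small interface per stage. A hedged sketch of what those boundaries might look like in plain C# — the type and member names here are illustrative, not Nexicd's actual API:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

// Illustrative stage contracts; the real Nexicd types may differ.
public sealed record ClinicalFindings(IReadOnlyList<string> Findings);
public sealed record Candidate(string Code, string Title, double Score);
public sealed record CodingResult(string PrimaryCode, string Confidence, string Reasoning);

public interface IExtractionStage { Task<ClinicalFindings> RunAsync(string normalizedText); }
public interface IRetrievalStage  { Task<IReadOnlyList<Candidate>> RunAsync(ClinicalFindings findings); }
public interface ISelectionStage  { Task<CodingResult> RunAsync(IReadOnlyList<Candidate> candidates); }
public interface IValidationStage { Task<CodingResult> RunAsync(CodingResult selected); }

// The pipeline is then just sequential composition of the stages.
public sealed class CodingPipeline(
    IExtractionStage extract, IRetrievalStage retrieve,
    ISelectionStage select, IValidationStage validate)
{
    public async Task<CodingResult> RunAsync(string input)
    {
        var normalized = input.Trim();                       // normalize
        var findings   = await extract.RunAsync(normalized); // Stage 1: extract
        var candidates = await retrieve.RunAsync(findings);  // Stage 2: retrieve
        var selected   = await select.RunAsync(candidates);  // Stage 3: select
        return await validate.RunAsync(selected);            // Stage 4: validate
    }
}
```

Keeping each stage behind an interface also makes the pipeline testable with fakes, which matters when the real stages call paid APIs.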
What the project currently provides:

- WHO ICD-11 MMS ingestion into a local SQLite vector store
- Interactive CLI for trying the coding pipeline on clinical text
- Structured extraction of findings from noisy conversations
- Candidate retrieval using vector search over ICD-11 entities
- LLM-based code selection from retrieved candidates
- WHO-backed code validation to catch hallucinated primary codes
- Unit and integration test coverage, plus an opt-in live smoke test
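The "structured extraction" step amounts to asking a chat model for a constrained JSON payload and deserializing it into typed findings. A sketch of the parsing side with System.Text.Json, assuming a simple findings schema — the real Nexicd schema may differ:

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Deserializes a model response into typed findings; the schema is an assumption.
public sealed record ExtractedFindings(
    [property: JsonPropertyName("findings")] string[] Findings,
    [property: JsonPropertyName("duration")] string? Duration,
    [property: JsonPropertyName("negations")] string[] Negations);

public static class ExtractionParser
{
    static readonly JsonSerializerOptions Options = new() { PropertyNameCaseInsensitive = true };

    public static ExtractedFindings Parse(string modelJson) =>
        JsonSerializer.Deserialize<ExtractedFindings>(modelJson, Options)
        ?? throw new InvalidOperationException("Model returned empty JSON");
}
```

Capturing negations ("No fever") separately is what lets later stages avoid retrieving codes for symptoms the patient explicitly does not have.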
```mermaid
graph TD
    A["Clinical note or conversation"] --> B["Input normalization"]
    B --> C["Stage 1: Clinical extraction"]
    C --> D["Stage 2: Vector retrieval"]
    D --> E["Stage 3: Code selection"]
    E --> F["Stage 4: WHO validation"]
    F --> G["Coding result"]
    H["WHO ICD-11 MMS API"] --> I["Ingestion pipeline"]
    I --> J["SQLite vector store"]
    J --> D
    K["OpenAI embeddings"] --> I
    L["OpenAI chat models"] --> C
    L --> E
```
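The final validation stage can be little more than an existence check against the WHO ICD-API. A rough sketch, assuming the public `codeinfo` lookup of the ICD-API; the hardcoded release ID and the minimal error handling here are simplifications:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

// Checks whether a selected ICD-11 code actually exists in the MMS linearization.
// Endpoint shape follows the public ICD-API; the release ID is an assumption.
public static class WhoValidator
{
    public static string BuildUrl(string baseUrl, string release, string code) =>
        $"{baseUrl.TrimEnd('/')}/icd/release/11/{release}/mms/codeinfo/{Uri.EscapeDataString(code)}";

    public static async Task<bool> CodeExistsAsync(
        HttpClient http, string baseUrl, string code, string release = "2024-01")
    {
        using var req = new HttpRequestMessage(HttpMethod.Get, BuildUrl(baseUrl, release, code));
        req.Headers.Add("Accept", "application/json");
        req.Headers.Add("API-Version", "v2");     // required by the ICD-API
        req.Headers.Add("Accept-Language", "en");
        using var resp = await http.SendAsync(req);
        return resp.IsSuccessStatusCode;          // 404 suggests a hallucinated code
    }
}
```

Against the local Docker image no OAuth token is needed; against `https://id.who.int` the request would additionally need a bearer token obtained with the client ID and secret.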
The system is split into three small projects:
- `Nexicd.Core`: models, WHO client, parsing, retrieval, and pipeline logic
- `Nexicd.Ingestion`: builds the vector database from WHO ICD-11 data
- `Nexicd.Console`: runs the interactive coding workflow against the local database
That split keeps ingestion concerns separate from the runtime query path. Once the database is built, the CLI can focus on retrieval and reasoning instead of rebuilding state on startup.
```shell
git clone https://github.com/username/Nexicd.git
cd Nexicd
dotnet restore
```

Create a local env file or export variables directly:

```shell
cp .env.example .env
```

Required for ingestion and the interactive console:

```shell
export OPENAI_API_KEY="your-key"
```

Optional:

```shell
export ICD_API_BASE="http://localhost"
export OUTPUT_LANGUAGE="English"
```

If you want to validate against the WHO cloud API instead of the local Docker image:

```shell
export ICD_API_BASE="https://id.who.int"
export WHO_CLIENT_ID="your-client-id"
export WHO_CLIENT_SECRET="your-client-secret"
```

Start the local WHO API container, build the vector database, then launch the interactive console:

```shell
docker compose up -d
dotnet run --project src/Nexicd.Ingestion
dotnet run --project src/Nexicd.Console
```

Input:
```
Patient has a runny nose, sore throat, and sneezing for two days. No fever.
```

Possible output:

```
Primary: CA25 - Acute nasopharyngitis
Confidence: HIGH
Reasoning: Symptoms and duration are consistent with a common cold and do not suggest a more specific alternative.
```
To point both tools at a custom database path:

```shell
dotnet run --project src/Nexicd.Ingestion -- --db ./data/dev-nexicd.db
dotnet run --project src/Nexicd.Console -- --db ./data/dev-nexicd.db
```

Default test suite:

```shell
dotnet test Nexicd.sln
```

Opt-in live smoke test:

```shell
RUN_LIVE_SMOKE_TESTS=1 dotnet test Nexicd.sln --filter FullyQualifiedName~LivePipeline_CommonCold_ReturnsExpectedCode
```

Experimental / work in progress.
This repository is intentionally positioned as an engineering exploration, not a production medical coding product. The architecture is stable enough to demonstrate the idea, but the project still has prototype-era constraints around determinism, operational hardening, and compliance boundaries.
Possible next directions:
- expose the pipeline through a small HTTP API instead of only a CLI
- validate secondary codes, not just the primary code
- make ICD release selection configurable instead of hardcoded
- evaluate retrieval quality on a larger curated benchmark set
- compare the local SQLite approach with a remote vector database
- add richer telemetry and prompt-trace diagnostics for debugging
Ideas, criticism, and experiments are welcome.
If you see a cleaner retrieval approach, a better way to structure the pipeline, or a useful test case for ICD-11 coding behavior, open an issue or send a pull request.
MIT
