Databricks-native architecture

Every Databricks primitive that matters

The system is built on the Mosaic AI stack end-to-end: Agent Framework orchestrates tool calls, Storage-Optimized Vector Search hosts the classical-text index, MLflow 3 traces every agent run, Model Serving fronts Sarvam-M via an External Models endpoint, and Unity Catalog governs tools. The frontend is a thin Next.js app on Vercel; a Docker image bundles the judge-mode fallback.

Mosaic AI Agent Framework

The four-step agent (router → retriever → synthesiser → guardrails) is defined as an MLflow 3 chain. Each tool is registered in Unity Catalog, so reviewers can inspect the DAG before running it.
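As a rough illustration of the four-step flow, the sketch below chains plain Python functions and records the order in which they ran. It is not the Agent Framework or MLflow API: `ConsultState`, `run_pipeline`, and the stubbed step bodies are hypothetical names chosen for this example, and only the step names come from the trace described on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ConsultState:
    """Hypothetical container passed from step to step."""
    query: str
    subdomain: str = ""
    passages: List[str] = field(default_factory=list)
    answer: str = ""
    flags: List[str] = field(default_factory=list)


Step = Callable[[ConsultState], ConsultState]


def subdomain_router(state: ConsultState) -> ConsultState:
    # Stub: the real router is an LLM tool call; here the label is hard-coded.
    state.subdomain = "Dravyaguna"
    return state


def vector_search(state: ConsultState) -> ConsultState:
    # Stub: returns k=4 placeholder passages instead of querying an index.
    state.passages = [f"<passage {i} for {state.subdomain}>" for i in range(1, 5)]
    return state


def grounded_synthesis(state: ConsultState) -> ConsultState:
    # Stub: a real implementation would prompt the model with the passages.
    state.answer = "Placeholder answer grounded in the first passage [1]."
    return state


def safety_guardrails(state: ConsultState) -> ConsultState:
    # Stub rule: flag answers that carry no inline citation at all.
    if "[" not in state.answer:
        state.flags.append("uncited")
    return state


STEPS: List[Step] = [subdomain_router, vector_search, grounded_synthesis, safety_guardrails]


def run_pipeline(state: ConsultState, steps: List[Step]) -> Tuple[ConsultState, List[str]]:
    """Run the steps in order and return the final state plus a simple trace."""
    trace: List[str] = []
    for step in steps:
        state = step(state)
        trace.append(step.__name__)
    return state, trace


final_state, trace = run_pipeline(ConsultState(query="example question"), STEPS)
print(trace)
```

Keeping each step as a function over one shared state object makes it easy to swap a stub for a real tool without touching the others.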

Agent Framework GA · Python SDK · tool-calling

Storage-Optimized Vector Search

`ayurveda.texts.chunks` is a Delta table with Sanskrit, Hindi, English, and keyword columns. We embed the Hindi and English fields with BGE-M3 and apply the router's subdomain filters at query time.
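A minimal sketch of how the router's subdomain labels might be turned into query arguments. The column names (`chunk_id`, `text_en`, `text_hi`, `subdomain`) are assumptions for illustration, not the table's confirmed schema, and the SQL-like filter string assumes the storage-optimized endpoint accepts that form; check the Vector Search documentation for the exact filter syntax before relying on it.

```python
from typing import Any, Dict, List


def build_search_kwargs(query_text: str, subdomains: List[str], k: int = 4) -> Dict[str, Any]:
    """Assemble keyword arguments for a filtered similarity search.

    Column names are hypothetical; the filter is emitted as a SQL-like string.
    """
    kwargs: Dict[str, Any] = {
        "query_text": query_text,
        "columns": ["chunk_id", "text_en", "text_hi"],  # assumed column names
        "num_results": k,
    }
    if subdomains:
        # Escape single quotes so a label cannot break out of the string literal.
        quoted = ", ".join("'" + s.replace("'", "''") + "'" for s in subdomains)
        kwargs["filters"] = f"subdomain IN ({quoted})"
    return kwargs


print(build_search_kwargs("example question", ["Dravyaguna", "Rasashastra"]))
```

With the `databricks-vectorsearch` client, a dictionary like this could be passed as `index.similarity_search(**kwargs)` on an index handle. When the router returns no label, the filter is omitted and the search runs over the whole index.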

Storage-optimized · 7× cheaper · BGE-M3 embeddings

Model Serving · External Sarvam-M

We expose Sarvam-M through the Databricks External Models custom-provider option (GA March 2025). For the live demo, the AI Gateway falls back to the Llama-3-70B Foundation Model API endpoint whenever Sarvam-M is rate-limited.
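The fallback behaviour can be pictured with a small client-side sketch. This is not AI Gateway configuration, since the gateway applies fallback on the server side. `RateLimited`, `call_with_fallback`, the stub callables, and the endpoint names are all hypothetical and only illustrate the ordering: try the primary model, and move to the next one when it is rate-limited.

```python
from typing import Callable, Optional, Sequence, Tuple


class RateLimited(Exception):
    """Stand-in for an HTTP 429 response from a serving endpoint."""


def call_with_fallback(
    prompt: str,
    endpoints: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each (name, callable) in order; return the first successful answer."""
    last_error: Optional[Exception] = None
    for name, call in endpoints:
        try:
            return name, call(prompt)
        except RateLimited as exc:
            last_error = exc  # remember the failure and try the next endpoint
    raise RuntimeError("all endpoints were rate-limited") from last_error


def sarvam_m_stub(prompt: str) -> str:
    # Simulates the primary endpoint being throttled.
    raise RateLimited("429 from primary")


def llama_stub(prompt: str) -> str:
    # Simulates the fallback endpoint answering normally.
    return f"fallback answer to: {prompt}"


ENDPOINTS = [("sarvam-m", sarvam_m_stub), ("llama-3-70b", llama_stub)]

served_by, text = call_with_fallback("example question", ENDPOINTS)
print(served_by, text)
```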

Sarvam-M 24B · External Models · AI Gateway fallback

MLflow 3 · agent observability

Every consult and every scoreboard run becomes an MLflow Run under the experiment `ayurveda-copilot/bench-runs`. Per-subdomain accuracy, retrieval score, and token usage are logged as metrics; the trace view shows every tool call in the UI.
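A minimal sketch of how per-subdomain accuracy could be computed before logging. The function name, the `accuracy_<subdomain>` metric naming, and the sample rows are assumptions made up for this example; they are not results from the benchmark.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_subdomain_accuracy(results: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Aggregate (subdomain, is_correct) rows into one accuracy metric per subdomain."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for subdomain, is_correct in results:
        total[subdomain] += 1
        correct[subdomain] += int(is_correct)
    # Sorted keys keep the metric order stable between runs.
    return {f"accuracy_{name}": correct[name] / total[name] for name in sorted(total)}


# Made-up rows purely to exercise the function.
sample_rows = [("Panchakarma", True), ("Panchakarma", False), ("Dravyaguna", True)]
metrics = per_subdomain_accuracy(sample_rows)
print(metrics)
```

A dictionary of this shape can be passed to `mlflow.log_metrics(metrics)` inside an active run, which keeps each subdomain as its own metric series in the experiment.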

Runs + Traces · BhashaBench harness · Artifact logs

Safety guardrails

Mosaic AI Gateway guardrails plus a custom post-processor mark unsupported claims as uncertain, surface AYUSH pharmacovigilance warnings, and block cross-recommendation of modern PPIs/anticoagulants with Rasa formulations.
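One way the cross-recommendation rule might look as a rule-based post-processor is sketched below. The term lists are short illustrative placeholders, not a clinical reference and not the project's actual rule set, and `interaction_flags` is a hypothetical helper name.

```python
from typing import List

# Illustrative placeholders only: a real rule set would come from reviewed sources.
MODERN_DRUGS = {"omeprazole", "pantoprazole", "warfarin"}
RASA_TERMS = {"rasa sindura", "makaradhwaja"}


def interaction_flags(answer: str) -> List[str]:
    """Return one flag per (modern drug, Rasa formulation) pair found in the same answer."""
    text = answer.lower()
    modern = sorted(term for term in MODERN_DRUGS if term in text)
    rasa = sorted(term for term in RASA_TERMS if term in text)
    # Only a co-occurrence is flagged; either family on its own passes through.
    return [f"blocked: {m} + {r}" for m in modern for r in rasa]


print(interaction_flags("Continue warfarin and add Makaradhwaja."))
print(interaction_flags("Continue warfarin as prescribed."))
```

Substring matching is deliberately crude here. A production rule would need normalised drug names and transliteration variants.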

Unsupported-claim flag · AYUSH advisories · PHI redaction

Voice mode (stretch)

For ASHA workers: Sarvam Shuka ASR → our agent → Sarvam Bulbul TTS, all routed through the AI Gateway so the same key policy applies. In judge mode, the browser's Web Speech API (SpeechRecognition and SpeechSynthesis) is used instead.

Shuka ASR · Bulbul TTS · hi-IN default

End-to-end agent trace

Matches what MLflow 3 records; mirrored in the consult sidebar.

1. subdomain_router

Sarvam-M tool-call classifies query into Panchakarma / Dravyaguna / Rasashastra / Kayachikitsa / Shalya / Balarogya.

2. vector_search

Mosaic AI Vector Search retrieves k=4 chunks, filtered by the routed subdomain to improve precision on the weakest-scoring bands.

3. grounded_synthesis

Sarvam-M synthesises an answer in the user's language, using only retrieved passages; inline [1]..[4] citations are enforced in the system prompt.
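The source enforces citations through the system prompt. As a complementary check, which is an assumption of this example rather than something the page describes, a post-hoc validator could confirm that the answer cites at least one passage and that every `[n]` marker points at a retrieved passage. `citation_problems` is a hypothetical helper name.

```python
import re
from typing import List

CITATION = re.compile(r"\[(\d+)\]")


def citation_problems(answer: str, num_passages: int = 4) -> List[str]:
    """List citation issues in an answer; an empty list means the check passed."""
    cited = [int(n) for n in CITATION.findall(answer)]
    problems: List[str] = []
    if not cited:
        problems.append("no citations")
    # Report each out-of-range marker once, in ascending order.
    for n in sorted({n for n in cited if not 1 <= n <= num_passages}):
        problems.append(f"citation [{n}] out of range")
    return problems


print(citation_problems("Placeholder claim [1] and another claim [3]."))
print(citation_problems("Placeholder claim with a bad marker [5]."))
```

The check only verifies that markers exist and are in range. Whether the cited passage actually supports the claim is a separate question for the guardrail step.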

4. safety_guardrails

Rule-based post-processor + AI Gateway guardrail flags unsupported claims, surfaces AYUSH advisories, and downgrades confidence.

Reproducing on Databricks

Four notebooks, run in order, with no manual steps in between. Works on the Databricks free trial.

Run notebooks/00_ingest_corpus.py — ingests the public-domain texts into Unity Catalog ayurveda.raw.
Run notebooks/10_build_vector_index.py — builds the Storage-Optimized Vector Search index from the chunked Delta table.
Run notebooks/20_register_agent.py — registers the Mosaic AI Agent Framework chain + deploys the Model Serving endpoint via External Models (Sarvam-M).
Run notebooks/30_bhashabench_eval.py — runs BhashaBench-Ayurveda against our agent + GPT-4o + vanilla Sarvam-M and logs to MLflow 3.
