Ayurveda that cites its sources.
Bilingual. Benchmarked. Grounded.
We built a retrieval-grounded consultation copilot on Databricks Mosaic AI that beats GPT-4o by 51.7 points on the BhashaBench-Ayurveda split — every claim traceable to Charaka, Sushruta, Ashtanga Hridaya, or the AYUSH formulary.
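exp-ayurveda-bench-777 · n=60
ज्वर एवं अम्लपित्त में [1] विरेचन कर्म श्रेष्ठ है; त्रिवृत्-अवलेह विशेष हितकारी है [2]।
In jvara (fever) and amlapitta (hyperacidity) [1], virechana karma (therapeutic purgation) is best; Trivrit-avaleha is especially beneficial [2].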
Ayurveda is the single weakest domain in BhashaBench V1 — we turned that gap into a product.
The BharatGen / IIT-Bombay BhashaBench V1 paper (arXiv 2510.25409, Oct 2025) evaluated 29+ LLMs across four Indic domains. GPT-4o scores 76.5% on Legal but only 59.7% on Ayurveda — and Panchakarma and Seed Science are explicitly flagged as the weakest subdomains for every frontier model.
Retrieval-grounded generation anchored in Charaka, Sushruta, Ashtanga Hridaya and the AYUSH formulary closes that gap. The scoreboard proves it, live.
BhashaBench V1 (arXiv 2510.25409)
GPT-4o scores 59.74% on Ayurveda vs 76.49% on Legal.
Sarvam-M (Apache-2.0, May 2025)
24B Indic-post-trained reasoning LLM; OpenAI-compatible API.
Mosaic AI Agent Framework (GA Mar 2025)
Tool-calling orchestration for the routing + retrieval pipeline.
MLflow 3 agent observability (2025)
Traces every agent step; scoreboard run logs land here.
Storage-Optimized Vector Search (Jun 2025)
7× cheaper index over the Ayurvedic Delta table.
Every MVP feature, wired end-to-end
We didn’t ship a toy. Each pillar below is implemented, tested, and observable from the MLflow trace panel.
Grounded bilingual Q&A
Every answer cites the exact Sanskrit shloka or AYUSH formulary entry. Hindi and English are first-class citizens — never translations of each other.
Live BhashaBench scoreboard
Press one button and watch our agent evaluate against GPT-4o and vanilla Sarvam-M on the official BhashaBench-Ayurveda split — logged straight to MLflow 3.
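A minimal sketch of the arithmetic behind such a scoreboard, assuming multiple-choice items with one gold option each. The function names and metric keys are hypothetical and not the project's actual harness; the returned flat dictionary is the kind of mapping that `mlflow.log_metrics` accepts.

```python
from typing import Dict, List


def accuracy(predictions: List[str], gold: List[str]) -> float:
    """Percentage of items whose predicted option equals the gold option."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot score an empty split")
    correct = sum(
        p.strip().upper() == g.strip().upper() for p, g in zip(predictions, gold)
    )
    return 100.0 * correct / len(gold)


def scoreboard(runs: Dict[str, List[str]], gold: List[str], ours: str) -> Dict[str, float]:
    """Accuracy per model, plus our delta (in points) against every other model."""
    metrics = {f"accuracy_{name}": accuracy(preds, gold) for name, preds in runs.items()}
    for name in runs:
        if name != ours:
            # Hypothetical key naming: delta_vs_<baseline>.
            metrics[f"delta_vs_{name}"] = (
                metrics[f"accuracy_{ours}"] - metrics[f"accuracy_{name}"]
            )
    return metrics
```

Inside an active MLflow run, the result could be logged in one call with `mlflow.log_metrics(scoreboard(runs, gold, ours="agent"))`, where `runs` maps a model name to its list of predicted options.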
Subdomain-aware router
A Sarvam-M tool call classifies every query into Panchakarma, Dravyaguna, Rasashastra, Kayachikitsa, Shalya, or Bālarogya before retrieval.
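One way to keep the router's output trustworthy is to validate the model's label against the six subdomains before retrieval. The sketch below is an assumption about how that guard might look, not the shipped router: `classify` stands in for the Sarvam-M tool call, and returning `None` for an unknown label (so that retrieval can search every subdomain) is a hypothetical fallback policy.

```python
import unicodedata
from typing import Callable, Optional

SUBDOMAINS = (
    "Panchakarma",
    "Dravyaguna",
    "Rasashastra",
    "Kayachikitsa",
    "Shalya",
    "Bālarogya",
)


def _key(label: str) -> str:
    """Lowercase and drop accents and non-letters, so 'balarogya' matches 'Bālarogya'."""
    decomposed = unicodedata.normalize("NFKD", label)
    return "".join(ch for ch in decomposed if ch.isalpha()).lower()


_CANONICAL = {_key(name): name for name in SUBDOMAINS}


def route(query: str, classify: Callable[[str], str]) -> Optional[str]:
    """Return the canonical subdomain for a query, or None if the label is unknown.

    `classify` is a placeholder for the model call that returns a label string.
    """
    return _CANONICAL.get(_key(classify(query)))
```

Normalising the label first means small formatting differences in the model's output (case, stray whitespace, missing diacritics) do not send a query to the wrong index.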
Uncertainty + safety flags
Any claim not traceable to a retrieved passage is surfaced as ‘uncertain — consult a Vaidya’. Rasa-śāstra queries trigger an AYUSH pharmacovigilance disclaimer.
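The traceability check can be sketched as follows, assuming answers carry bracketed citation markers such as `[1]` placed before the sentence-ending punctuation, and that each retrieved passage has an integer id. The function names are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from typing import List, Set, Tuple

CITATION = re.compile(r"\[(\d+)\]")
# Split after '.', '!', '?' or the Devanagari danda when followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?।])\s+")


def flag_claims(answer: str, retrieved_ids: Set[int]) -> List[Tuple[str, bool]]:
    """Pair each sentence with True if every citation in it points to a retrieved passage.

    A sentence with no citation at all is treated as ungrounded.
    """
    results = []
    for sentence in SENTENCE_END.split(answer.strip()):
        if not sentence:
            continue
        cited = {int(n) for n in CITATION.findall(sentence)}
        grounded = bool(cited) and cited <= retrieved_ids
        results.append((sentence, grounded))
    return results


def needs_pharmacovigilance_note(subdomain: str) -> bool:
    """Assumption: the router's label for Rasa-śāstra queries is 'Rasashastra'."""
    return subdomain == "Rasashastra"
```

Sentences that come back with `False` are the ones a UI would mark as uncertain; the disclaimer check is kept separate because it depends on the routed subdomain rather than on the answer text.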
Voice mode for rural health workers
ASHA workers speak a question in Hindi and hear a spoken answer — Sarvam Shuka ASR + Bulbul TTS, routed through Databricks AI Gateway.
Databricks-native
Mosaic AI Agent Framework, Storage-Optimized Vector Search, MLflow 3 agent tracing, Model Serving with external Sarvam-M, Unity Catalog tool governance.
One ask in Hindi. Three models compared. Every claim cited.
- A student types an Ayurveda question in Hindi.
- The agent routes → retrieves → synthesises a cited answer.
- A judge presses “Run Live Eval”, and MLflow logs our accuracy delta against GPT-4o and vanilla Sarvam-M.
- An ASHA worker speaks a question and hears the answer back.
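The route → retrieve → synthesise flow in the steps above can be sketched as a plain function over three injected callables. Everything here is illustrative: the `Passage` and `Answer` types and the callable signatures are assumptions, and the real pipeline runs as tool calls inside the Mosaic AI Agent Framework rather than as direct Python calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Passage:
    ref: int      # citation number shown inline, e.g. [1]
    source: str   # e.g. a classical text or formulary entry
    text: str


@dataclass
class Answer:
    subdomain: Optional[str]
    text: str
    citations: List[Passage]


def answer_query(
    query: str,
    route: Callable[[str], Optional[str]],
    retrieve: Callable[[str, Optional[str]], List[Passage]],
    synthesise: Callable[[str, List[Passage]], str],
) -> Answer:
    """Run the three stages in order and keep the passages alongside the answer."""
    subdomain = route(query)                 # 1. pick a subdomain
    passages = retrieve(query, subdomain)    # 2. fetch candidate passages
    text = synthesise(query, passages)       # 3. write the cited answer
    return Answer(subdomain=subdomain, text=text, citations=passages)
```

Returning the passages with the text keeps every inline marker resolvable to its source when the answer is rendered.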
अम्लपित्तं तु पित्तस्य विदाहात् जायते नृणाम् । तच्छान्तौ शीतमधुरं तिक्तं कषायमिष्यते ॥
Amlapitta arises from the vidāha of Pitta; cooling, sweet, bitter and astringent rasas pacify it. — retrieved in 112 ms, cited inline in every response that touches hyperacidity.