Ayurveda Copilot Bharat
Ayurveda Assistant · Grounded in the Shastras
BhashaBench V1 · Ayurveda split

Live BhashaBench-Ayurveda scoreboard

Evaluates our grounded agent, zero-shot GPT-4o, and vanilla Sarvam-M on a held-out slice of the BhashaBench Ayurveda split, with every question tagged by subdomain and language. Every run is streamed live and logged to MLflow 3.

Our agent
0.0%
0/0 on held-out split
GPT-4o · zero-shot
0.0%
0/0 on held-out split
Sarvam-M · vanilla
0.0%
0/0 on held-out split

Accuracy delta vs GPT-4o: +0.0 pts

How the scoreboard works

Every evaluation run is orchestrated by a Databricks Job that calls Mosaic AI Model Serving for each of the three models, aggregates the per-question JSON results into an MLflow Run, and logs per-subdomain accuracy metrics. In this deployment, the same harness also runs inside the Next.js route for instant judge demos. The metric values are deterministic given a seed.
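
The aggregation step described above can be sketched roughly as follows. This is a minimal illustration, not the harness itself: the record fields (`model`, `subdomain`, `correct`), the metric key layout, the function names, and the toy records are all assumptions made for the example. It shows how a seeded sampler gives a reproducible held-out slice, how per-question results roll up into overall and per-subdomain accuracy, and how the delta versus GPT-4o is expressed in percentage points.

```python
import random
from collections import defaultdict


def sample_held_out(questions, k, seed):
    """Pick k questions reproducibly: the same seed yields the same slice."""
    rng = random.Random(seed)  # local RNG, so global random state is untouched
    return rng.sample(questions, k)


def aggregate(records):
    """Roll per-question records up into accuracy per model and subdomain.

    Each record is a dict with hypothetical keys: model, subdomain, correct.
    Returns a flat dict keyed as 'accuracy/<model>/<subdomain or overall>'.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [num_correct, num_total]
    for r in records:
        for scope in ("overall", r["subdomain"]):
            key = f"accuracy/{r['model']}/{scope}"
            totals[key][0] += int(r["correct"])
            totals[key][1] += 1
    # Every key was seen at least once, so the division is always safe.
    return {key: c / n for key, (c, n) in totals.items()}


def delta_points(metrics, ours, baseline):
    """Overall accuracy gap between two models, in percentage points."""
    gap = metrics[f"accuracy/{ours}/overall"] - metrics[f"accuracy/{baseline}/overall"]
    return round(100 * gap, 1)


# Toy records for illustration only; these are not benchmark results.
records = [
    {"model": "agent", "subdomain": "dravyaguna", "correct": True},
    {"model": "agent", "subdomain": "dravyaguna", "correct": True},
    {"model": "agent", "subdomain": "kayachikitsa", "correct": False},
    {"model": "agent", "subdomain": "kayachikitsa", "correct": True},
    {"model": "gpt-4o", "subdomain": "dravyaguna", "correct": True},
    {"model": "gpt-4o", "subdomain": "dravyaguna", "correct": False},
    {"model": "gpt-4o", "subdomain": "kayachikitsa", "correct": False},
    {"model": "gpt-4o", "subdomain": "kayachikitsa", "correct": True},
]

metrics = aggregate(records)
print(metrics)
print(delta_points(metrics, "agent", "gpt-4o"))
```

Because the result is a flat mapping from string keys to floats, it could plausibly be passed to `mlflow.log_metrics` inside an active run, though how the actual harness logs its metrics is not stated here.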

Mosaic AI Agent Framework
Vector Search
MLflow 3
Unity Catalog
AI Gateway
Foundation Model APIs