Grounded, not generated
Every clinical answer cites the past visits it pulled from. No retrieval, no answer.
The AI inside MediSero+ is held to the same standard as a clinical study. Methodology, benchmarks, validation studies: published openly, with reproducible methods, honest about what's measured versus what's still in preparation.
When we add a feature, we run it past these six principles, listed below. If it can't pass all six, we don't ship it.
- Every clinical answer cites the past visits it pulled from. No retrieval, no answer.
- Clio never auto-saves to a patient record. The doctor signs. Always.
- Each clinic's data is isolated. No cross-clinic learning without explicit consent + anonymisation.
- Every Clio output is timestamped, attributed, and traceable to source. Forever.
- Doctor edits are the training signal. Local, per-clinic. Your edits make Clio yours.
- Master switch off. Per-screen off. Per-feature off. Privacy is a setting, not a contract clause.
Every number below is reproducible from the corresponding methodology post. No marketing-only stats. If a number isn't here, we haven't measured it yet.
These two posts cover what's currently measured and shipped. Formal whitepaper versions will follow as the methodology stabilises through 2026.
Full architecture: pgvector inside the workspace, two-stage retrieval with cross-encoder rerank, prompt template, post-processing verification, citation rendering.
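For a flavour of that pipeline, here is a minimal sketch of the two-stage shape. The reranker name matches the stack table below; the helper names and defaults are illustrative assumptions, not our production code.

```python
# Two-stage retrieval: fast vector search first, then a cross-encoder
# rerank over the candidates. Sketch only; `vector_search` is a
# hypothetical helper standing in for the pgvector query.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

def answer_query(query: str, vector_search, top_k: int = 50, keep: int = 5):
    # Stage 1: approximate search over the workspace index.
    candidates = vector_search(query, limit=top_k)
    if not candidates:
        return None  # no retrieval, no answer

    # Stage 2: the cross-encoder reads each (query, chunk) pair jointly,
    # slower but far more precise than embedding distance alone.
    scores = reranker.predict([(query, c.text) for c in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda p: p[0], reverse=True)
    return ranked[:keep]  # top chunks feed the prompt, with citations
```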
Word-error rate (WER) per language across 4,318 dictations. Three failure modes (language-ID flips, numerics, drug names) and the four-step pipeline that fixed them.
Most methodology starts as a public blog post then matures into a formal whitepaper or peer-reviewed submission. We publish in the order we measure.
pgvector inside the workspace. Chunked on every Visit, Rx, and lab upload. Retrieval scoped to the patient + similar cases, never cross-clinic.
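As a minimal sketch of what that scoping means in practice (table and column names here are assumptions, not our schema), the isolation lives in the WHERE clause, not in a policy document:

```python
# Patient-scoped similarity search in pgvector. `is_similar_case` is a
# hypothetical flag for the similar-cases pool; the point is that every
# query is pinned to one workspace and one patient's context.
import psycopg

SCOPED_SEARCH = """
    SELECT chunk_id, source_record, text
    FROM embeddings
    WHERE workspace_id = %(workspace)s            -- never crosses the clinic
      AND (patient_id = %(patient)s OR is_similar_case)
    ORDER BY embedding <=> %(query_vec)s::vector  -- pgvector cosine distance
    LIMIT 50;
"""

def retrieve(conn: psycopg.Connection, workspace: str, patient: str,
             query_vec: list[float]):
    with conn.cursor() as cur:
        cur.execute(SCOPED_SEARCH, {
            "workspace": workspace,
            "patient": patient,
            "query_vec": str(query_vec),  # pgvector parses '[x, y, ...]'
        })
        return cur.fetchall()
```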
Word-error rate measured on 4,318 dictations across English, Hindi, and code-switched Hinglish. Open methodology, reproducible from the corpus structure.
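For anyone reproducing those numbers, this is the standard WER definition the benchmark assumes: word-level edit distance over reference length. A textbook implementation, not our exact scoring script:

```python
# WER = (substitutions + deletions + insertions) / reference length,
# computed via word-level Levenshtein distance.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / max(len(ref), 1)

# One substitution + one deletion against a four-word reference: WER 0.5.
assert round(wer("take two tablets daily", "take too tablets"), 2) == 0.5
```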
Expanding the rag-in-clinical-ai post into a formal write-up with extended baselines (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro), confusion matrices, and ablations.
Speaker diarization: labelling up to four speakers per consult, surfacing ambiguity rather than guessing. Pilot data is being collected through Q2 2026; a write-up will follow.
Right-to-erasure (GDPR Article 17 and India's DPDP Act) when a patient's embeddings are scattered across a workspace index. Our approach: cryptographic deletion via key rotation.
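The shape of that approach, sketched under an assumed design of per-patient data keys (names are illustrative; in production the keys would live in a KMS, and the searchable vector can be kept separate from the encrypted payload):

```python
# Cryptographic deletion: each patient's chunks are encrypted with their
# own key. Erasure destroys the key, so ciphertext scattered across the
# index becomes unrecoverable without touching every row.
from cryptography.fernet import Fernet

patient_keys: dict[str, bytes] = {}  # stand-in for a real KMS

def encrypt_chunk(patient_id: str, chunk: bytes) -> bytes:
    if patient_id not in patient_keys:
        patient_keys[patient_id] = Fernet.generate_key()
    return Fernet(patient_keys[patient_id]).encrypt(chunk)

def erase_patient(patient_id: str) -> None:
    # Right-to-erasure: with the key gone, every stored chunk for this
    # patient is permanently undecryptable.
    patient_keys.pop(patient_id, None)
```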
Per-clinic fine-tunes vs zero-shot frontier models on note quality. Pilot underway with two specialist clinics. Target submission to a peer-reviewed journal.
Transparency about what powers each Clio capability: provider, model, and retraining cadence. We retrain or swap any of these freely; the table below reflects the current production stack and is updated whenever we ship a meaningful change.
| Capability | Provider · Model | Update cadence |
|---|---|---|
| Voice transcription (first pass) | Deepgram Nova-2 with confidence routing | Quarterly |
| Voice transcription (low-confidence segments) | Whisper-large fine-tuned + clinical lexicon biasing | Quarterly |
| Note structuring (SOAP / Brief / Detailed) | GPT-4o-mini · GPT-4o for hard cases | Monthly |
| Embedding for RAG retrieval | all-mpnet-base-v2 fine-tuned (EN + HI pairs) | Quarterly |
| Vector store | pgvector (per-workspace namespace) | Continuous |
| Reranker | Cross-encoder ms-marco-MiniLM-L6-v2 | Quarterly |
| Drug research (monograph + interaction) | Internal RAG over RxNorm + DrugBank · GPT-4o-mini | Weekly |
| Clinical Q&A (Clio chat) | RAG over workspace · GPT-4o-mini · GPT-4o for >1k tokens | Monthly |
| Suggested tasks | RAG + reasoning model · per-visit prompt | Monthly |
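The confidence routing in the two transcription rows works roughly like this; the threshold and helper signatures are illustrative assumptions, not production values:

```python
# First-pass segments below a confidence cutoff are re-transcribed by the
# fine-tuned Whisper model with clinical lexicon biasing.
LOW_CONFIDENCE = 0.80  # assumed cutoff, not our tuned value

def transcribe(segments, deepgram_pass, whisper_pass):
    final = []
    for seg in segments:
        first = deepgram_pass(seg)           # hypothetical: returns text + confidence
        if first.confidence < LOW_CONFIDENCE:
            final.append(whisper_pass(seg))  # clinical-lexicon-biased retry
        else:
            final.append(first)
    return final
```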
The sub-processor list (every third party that touches your data, with regions and contract dates) will live at /compliance (coming soon).
No, by default. Per-workspace embeddings are used for retrieval only, never for cross-clinic model training. Cross-clinic learning requires explicit opt-in and is anonymised before inclusion in any training set.
No. Our OpenAI Enterprise contract includes a no-training-on-customer-data clause and zero log retention; the API tier we use does not retain inputs or outputs beyond the request lifetime.
In pgvector inside Postgres, in your region (AWS Mumbai by default for India). Embeddings are deleted alongside their source records within 30 days of a right-to-erasure request.
Yes. Every retrieval that produces a Clio answer is logged to your workspace audit log: query, retrieved chunk IDs, ranking scores. Available in Settings · Compliance · Audit log.
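For illustration, a single entry has roughly this shape (field names and values are invented for the example, not the exact schema):

```python
# One retrieval audit entry: the query, the chunk IDs it pulled,
# and the ranking scores, timestamped and attributed.
audit_entry = {
    "timestamp": "2026-02-14T09:31:07Z",
    "actor": "dr.sharma@clinic.example",
    "query": "last HbA1c trend for this patient",
    "retrieved_chunks": [
        {"chunk_id": "visit_2025-11-03#2", "rerank_score": 7.91},
        {"chunk_id": "lab_2025-10-28#1", "rerank_score": 6.44},
    ],
}
```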
Full policies at /privacy and /compliance (coming soon).
If you want raw access to a benchmark, source data for a paper, or a deeper conversation about methodology, write to research@medisero.com.