Research library

Show your work.

The AI inside MediSero+ is held to the same standard as a clinical study. Methodology, benchmarks, and validation studies are published openly, with reproducible methods and honesty about what's measured versus what's still in preparation.

Open methodology · Reproducible benchmarks · Sources cited by default
In progress: Two papers are public as long-form blog posts; four formal write-ups are in preparation through late 2026. We publish methodology before benchmarks, and benchmarks before claims.
Design principles

Six rules. Every model decision goes through them.

When we add a feature, we run it past these. If it can't pass all six, we don't ship it.

Grounded, not generated

Every clinical answer cites the past visits it pulled from. No retrieval, no answer.

Drafts only

Clio never auto-saves to a patient record. The doctor signs. Always.

Workspace-scoped

Each clinic's data is isolated. No cross-clinic learning without explicit consent + anonymisation.

Audit-first

Every Clio output is timestamped, attributed, and traceable to source. Forever.

Edit-in-place feedback

Doctor edits are the training signal. Local, per-clinic. Your edits make Clio yours.

Toggle anything

Master switch off. Per-screen off. Per-feature off. Privacy is a setting, not a contract clause.

Papers & benchmarks

Six write-ups planned. Two public, four in prep.

Most methodology starts as a public blog post, then matures into a formal whitepaper or a peer-reviewed submission. We publish in the order we measure.

Methodology · Public Apr 2026

How Clio cites past visits — RAG architecture for clinical AI

pgvector inside the workspace. Chunked on every Visit, Rx, and lab upload. Retrieval scoped to the patient + similar cases, never cross-clinic.

Read post
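Roughly what that scoping looks like in practice. A minimal sketch of a patient-scoped pgvector query; the table and column names are illustrative, not the production schema, and the similar-cases expansion is omitted:

# Patient-scoped retrieval over pgvector: an illustrative sketch only.
# Table/column names are hypothetical, not Clio's actual schema.
import psycopg

def retrieve_chunks(conn: psycopg.Connection, workspace_id: str,
                    patient_id: str, query_embedding: list[float],
                    k: int = 8):
    """Return the k nearest chunks, scoped to one workspace and one patient."""
    vec = "[" + ",".join(str(x) for x in query_embedding) + "]"  # pgvector text form
    sql = """
        SELECT chunk_id, source_record, chunk_text,
               embedding <=> %s::vector AS distance   -- cosine distance
        FROM visit_chunks
        WHERE workspace_id = %s                       -- hard workspace isolation
          AND patient_id = %s                         -- scoped to this patient
        ORDER BY embedding <=> %s::vector
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (vec, workspace_id, patient_id, vec, k))
        return cur.fetchall()

The point of the design: the workspace filter lives inside the query itself, so the index can never return a chunk from another clinic.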
Benchmark · Public Apr 2026

Multi-language voice-to-SOAP — WER per scenario

Word-error rate measured on 4,318 dictations across English, Hindi, and code-switched Hinglish. Open methodology, reproducible from the corpus structure.

Read post
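For readers who want the definition behind the number: WER counts substitutions, deletions, and insertions against the reference word count, WER = (S + D + I) / N. A self-contained sketch via word-level edit distance; the benchmark's actual normalisation, especially for code-switched Hinglish, is more involved:

# Word-error rate via word-level Levenshtein distance. Illustrative only;
# the published methodology's tokenisation and normalisation will differ.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("patient reports mild fever", "patient reports a mild fever"))  # 0.25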
Validation · In prep Q3 2026

Hallucination & refusal rates — Clio vs frontier baselines on clinical Q&A

Expanding the rag-in-clinical-ai post into a formal write-up with extended baselines (GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Pro), confusion matrices, and ablations.

Drafting
Methodology · In prep Q3 2026

Speaker diarisation in clinic recordings — doctor vs patient vs family

Labelling up to four speakers per consult, surfacing ambiguity rather than guessing. Pilot data being collected through Q2 2026; write-up follows.

Drafting
Whitepaper · In prep Q3 2026

Right-to-erasure in a RAG-indexed clinical record

Right-to-erasure under the DPDP Act (Section 12) and GDPR Article 17 when a patient's embeddings are scattered across a workspace index. Cryptographic deletion via key rotation.

Drafting
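One common shape for this is crypto-shredding: encrypt each patient's indexed content under a per-patient data key, and honour erasure by destroying the key, so every copy (rows, backups, replicas) becomes unreadable at once. A deliberately simplified sketch with Fernet standing in for a real KMS/HSM; not our production implementation, and how this coexists with searchable vectors is exactly what the whitepaper covers:

# Crypto-shredding sketch: per-patient keys, erasure = key destruction.
# Fernet is a stand-in for a managed KMS/HSM; names are hypothetical.
from cryptography.fernet import Fernet

class PatientKeyStore:
    def __init__(self):
        self._keys: dict[str, bytes] = {}

    def key_for(self, patient_id: str) -> Fernet:
        key = self._keys.setdefault(patient_id, Fernet.generate_key())
        return Fernet(key)

    def shred(self, patient_id: str) -> None:
        # Right-to-erasure: destroy the key; everything encrypted under it
        # is now unrecoverable, wherever it was copied.
        self._keys.pop(patient_id, None)

store = PatientKeyStore()
token = store.key_for("pt_123").encrypt(b"BP 128/84, continue amlodipine 5 mg")
store.shred("pt_123")                     # erasure request honoured
# store.key_for("pt_123").decrypt(token)  # now raises InvalidToken: new key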
Submission · In prep Late 2026

Clinic-native model fine-tuning — three months in

Per-clinic fine-tunes vs zero-shot frontier models on note quality. Pilot underway with two specialist clinics. Target submission to a peer-reviewed journal.

Drafting
Model stack

What's under the hood.

Transparency about what powers each Clio capability — provider, model, retraining cadence. We retrain or swap any of these freely; this table reflects the current production stack and updates whenever we ship a meaningful change.

Capability | Provider · Model | Updated
Voice transcription (first pass) | Deepgram Nova-2 with confidence routing | Quarterly
Voice transcription (low-confidence segments) | Whisper-large fine-tuned + clinical lexicon biasing | Quarterly
Note structuring (SOAP / Brief / Detailed) | GPT-4o-mini · GPT-4o for hard cases | Monthly
Embedding for RAG retrieval | all-mpnet-base-v2 fine-tuned (EN + HI pairs) | Quarterly
Vector store | pgvector (per-workspace namespace) | Continuous
Reranker | Cross-encoder ms-marco-MiniLM-L6-v2 | Quarterly
Drug research (monograph + interaction) | Internal RAG over RxNorm + DrugBank · GPT-4o-mini | Weekly
Clinical Q&A (Clio chat) | RAG over workspace · GPT-4o-mini · GPT-4o for >1k tokens | Monthly
Suggested tasks | RAG + reasoning model · per-visit prompt | Monthly
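The two transcription rows above implement a confidence-routing pattern: take the fast first-pass transcript, and re-run only its low-confidence segments through the heavier model. A sketch of the routing logic; the callables and the 0.85 threshold are illustrative stand-ins, not the production interface:

# Confidence routing between a fast first-pass transcriber and a heavier
# fallback model. Interfaces and threshold are hypothetical.
from dataclasses import dataclass

@dataclass
class Segment:
    start_s: float
    end_s: float
    text: str
    confidence: float  # per-segment, 0.0 to 1.0

CONFIDENCE_FLOOR = 0.85

def route_transcript(audio: bytes, first_pass, second_pass) -> list[Segment]:
    out: list[Segment] = []
    for seg in first_pass(audio):              # e.g. a Deepgram Nova-2 wrapper
        if seg.confidence < CONFIDENCE_FLOOR:
            # Re-transcribe just this slice with the fine-tuned Whisper model.
            out.append(second_pass(audio, seg.start_s, seg.end_s))
        else:
            out.append(seg)
    return out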

The sub-processor list (every third party that touches your data, with regions and contract dates) is on /compliance (coming soon).

Data policy — short version

What we do — and don't do — with your clinic's data.

Do you train on customer data?

No, by default. Per-workspace embeddings are used for retrieval only, never for cross-clinic model training. Cross-clinic learning requires explicit opt-in, and data is anonymised before it enters any training set.

Is data sent to OpenAI used to train their models?

No. Our OpenAI Enterprise contract includes a no-training-on-customer-data clause and zero log retention. The API tier we use does not retain inputs or outputs beyond the request lifetime.

Where do you store embeddings?

In pgvector inside Postgres, in your region (AWS Mumbai by default for India). Embeddings are deleted alongside their source records within 30 days of a right-to-erasure request.

Can I see my own retrieval logs?

Yes. Every retrieval that produces a Clio answer is logged to your workspace audit log: query, retrieved chunk IDs, ranking scores. Available in Settings · Compliance · Audit log.
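Illustratively, a single logged retrieval event carries roughly these fields; this is a hypothetical shape, not the exact export format:

# Hypothetical shape of one retrieval audit-log entry, matching the fields
# named above: query, retrieved chunk IDs, ranking scores.
retrieval_event = {
    "timestamp": "2026-04-07T10:32:11Z",
    "workspace_id": "ws_clinic_42",
    "actor": "dr.mehta@clinic.example",
    "query": "last HbA1c trend for this patient",
    "retrieved_chunks": [
        {"chunk_id": "ch_9f2a", "source": "visit_2025-11-03", "score": 0.91},
        {"chunk_id": "ch_41bc", "source": "lab_2026-01-19", "score": 0.88},
    ],
    "answer_id": "ans_7731",  # ties the retrieval to the Clio output it grounded
}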

Full policies at /privacy and /compliance (coming soon).

Reach out

Researchers, regulators, journalists.

If you want raw access to a benchmark, source data for a paper, or a deeper conversation about methodology — write to research@medisero.com.