Industry-leading Voice AI models. Straight from the source.

Modulate's Velma is the #1 model for transcription accuracy, deepfake detection, and conversation understanding. Now available as direct APIs — so you can build on the same intelligence that powers our enterprise platform.

See Modulate in action

No credit card required.

Velma Transcribe

The most accurate speech-to-text API for real-world conversations. Handles interruptions, accents, overlapping speech, and noise that breaks typical systems.

#1 accuracy on real-world benchmarks including Earnings-22 and AMI
Up to 10× cheaper than Deepgram
Batch + real-time streaming
Diarization, emotion, and accent detection included free
PII/PHI redaction (including redacted audio) for +$0.02/hr

Starting at $0.03 / hour
400 hours in free credits
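
To make the pricing above concrete, here is a rough cost sketch in Python. It uses only the figures quoted on this page ($0.03 per hour base, +$0.02 per hour for redaction, 400 free hours). The function name and the way free credits are applied (subtracted from total hours before any billing, add-on included) are assumptions for illustration, not documented billing rules, and "starting at" suggests actual rates may vary.

```python
from decimal import Decimal

# Figures quoted on this page; actual billing rules may differ.
BASE_RATE = Decimal("0.03")        # USD per audio hour ("starting at")
REDACTION_ADDON = Decimal("0.02")  # USD per audio hour for PII/PHI redaction
FREE_HOURS = Decimal("400")        # free credits, expressed in hours


def estimate_transcribe_cost(hours, redaction=False, free_hours=FREE_HOURS):
    """Hypothetical estimate: bill only the hours beyond the free credits.

    Assumption: free credits cover whole hours, including the redaction
    add-on. The page does not state how the two interact.
    """
    billable = max(Decimal("0"), Decimal(str(hours)) - free_hours)
    rate = BASE_RATE + (REDACTION_ADDON if redaction else Decimal("0"))
    return billable * rate


if __name__ == "__main__":
    for hours in (250, 1000, 10000):
        plain = estimate_transcribe_cost(hours)
        redacted = estimate_transcribe_cost(hours, redaction=True)
        print(f"{hours} h: ${plain:.2f} plain, ${redacted:.2f} with redaction")
```

Decimal is used instead of float so that per-hour rates in cents multiply out exactly.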

View Docs

Velma Deepfake Detect

The #1 ranked deepfake detection model on 🤗 Hugging Face's Speech Deepfake Arena. Catches what others miss — including mid-call voice switches that gate-check systems are blind to.

1.1% equal error rate, less than half that of the next-best model
120× lower cost than the closest competitor
Works with just 3 seconds of audio
Segment-level scores, updated every 2 seconds

$0.25 / hour
1,000 free credits

View Docs

Velma Voice Intelligence (Coming Soon)

Full voice intelligence via API — intent, emotion, fraud signals, compliance risk, policy violations, and more. Built on the same Ensemble Listening Model that powers Modulate's enterprise platform.

No fine-tuning. No LLM overhead. Auditable, structured outputs you can act on.

Not just another LLM wrapper.

Most voice AI APIs transcribe audio and hand the text to a language model. Context, tone, and everything that makes a voice conversation meaningful gets discarded at step one.

Velma is built differently. Our Ensemble Listening Model (ELM) processes audio natively — understanding conversations the way a human listener would, with full awareness of how something is said, not just the words.

The result is an API that’s more accurate, more cost-efficient, and capable of outputs that text-first systems simply can’t produce.

Learn more about the ELM architecture

Built for developers shipping production systems

Velma Transcribe is designed to integrate cleanly into modern infrastructure.

REST endpoints for batch transcription

Streaming endpoints for real-time transcription

Predictable structured output for downstream pipelines

Built for scalable high-throughput workloads

The Velma API is designed to work well with analytics stacks, search systems, and LLM-based workflows.

Read the docs

Transcription Is Just the Beginning

Start free. Scale when you’re ready.

Our API includes free credits to get you started — no card required, no sales call necessary. When you’re ready to go to production, usage-based pricing means you pay for what you use.

Need volume, SLAs, or custom endpoints? We can help with that too.

Get Free API Access

Talk to Sales
