Industry-leading Voice AI models. Straight from the source.

Modulate's Velma is the #1 model for transcription accuracy, deepfake detection, and conversation understanding. Now available as direct APIs — so you can build on the same intelligence that powers our enterprise platform.

See Modulate in action

No credit card required.

Velma Transcribe

The most accurate speech-to-text API for real-world conversations. Handles interruptions, accents, overlapping speech, and noise that breaks typical systems.

  • #1 accuracy on real-world benchmarks, including Earnings-22 and AMI

  • Up to 10× cheaper than Deepgram

  • Batch + real-time streaming

  • Diarization, emotion detection, and accent detection included free

  • PII/PHI redaction (including redacted audio) for +$0.02/hr

Starting at $0.03 / hour · 400 hours in free credits
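
To make the pricing above concrete, here is a minimal cost sketch in Python. It uses only the figures quoted on this page ($0.03 per hour, +$0.02 per hour for redaction, 400 free hours). The function name `estimate_cost` is hypothetical, and it assumes the free credits offset both the base rate and the redaction add-on; actual billing rules may differ.

```python
from decimal import Decimal

# Rates quoted on this page. "Starting at" suggests other tiers may exist;
# treating these as flat rates is an assumption.
BASE_RATE = Decimal("0.03")       # USD per audio hour
REDACTION_RATE = Decimal("0.02")  # USD per audio hour, PII/PHI redaction add-on
FREE_HOURS = Decimal("400")       # free credits, assumed to be audio hours


def estimate_cost(audio_hours, redaction=False):
    """Estimate USD cost for a volume of audio after free hours are used up."""
    hours = Decimal(str(audio_hours))
    billable = max(hours - FREE_HOURS, Decimal("0"))
    rate = BASE_RATE + (REDACTION_RATE if redaction else Decimal("0"))
    return billable * rate


print(estimate_cost(10_000))                  # 9,600 billable hours at $0.03
print(estimate_cost(10_000, redaction=True))  # 9,600 billable hours at $0.05
```

Under these assumptions, 10,000 hours of audio comes to $288.00 without redaction and $480.00 with it.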

Velma Deepfake Detect

The #1 ranked deepfake detection model on 🤗 Hugging Face's Speech Deepfake Arena. Catches what others miss — including mid-call voice switches that gate-check systems are blind to.

  • 1.1% equal error rate, less than half that of the next-best model

  • 120× lower cost than the closest competitor

  • Works with just 3 seconds of audio

  • Segment-level scores, updated every 2 seconds

$0.25 / hour · 1,000 free credits
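
Segment-level scores are what make mid-call voice switches detectable. The sketch below shows one way a client might turn a stream of per-segment scores into flagged time spans. Everything about the data shape is an assumption for illustration: the `Segment` fields, the 0-to-1 score range, and the 0.5 threshold are hypothetical, and only the 2-second update interval comes from this page.

```python
from dataclasses import dataclass

SEGMENT_LEN_S = 2.0  # scores are updated every 2 seconds (from this page)


@dataclass
class Segment:
    start_s: float  # segment start time in seconds (hypothetical field)
    score: float    # assumed: synthetic-speech score in [0, 1]


def flagged_spans(segments, threshold=0.5):
    """Merge consecutive segments scoring >= threshold into (start, end) spans."""
    spans = []
    current_start = None
    for seg in segments:
        if seg.score >= threshold:
            if current_start is None:
                current_start = seg.start_s  # a suspicious span begins here
        elif current_start is not None:
            spans.append((current_start, seg.start_s))  # span ended
            current_start = None
    if current_start is not None:
        # The call ended while still inside a suspicious span.
        spans.append((current_start, segments[-1].start_s + SEGMENT_LEN_S))
    return spans


# A call that starts with a genuine voice and switches partway through.
scores = [0.02, 0.03, 0.91, 0.95, 0.04, 0.88]
call = [Segment(i * SEGMENT_LEN_S, s) for i, s in enumerate(scores)]
print(flagged_spans(call))  # [(4.0, 8.0), (10.0, 12.0)]
```

A gate check that only scored the opening segment (0.02 here) would pass this call, while the per-segment view surfaces both suspicious spans.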

Velma Voice Intelligence (Coming Soon)

Full voice intelligence via API — intent, emotion, fraud signals, compliance risk, policy violations, and more. Built on the same Ensemble Listening Model that powers Modulate's enterprise platform.

No fine-tuning. No LLM overhead. Auditable, structured outputs you can act on.

Not just another LLM wrapper.

Most voice AI APIs transcribe audio and hand the text to a language model. Context, tone, and everything that makes a voice conversation meaningful gets discarded at step one.

Velma is built differently. Our Ensemble Listening Model (ELM) processes audio natively — understanding conversations the way a human listener would, with full awareness of how something is said, not just the words.

The result is an API that’s more accurate, more cost-efficient, and capable of outputs that text-first systems simply can’t produce.

Built for developers shipping production systems

Velma Transcribe is designed to integrate cleanly into modern infrastructure.

  • REST endpoints for batch transcription

  • Streaming endpoints for real-time transcription

  • Predictable structured output for downstream pipelines

  • Built for scalable high-throughput workloads
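
As an illustration of what structured output buys a downstream pipeline, the sketch below collapses a diarized transcript into speaker turns. The JSON shape is purely hypothetical (the `utterances`, `speaker`, `start`, `end`, and `text` fields are assumptions for this example, not the documented Velma response schema); consult the docs for the real field names.

```python
import json

# Hypothetical response payload; the real schema may differ.
SAMPLE = """
{"utterances": [
  {"speaker": "A", "start": 0.0, "end": 2.1, "text": "Hi, thanks for calling."},
  {"speaker": "A", "start": 2.1, "end": 3.4, "text": "How can I help?"},
  {"speaker": "B", "start": 3.6, "end": 6.0, "text": "I need to reset my password."}
]}
"""


def speaker_turns(payload):
    """Merge consecutive utterances from the same speaker into single turns."""
    turns = []
    for utt in json.loads(payload)["utterances"]:
        if turns and turns[-1]["speaker"] == utt["speaker"]:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = utt["end"]
            turns[-1]["text"] += " " + utt["text"]
        else:
            turns.append(dict(utt))  # copy so the parsed input is not mutated
    return turns


for turn in speaker_turns(SAMPLE):
    print(f'{turn["speaker"]} [{turn["start"]:.1f}-{turn["end"]:.1f}s]: {turn["text"]}')
```

Because the fields are typed and predictable, the same turns can be indexed for search or passed to an LLM prompt without any text scraping.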

The Velma API is designed to work with analytics stacks, search systems, and LLM-based workflows.

Read the docs

Transcription Is Just the Beginning

Start free. Scale when you’re ready.

Our API includes free credits to get you started — no card required, no sales call necessary. When you’re ready to go to production, usage-based pricing means you pay for what you use.

Need volume, SLAs, or custom endpoints? We can help with that too.