Velma Transcribe by Modulate

Transcription built for real conversations.

Velma Transcribe is Modulate’s speech-to-text API for batch and real-time streaming transcription. It’s engineered for the audio that breaks typical systems: multi-speaker conversations, overlapping speech, interruptions, accents, and noisy environments.

Velma Transcribe delivers best-in-class conversation transcription accuracy with transparent pricing designed for scale.

Proven #1 Most Accurate for Real Conversations

Save 90% over the competition

Batch + Streaming models

Compare Velma Transcribe to the competition

Transcription Benchmark: Complex Real-World Conversations

Tests Word Error Rate on real-world, complex conversations (AMI Meeting Corpus dataset).

Lowest word error, lowest cost.

[Chart: Word Error Rate (%) plotted against cost per 1,000 minutes of audio (USD)]

Models compared: velma-2-transcribe, canary-qwen, voxtral-mini, gemini-2.5-flash, assembly-ai, deepgram-nova-2, speechmatics, eleven-labs-scribe-v2, gladia-solaria-1, whisper-large-v3, azure-speech, aws-transcribe, parakeet-tdt-v3, parakeet-rnnt, gpt-4o-transcribe, chirp-2

Providers: Modulate, NVIDIA, Mistral, Google, AssemblyAI, Deepgram, Speechmatics, ElevenLabs, Gladia, OpenAI, Microsoft, Amazon
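
For readers unfamiliar with the metric, Word Error Rate is conventionally the number of word substitutions, deletions, and insertions needed to turn a transcript into the reference text, divided by the number of words in the reference. The sketch below illustrates that standard definition only. The function name and the text normalization (lowercasing and splitting on whitespace) are assumptions made for illustration, and the benchmark's own normalization rules are not stated on this page.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference word count.

    Normalization here (lowercase + whitespace split) is an illustrative
    assumption, not the benchmark's documented procedure.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missed)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when a transcript contains many extra words.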

Industry-leading pricing for batch and streaming transcription.

Transcription is the key that unlocks voice UIs, reliable compliance and meeting notes, and other essentials. It shouldn’t break the bank.

Velma Transcribe offers usage-based pricing designed for high-volume transcription workloads. It is built to be cost-efficient enough to scale across your product.

See Pricing

Velma is up to 10× lower cost than Deepgram for transcription workloads.

Starting at 2.5 cents per hour
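
To put the starting rate on the same footing as a per-1,000-minute comparison, a rough conversion can be sketched as follows. It assumes the quoted 2.5 cents per hour applies as a flat rate with no minimums, tiers, or rounding, none of which this page confirms. The constant and function names are hypothetical.

```python
# Assumption: the page's "starting at 2.5 cents per hour" treated as a flat rate.
STARTING_RATE_USD_PER_HOUR = 0.025


def estimated_cost_usd(audio_minutes: float,
                       rate_usd_per_hour: float = STARTING_RATE_USD_PER_HOUR) -> float:
    """Estimate transcription cost for a given amount of audio.

    Purely illustrative arithmetic: minutes -> hours -> dollars.
    """
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes / 60.0 * rate_usd_per_hour


if __name__ == "__main__":
    # Cost of 1,000 minutes of audio at the assumed flat rate.
    print(f"${estimated_cost_usd(1000):.4f}")
```

Under these assumptions, 1,000 minutes of audio works out to roughly 42 cents. Actual charges depend on the published pricing terms.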

Speech-to-text that stays clean when conversations get messy.

Many speech-to-text APIs perform well on clean audio but degrade once real conversation begins: people interrupt each other, speakers overlap, and audio quality shifts.

Velma Transcribe is designed for support calls, business meetings, social chats, and other dynamic environments where accuracy is key.

Velma Transcribe is best for:

Call center transcription and QA workflows

Real-time voice agents and assistants

Social and gaming chats

Meeting transcription and meeting intelligence

Large-scale transcription pipelines where cost matters

Why teams choose Velma.

Velma Transcribe is built to solve the practical problems that matter in production transcription systems: accuracy, stability, latency, and unit economics.

Quality and cost don’t have to compete

With half a billion hours of training audio and a world-class team focused purely on audio AI, we can offer the world’s most accurate solution while also being the most cost-effective.

Conversation-first transcription accuracy

Velma is optimized for conversational speech, offering robust accuracy in the face of overlaps, interruptions, and informal dialogue.

Sub-second real-time streaming transcription

Velma supports low-latency streaming transcription suitable for live UI, agent pipelines, and real-time systems.

Better transcripts improve everything downstream

Higher transcription accuracy improves meeting summaries, analytics, compliance, search, and LLM workflows.

Built for developers shipping production systems

Velma Transcribe is designed to integrate cleanly into modern infrastructure.

REST endpoints for batch transcription

Streaming endpoints for real-time transcription

Predictable structured output for downstream pipelines

Built for scalable high-throughput workloads

The Velma API is designed to work well with analytics stacks, search systems, and LLM-based workflows.

Read the docs

Velma Transcribe sets a new standard for speech-to-text

Most speech-to-text providers optimize for general transcription. Velma Transcribe is specifically optimized for conversational speech and favorable economics at scale. And we’re not afraid to prove it. We’ve published a tool that directly compares the unaltered results from four top speech-to-text APIs. See for yourself how we compare on cost, latency, and accuracy against alternatives like Deepgram and AssemblyAI.

Benchmark against your provider

Get started with Velma Transcribe now.