Velma Transcribe by Modulate
The transcription API built for real conversations.
Velma Transcribe is Modulate’s speech-to-text API for batch and real-time streaming transcription. It’s engineered for the audio that breaks typical systems: multi-speaker conversations, overlapping speech, interruptions, accents, and noisy environments.
Velma Transcribe delivers best-in-class conversation transcription accuracy with transparent pricing designed for scale.
Proven #1 Most Accurate for Real Conversations • Save 90% over the competition • Batch + Streaming models
Speech-to-text that stays clean when conversations get messy.
Many speech-to-text APIs perform well on clean audio but degrade when real conversation begins: people interrupt each other, speakers overlap, and audio quality shifts.
Velma Transcribe is designed for support calls, business meetings, social chats, and other dynamic environments where accuracy is key.
Velma Transcribe is best for:
Call center transcription and QA workflows
Real-time voice agents and assistants
Social and gaming chats
Meeting transcription and meeting intelligence
Large-scale transcription pipelines where cost matters

Why teams choose Velma.
Velma Transcribe is built to solve the practical problems that matter in production transcription systems: accuracy, stability, latency, and unit economics.
Quality and cost don’t have to compete
Half a billion hours of training audio and a world-class team focused purely on audio AI mean we can offer the world’s most accurate solution while also being the most cost-effective.
Conversation-first transcription accuracy
Velma is optimized for conversational speech, offering robust accuracy in the face of overlaps, interruptions, and informal dialogue.
Long-form context stability
Velma maintains strong accuracy across long recordings, meetings, and extended calls.
Sub-second real-time streaming transcription
Velma supports low-latency streaming transcription suitable for live UI, agent pipelines, and real-time systems.
Better transcripts improve everything downstream
Higher transcription accuracy improves meeting summaries, analytics, compliance, search, and LLM workflows.

Industry-leading pricing for batch and streaming transcription.
Transcription is the key that unlocks voice UIs, reliable compliance, meeting notes, and other essentials. It shouldn’t break the bank.
Velma Transcribe offers usage-based pricing designed for high-volume transcription workloads. It is built to be cost-efficient enough to scale across your product.
Velma costs up to 10× less than Deepgram for transcription workloads.
Multilingual support with broad language coverage.
Built for developers shipping production systems
Velma Transcribe is designed to integrate cleanly into modern infrastructure.
REST endpoints for batch transcription
Streaming endpoints for real-time transcription
Predictable structured output for downstream pipelines
Built for scalable high-throughput workloads
The Velma API is designed to work well with analytics stacks, search systems, and LLM-based workflows.

Velma Transcribe sets a new standard for speech-to-text
Most speech-to-text providers optimize for general transcription. Velma Transcribe is specifically optimized for conversational speech and favorable economics at scale. And we’re not afraid to prove it. We’ve published a tool to directly compare the unaltered results from four top speech-to-text APIs. See for yourself how we compare on cost, latency, and accuracy against alternatives like Deepgram and AssemblyAI.
Frequently Asked Questions
What is Velma Transcribe?
Velma Transcribe is Modulate’s speech-to-text transcription API for batch transcription and real-time streaming transcription.
How accurate is Velma Transcribe?
Velma Transcribe is the most accurate solution for real-world conversations. The gold standard for this assessment is the AMI Meeting Corpus, on which Velma Transcribe scores an industry-leading 14.9% WER (word error rate).
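For readers unfamiliar with the metric, WER is the number of word substitutions, deletions, and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference, so lower is better. The sketch below is a minimal illustration of that definition, not the evaluation pipeline used for the AMI result. It assumes simple lowercasing and whitespace tokenization, whereas published benchmarks typically apply more careful text normalization. The function name `word_error_rate` is our own.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    # Assumption: lowercase + whitespace split stands in for real normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (word missing from hypothesis)
                curr[j - 1] + 1,                  # insertion (extra word in hypothesis)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A 14.9% WER therefore corresponds to roughly 15 word-level errors per 100 reference words.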
Is Velma Transcribe real-time?
Yes. Velma Transcribe supports sub-second streaming transcription and returns partial transcripts as audio is processed.
Does Velma Transcribe include timestamps?
Yes. Velma Transcribe includes transcript timestamps in its output.
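As one example of what timestamps enable downstream, the sketch below turns timestamped segments into SRT subtitle text. The `Segment` shape (start and end in seconds, plus text) is a hypothetical stand-in chosen for illustration. The actual Velma Transcribe output schema is not described on this page, so map the real fields onto it as needed.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    # Hypothetical shape for illustration only; not the documented Velma schema.
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm form used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text}"
        )
    return "\n\n".join(cues) + "\n"


if __name__ == "__main__":
    print(to_srt([Segment(0.0, 1.25, "Hi, thanks for calling."),
                  Segment(1.25, 3.0, "How can I help?")]))
```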
What languages does Velma Transcribe support?
Velma Transcribe supports over 50 languages. The full list is: Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Maori, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
How much does Velma Transcribe cost?
Velma offers usage-based pricing at up to 10× lower cost than the competition, starting at $0.025/hour. For more information, see our Pricing page.
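To get a rough sense of what that starting rate means at volume, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate only. It assumes the $0.025/hour starting rate applies uniformly to every hour of audio, which may not hold for every plan or for batch versus streaming, so check the Pricing page for actual rates. The function name `estimate_cost` is our own.

```python
from decimal import Decimal, ROUND_HALF_UP

# Starting rate quoted above, in US dollars per hour of audio.
STARTING_RATE_PER_HOUR = Decimal("0.025")


def estimate_cost(audio_hours, rate_per_hour=STARTING_RATE_PER_HOUR) -> Decimal:
    """Estimate transcription cost in dollars, rounded to the nearest cent.

    Assumption: a single flat per-hour rate, with no tiers or minimums.
    """
    hours = Decimal(str(audio_hours))
    if hours < 0:
        raise ValueError("audio_hours must be non-negative")
    cost = hours * Decimal(str(rate_per_hour))
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # 10,000 hours of audio at the starting rate.
    print(estimate_cost(10_000))  # prints 250.00
```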
Is Modulate ISO 27001 certified?
Yes. Modulate maintains ISO 27001 certification as part of its organization-wide security program.