Using Deepgram? Pay 88% less for transcription with higher accuracy

Modulate's Transcription API ranks #1 in accuracy across 3 independent transcription benchmarks, at 88% lower cost.

Modulate’s transcription model offers many benefits including:

88% less expensive ($0.03 / hr vs. $0.25 / hr)
Best-in-class accuracy on real-world speech (AMI Corpus and IHM)
Detects 20+ emotions and 20+ accents
Supports 70+ languages, PII redaction and more

Get started with 400 hours, free.

[Chart: Transcription Benchmark (Accuracy vs. Price). Lowest WER, lowest cost.]
Average Word Error Rate (WER) across Earnings-22 and VoxPopuli datasets
Axes: Cost per 1000 minutes of audio ($0 to $9); Avg. Word Error Rate (8% to 12%)
Models plotted: modulate-velma-2, scribe-v2, gemini-2.5-pro, universal, speechmatics-enhanced, solaria-1, gpt-4o-transcribe, chirp-2, speechmatics-standard, whisper-large-v3, nova-3
[Chart: Transcription Accuracy Benchmark]
Average Word Error Rate (WER) across Earnings-22 and VoxPopuli datasets
Axis: Avg. Word Error Rate (6% to 13%)
Models plotted: modulate-velma-2, scribe-v2, gemini-2.5-pro, universal, speechmatics-enhanced, solaria-1, gpt-4o-transcribe, chirp-2, speechmatics-standard, whisper-large-v3, nova-3
Providers (legend): Modulate, ElevenLabs, Google, AssemblyAI, Speechmatics, Gladia, OpenAI, Deepgram
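
For readers who want to check a Word Error Rate figure on their own audio, WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference transcript, divided by the number of words in the reference. The sketch below is a minimal illustration, not the scoring code behind the benchmarks above. The function name `word_error_rate` is hypothetical, and the only text normalization it applies is lowercasing and whitespace splitting, whereas published benchmarks typically normalize punctuation and numbers as well.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: lowercase + whitespace split is the only normalization applied here.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the word-level edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "please transfer the call to billing"
    hypothesis = "please transfer the call to building"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

Because the denominator is the reference length, a transcript with many inserted words can score above 100%.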
Transcription API Cost Comparison among STT Leaders
Cost per audio hour transcribed

Provider | Model | Cost per audio hour
AssemblyAI | universal | $0.15
Deepgram | nova-2 | $0.26
ElevenLabs | scribe-v2 | $0.40
Modulate | modulate-velma-2 | $0.03
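
To see how the per-hour prices listed above translate into savings at a given volume, the arithmetic can be sketched as follows. This is an illustrative calculation only: the prices are copied from this page, the function names are hypothetical, and the 5,000-hour monthly volume is an assumed example, not a figure from Modulate.

```python
# Per-hour list prices as shown in the cost comparison on this page (USD).
PRICE_PER_HOUR_USD = {
    "AssemblyAI (universal)": 0.15,
    "Deepgram (nova-2)": 0.26,
    "ElevenLabs (scribe-v2)": 0.40,
    "Modulate (modulate-velma-2)": 0.03,
}


def savings_percent(baseline_price: float, candidate_price: float) -> float:
    """Percentage saved by paying candidate_price instead of baseline_price."""
    return (baseline_price - candidate_price) / baseline_price * 100


def cost_per_1000_minutes(price_per_hour: float) -> float:
    """Convert a per-hour price into the per-1000-minutes unit used in the charts."""
    return price_per_hour * 1000 / 60


def monthly_cost(price_per_hour: float, audio_hours: float) -> float:
    """Total cost for a given number of audio hours."""
    return price_per_hour * audio_hours


if __name__ == "__main__":
    modulate = PRICE_PER_HOUR_USD["Modulate (modulate-velma-2)"]
    assumed_hours = 5_000  # hypothetical monthly volume, for illustration only
    for name, price in PRICE_PER_HOUR_USD.items():
        print(
            f"{name}: ${monthly_cost(price, assumed_hours):,.2f}/month, "
            f"${cost_per_1000_minutes(price):.2f} per 1000 min, "
            f"Modulate saves {savings_percent(price, modulate):.1f}%"
        )
```

The 88% headline figure corresponds to `savings_percent(0.25, 0.03)`, using the $0.25 / hr Deepgram price quoted at the top of the page.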

10x Lower Cost Than the Competition

Explore Cost Comparison Tool

#1 Accuracy in independent benchmarks

10x lower cost than leading competitors

Built for real production workloads

Enterprise-ready performance

Why teams are upgrading from Deepgram

#1 Accuracy in Independent Benchmarks

Modulate consistently outperforms Deepgram across conversational speech, accents, and noisy environments.

Lower Cost. Better Results.

Get better transcripts while spending significantly less per 1,000 minutes than Deepgram. Teams switching from Deepgram save up to 90% on transcription costs — without sacrificing accuracy. In fact, they get more of it.

Built for Real Systems, Not Demos

Designed for scale, reliability, and real-world audio, not just clean test samples.

Modulate vs. Deepgram

A side-by-side comparison for teams evaluating transcription providers

Feature | Modulate | Deepgram
Cost | $0.03 / hour | $0.25 / hour
Real-World Accuracy | 14.9% WER on AMI Corpus | 28.1% WER on AMI Corpus
Accuracy on Earnings-22 | 7.8% WER | 15.7% WER
Emotion Detection | 20+ emotions | None
Accent Detection | 20+ accents | None
Language Support | 70 languages | 50+ languages
Overlapping Speakers | Handles naturally | Underperforms in complex multi-speaker audio
Training Data | 500M+ hours of conversations | Primarily curated / structured datasets
Streaming Support | Real-time streaming | Real-time streaming
PII / PHI Redaction | Yes | Yes
Diarization | Yes | Yes
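
To put the WER figures in the comparison into practical terms, the sketch below converts them into a relative error reduction and an approximate count of misrecognized words per 1,000 reference words. The WER values are the ones quoted on this page; the function names are hypothetical, and the per-1,000-words estimate simply assumes errors scale linearly with transcript length.

```python
def relative_wer_reduction(baseline_wer: float, candidate_wer: float) -> float:
    """Percentage by which candidate_wer is lower than baseline_wer."""
    return (baseline_wer - candidate_wer) / baseline_wer * 100


def expected_word_errors(wer_percent: float, word_count: int) -> int:
    """Approximate number of word errors for a transcript of word_count reference words."""
    return round(wer_percent / 100 * word_count)


if __name__ == "__main__":
    # (dataset, Modulate WER %, Deepgram WER %) as quoted in the comparison.
    comparisons = [
        ("AMI Corpus", 14.9, 28.1),
        ("Earnings-22", 7.8, 15.7),
    ]
    for dataset, modulate_wer, deepgram_wer in comparisons:
        reduction = relative_wer_reduction(deepgram_wer, modulate_wer)
        print(
            f"{dataset}: {reduction:.1f}% fewer word errors "
            f"({expected_word_errors(modulate_wer, 1000)} vs "
            f"{expected_word_errors(deepgram_wer, 1000)} per 1,000 words)"
        )
```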

Drop-In API. No Friction.

Integrate in minutes, not weeks.

Simple REST API

Clean documentation

Works with your existing stack

Built for real-time and batch transcription

Transcription Is Just the Beginning

Teams that start with transcription often expand into moderation, safety, and real-time voice intelligence.

Emotion detection
Deepfake detection
Accent detection
Full conversation intelligence

Start with transcription. Be ready for what's next.