Modulate’s Model Benchmarks
Compare audio-native Velma to LLMs
Conversation Understanding Benchmark (Accuracy vs. Cost)
Evaluates a model's ability to identify conversation types, topics, speaker roles and key behaviors.
[Figure: Accuracy score (1 to 10) vs. inference cost (USD) for velma-2-fast, velma-2, grok-4.1-fast-non-reasoning, grok-4.1-fast-reasoning, gemini-2-flash-lite, deepseek-v3.1, gemini-2-flash, deepseek-v3.2, gemini-3-flash-min, deepseek-r1, gemini-3-flash-med, gemini-2.5-pro, gemini-3-pro, grok-3, nova-3-intelligence, scribe-v2, grok-4-heavy, gpt-5-mini, gpt-5.2-pro and gpt-5.2. A marked region indicates highest accuracy at lowest cost.]
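The "highest accuracy, lowest cost" region of an accuracy-vs-cost chart is easiest to read as a Pareto frontier: a model is worth considering only if no other model is both at least as cheap and at least as accurate (and strictly better on one axis). The sketch below is a generic illustration, not Modulate's methodology; the `Point` type, the `pareto_frontier` helper, and the model names and values are hypothetical placeholders, not benchmark data.

```python
from typing import NamedTuple


class Point(NamedTuple):
    name: str
    cost: float      # inference cost in USD; lower is better
    accuracy: float  # accuracy score; higher is better


def pareto_frontier(points: list[Point]) -> list[Point]:
    """Return the points not dominated by any other point, sorted by cost.

    A point is dominated if another point is at least as cheap and at least
    as accurate, and strictly better on at least one of the two axes.
    """
    frontier = []
    for p in points:
        dominated = any(
            q.cost <= p.cost and q.accuracy >= p.accuracy
            and (q.cost < p.cost or q.accuracy > p.accuracy)
            for q in points
        )
        if not dominated:
            frontier.append(p)
    return sorted(frontier, key=lambda p: p.cost)


# Hypothetical values for illustration only; these are NOT benchmark results.
points = [
    Point("model-a", 0.01, 8.5),
    Point("model-b", 0.05, 7.0),  # dominated: model-a is cheaper and more accurate
    Point("model-c", 0.50, 9.0),
    Point("model-d", 1.00, 8.8),  # dominated: model-c is cheaper and more accurate
]
print([p.name for p in pareto_frontier(points)])  # ['model-a', 'model-c']
```

Every model off the frontier has some alternative that is no worse on either axis, which is why charts like this one highlight the corner where accuracy is high and cost is low.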
Compare Transcribe to the competition
Transcription Benchmark (Accuracy vs. Price)
Average Word Error Rate (WER) across the Earnings-22 and VoxPopuli datasets
[Figure: Average word error rate (8% to 13%) vs. cost per hour (USD) for modulate-transcribe, scribe-v2, assemblyai-universal-2, assemblyai-universal-3-pro, speechmatics-enhanced, google-gemini-2.5-pro, gpt-4o-transcribe, google-chirp-2, deepgram-nova-3 and openai-whisper-large-v3. A marked region indicates lowest WER at lowest cost.]
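WER counts the word substitutions, deletions and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. A minimal sketch follows. The helper names are hypothetical, real benchmark harnesses usually normalize text (casing, punctuation, numerals) before scoring, and the unweighted mean across the two datasets is an assumption, since the page does not say how the average is taken.

```python
def word_errors(reference: str, hypothesis: str) -> tuple[int, int]:
    """Return (substitutions + deletions + insertions, reference word count).

    Word-level Levenshtein distance computed with a rolling dynamic-programming row.
    """
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion (reference word missed)
                          curr[j - 1] + 1,     # insertion (extra hypothesis word)
                          prev[j - 1] + cost)  # substitution, or match if cost == 0
        prev = curr
    return prev[-1], len(ref)


def wer(reference: str, hypothesis: str) -> float:
    errors, n_ref = word_errors(reference, hypothesis)
    if n_ref == 0:
        raise ValueError("reference transcript is empty")
    return errors / n_ref


def average_wer(per_dataset: dict[str, float]) -> float:
    """Assumption: unweighted mean of per-dataset WERs."""
    return sum(per_dataset.values()) / len(per_dataset)


print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words
```

In the example, "sat" is substituted by "sit" and the second "the" is deleted, giving 2 errors over 6 reference words.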
Speech-to-Text Transcription Pricing (Batch)
| Provider | Model | Price |
|---|---|---|
| Modulate | | $0.03 / hr |
| xAI | grok-stt | $0.10 / hr |
| AssemblyAI | universal-3-pro | $0.21 / hr |
| ElevenLabs | scribe-v2 | $0.22 / hr |
| Speechmatics | enhanced | $0.24 / hr |
| Deepgram | nova-3 | $0.31 / hr |
| OpenAI | gpt-4o-transcribe | $0.36 / hr |
Speech-to-Text Transcription Pricing (Streaming)
| Provider | Model | Price |
|---|---|---|
| Modulate | | $0.06 / hr |
| xAI | grok | $0.20 / hr |
| Speechmatics | enhanced | $0.24 / hr |
| Deepgram | nova-3 | $0.35 / hr |
| OpenAI | gpt-4o-transcribe | $0.36 / hr |
| ElevenLabs | scribe-v2 | $0.39 / hr |
| AssemblyAI | universal-3-pro | $0.45 / hr |
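To put the per-hour prices in context, the sketch below multiplies each listed batch and streaming rate by a volume of audio. It assumes purely linear per-hour pricing with no volume tiers, minimums or feature surcharges (the price lists above do not describe any); the `estimated_cost` helper and the 10,000-hour volume are hypothetical.

```python
# Listed prices from the batch and streaming price lists above (USD per audio hour).
BATCH = {"Modulate": 0.03, "xAI": 0.10, "AssemblyAI": 0.21, "ElevenLabs": 0.22,
         "Speechmatics": 0.24, "Deepgram": 0.31, "OpenAI": 0.36}
STREAMING = {"Modulate": 0.06, "xAI": 0.20, "Speechmatics": 0.24, "Deepgram": 0.35,
             "OpenAI": 0.36, "ElevenLabs": 0.39, "AssemblyAI": 0.45}


def estimated_cost(prices: dict[str, float], hours: float) -> dict[str, float]:
    """Hypothetical helper: linear per-hour pricing, no tiers or minimums."""
    return {provider: round(rate * hours, 2) for provider, rate in prices.items()}


HOURS = 10_000  # hypothetical monthly audio volume
batch = estimated_cost(BATCH, HOURS)
streaming = estimated_cost(STREAMING, HOURS)
for provider in BATCH:
    print(f"{provider:<13} batch ${batch[provider]:>9,.2f}  "
          f"streaming ${streaming[provider]:>9,.2f}")
```

At 10,000 hours, for example, Modulate batch comes to $300.00 while AssemblyAI streaming comes to $4,500.00.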
Hugging Face’s Deepfake Speech Leaderboard
Compare Deepfake Detect to the competition
Modulate is #1 on 🤗 Hugging Face
Modulate is the top-ranked deepfake detection model on Hugging Face's Speech Arena Leaderboard, the leading independent benchmark. With an Average Equal Error Rate (EER) of just 1.1%, Modulate catches 133% more deepfakes than the next-best model.
| System | Date Added (DD/MM/YYYY) | Num Params (M) | Pooled EER (%) | Average EER (%) ↓ |
|---|---|---|---|---|
| 🥇Modulate-VELMA-2-Syntheti | 11/03/2026 | 316.000 | 1.586 | 1.104 |
| 🥈Resemble-Detect-3B-Omni | 14/10/2025 | 3000.000 | 2.099 | 2.570 |
| 🥉Hiya-Authenticity-Verific | 13/02/2026 | 1000.000 | 2.324 | 2.113 |
| DLMSL-SpeakSure-v0.1 | 27/10/2025 | 658.630 | 6.142 | 3.954 |
| Whispeak | 20/08/2025 | 98.900 | 8.060 | 3.049 |
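EER (Equal Error Rate) is the standard metric for evaluating how accurately a model distinguishes genuine human speech from AI-generated audio. It is the error rate at the decision threshold where the false acceptance rate (deepfakes passed as genuine) equals the false rejection rate (genuine speech flagged as fake); lower is better.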
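A minimal sketch of how an EER can be estimated from a detector's output, assuming the model returns a score where higher means "more likely AI-generated". It uses a simple sweep over observed scores; production scorers often interpolate along the ROC/DET curve instead. The reading of "Pooled" vs. "Average" EER (one EER over all test sets combined vs. the mean of per-set EERs) is an assumption, as are all function names and the toy scores.

```python
def equal_error_rate(bona_fide: list[float], spoof: list[float]) -> float:
    """Estimate EER from detector scores, assuming higher score = more likely fake.

    Every observed score is tried as a threshold t (clips scoring >= t are flagged):
      FAR = share of spoofed clips scored below t (deepfakes passed as genuine)
      FRR = share of genuine clips scored at or above t (genuine speech flagged)
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    best_gap, best_eer = float("inf"), 1.0
    for t in sorted(set(bona_fide) | set(spoof)):
        far = sum(s < t for s in spoof) / len(spoof)
        frr = sum(b >= t for b in bona_fide) / len(bona_fide)
        if abs(far - frr) < best_gap:
            best_gap, best_eer = abs(far - frr), (far + frr) / 2
    return best_eer


Dataset = tuple[list[float], list[float]]  # (bona fide scores, spoof scores)


def average_eer(datasets: dict[str, Dataset]) -> float:
    """Assumed meaning of "Average EER": mean of per-dataset EERs."""
    eers = [equal_error_rate(b, s) for b, s in datasets.values()]
    return sum(eers) / len(eers)


def pooled_eer(datasets: dict[str, Dataset]) -> float:
    """Assumed meaning of "Pooled EER": one EER over all datasets' scores combined."""
    bona = [x for b, _ in datasets.values() for x in b]
    spoof = [x for _, s in datasets.values() for x in s]
    return equal_error_rate(bona, spoof)


# Toy scores for illustration only.
toy = {
    "set-a": ([0.1, 0.2, 0.3], [0.7, 0.8, 0.9]),
    "set-b": ([0.1, 0.2, 0.3, 0.7], [0.4, 0.6, 0.8, 0.9]),
}
print(average_eer(toy), pooled_eer(toy))
```

On the toy data, set-a is perfectly separable (EER 0) and set-b has an EER of 0.25, so the average is 0.125, while pooling the scores gives about 0.143. The two aggregates can diverge, which is consistent with the table showing different Pooled and Average EER values.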
Modulate Catches 99% of all Deepfakes
Catch 2x more deepfakes and flag 48% fewer false positives than the next-best model. Source: 🤗 Hugging Face Leaderboard.
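One plausible reading of the headline ratios, not a methodology the page states, is that they compare Average EER values from the leaderboard table. Against Resemble-Detect-3B-Omni (next in the table's ranking), 2.570 / 1.104 is about 2.33, i.e. 133% higher. Against Hiya-Authenticity-Verific (the next-lowest Average EER), 2.113 / 1.104 is about 1.9, roughly 2x. The `error_ratio` helper is hypothetical, and the system names are kept truncated exactly as they appear in the table.

```python
# Average EER (%) per system, copied from the leaderboard table above.
AVERAGE_EER = {
    "Modulate-VELMA-2-Syntheti": 1.104,
    "Resemble-Detect-3B-Omni": 2.570,
    "Hiya-Authenticity-Verific": 2.113,
    "DLMSL-SpeakSure-v0.1": 3.954,
    "Whispeak": 3.049,
}
BASELINE = "Modulate-VELMA-2-Syntheti"


def error_ratio(system: str, baseline: str = BASELINE) -> float:
    """How many times higher `system`'s Average EER is than the baseline's."""
    return AVERAGE_EER[system] / AVERAGE_EER[baseline]


for system in AVERAGE_EER:
    if system == BASELINE:
        continue
    ratio = error_ratio(system)
    print(f"{system}: {ratio:.2f}x the baseline EER ({(ratio - 1) * 100:.0f}% higher)")
```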
| Provider | Model | Accuracy |
|---|---|---|
| Modulate | velma-deepfake-detect | 98.9% |
| Hiya | authenticity-verific | 97.9% |
| Resemble AI | resemble-detect-3b | 97.4% |
| Whispeak | whispeak | 96.9% |
| Deep Learning | dlmsl-speaksure-v0.1 | 96.0% |
| DF Arena | df-arena-500m-v1 | 94.2% |
| DF Arena | df-arena-1b-v1 | 94.1% |
| Syntra | syntra-detector | 93.9% |
| Momenta | momenta | 92.9% |
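The accuracy figures appear consistent with 100% minus each system's Average EER from the leaderboard table (for example, 100 - 1.104 ≈ 98.9 for Modulate). At the EER operating point, the error rate on genuine audio and on fake audio is the same, so overall accuracy at that threshold is 100% minus the EER regardless of class balance. That this is how the chart was derived is an inference from the numbers, not something the page states. The check below compares the five systems that appear in both places; `implied_accuracy` is a hypothetical helper.

```python
# Average EER (%) from the leaderboard table and accuracy (%) from the chart,
# keyed by provider. Only systems that appear in both are included.
AVERAGE_EER = {"Modulate": 1.104, "Hiya": 2.113, "Resemble AI": 2.570,
               "Whispeak": 3.049, "Deep Learning": 3.954}
CHART_ACCURACY = {"Modulate": 98.9, "Hiya": 97.9, "Resemble AI": 97.4,
                  "Whispeak": 96.9, "Deep Learning": 96.0}


def implied_accuracy(eer_percent: float) -> float:
    """Assumed relationship: accuracy = 100 - EER (both in percent)."""
    return 100.0 - eer_percent


for provider, eer in AVERAGE_EER.items():
    implied = implied_accuracy(eer)
    gap = abs(implied - CHART_ACCURACY[provider])
    print(f"{provider:<14} implied {implied:6.3f}%  "
          f"chart {CHART_ACCURACY[provider]:.1f}%  gap {gap:.3f}")
```

All five agree to within about 0.1 percentage point. The largest gap is Whispeak (96.951 implied vs. 96.9 shown), which suggests the chart values are rounded or truncated.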
Detect Deepfakes for just $0.25 / hr
Fraud protection at scale, at a price that levels the playing field vs. scammers.
| Provider | Price |
|---|---|
| Modulate Deepfake-Detect | $0.25 / hr |
| Resemble AI Enterprise | $29 / hr |
| Other Providers | $30 to $120 / hr |
| Resemble AI Self-Serve | $144 / hr |
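To put these rates side by side, the sketch below expresses each listed price as a multiple of Modulate's $0.25 / hr and multiplies it by a volume of screened audio. It assumes purely linear per-hour pricing with no volume tiers or minimums, which the page does not describe. The `price_multiple` helper and the 1,000-hour volume are hypothetical.

```python
# List prices from the section above (USD per audio hour).
MODULATE_RATE = 0.25
ALTERNATIVES = {
    "Resemble AI Enterprise": 29.0,
    "Other Providers (low end)": 30.0,
    "Other Providers (high end)": 120.0,
    "Resemble AI Self-Serve": 144.0,
}


def price_multiple(rate: float) -> float:
    """How many times Modulate's per-hour rate a given rate is."""
    return rate / MODULATE_RATE


HOURS = 1_000  # hypothetical volume of screened audio
print(f"Modulate Deepfake-Detect: ${MODULATE_RATE * HOURS:,.2f}")
for name, rate in ALTERNATIVES.items():
    print(f"{name}: ${rate * HOURS:,.2f} ({price_multiple(rate):.0f}x)")
```

Under these assumptions, the listed alternatives cost between 116x (Resemble AI Enterprise) and 576x (Resemble AI Self-Serve) Modulate's per-hour rate.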