Most transcription APIs are trained on clean, structured recordings. Modulate is trained on 500M+ hours of real conversations — noisy calls, crosstalk, accents, and all.
Batch and real-time streaming work with the same API key. Capabilities that usually require separate models, such as speaker diarization and emotion detection, are built in: no extra calls, no extra cost.
```shell
curl -X POST \
  https://api.modulate.ai/v1/transcribe \
  -H "Authorization: Bearer $API_KEY" \
  -F "audio=@call.wav" \
  -F "model=modulate-transcribe" \
  -F "diarize=true" \
  -F "emotion=true"
```
```json
{
  "transcript": "Thanks for calling, how can I help—",
  "speakers": 2,
  "emotion": "neutral",
  "accent": "southern_us",
  "duration_ms": 4210,
  "cost_usd": 0.000035
}
```
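The sample response carries enough information to estimate an effective hourly rate: divide `cost_usd` by `duration_ms` and scale by 3,600,000 ms per hour. This is a minimal sketch that assumes cost scales linearly with audio duration; the page does not state the billing granularity or any rounding rules. The helper name `cost_per_hour` is hypothetical and not part of the Modulate API.

```python
import json

# The sample response shown above (the em-dash is written as a JSON escape).
SAMPLE_RESPONSE = r"""
{
  "transcript": "Thanks for calling, how can I help\u2014",
  "speakers": 2,
  "emotion": "neutral",
  "accent": "southern_us",
  "duration_ms": 4210,
  "cost_usd": 0.000035
}
"""

MS_PER_HOUR = 3_600_000


def cost_per_hour(response: dict) -> float:
    """Extrapolate one response's cost to a full hour of audio.

    Hypothetical helper: assumes billing is strictly proportional to duration.
    """
    duration_ms = response["duration_ms"]
    if duration_ms <= 0:
        raise ValueError("duration_ms must be positive")
    return response["cost_usd"] / duration_ms * MS_PER_HOUR


resp = json.loads(SAMPLE_RESPONSE)
rate = cost_per_hour(resp)
print(f"{rate:.4f} USD/hour")  # → 0.0299 USD/hour
```

For the 4.21-second sample this works out to roughly $0.03 per audio hour. Treat it as an estimate derived from a single example response, not a published price.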
Designed to fit your stack today and grow with your usage tomorrow.
Figure: word error rate (WER) vs. cost per hour of audio on the Earnings-22 and VoxPopuli benchmarks. Lower-left is better.
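WER is the word-level edit distance between a model's transcript and a reference transcript, divided by the number of reference words: WER = (S + D + I) / N for substitutions, deletions, insertions, and N reference words. The sketch below computes it with standard Levenshtein dynamic programming over whitespace-split tokens. It is an illustration of the metric only: benchmark scoring normally applies text normalization (casing, punctuation, numerals) first, which this sketch omits, and the page does not describe the exact scoring setup behind the chart. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row 0: j insertions).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)  # i deletions against an empty hypothesis
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,             # deletion: reference word dropped
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + sub_cost,  # substitution, or a match at no cost
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


wer = word_error_rate(
    "thanks for calling how can i help",
    "thanks for calling how i help you",
)
print(f"{wer:.3f}")  # → 0.286 (one deletion + one insertion over 7 words)
```

Because WER is normalized by reference length, it can exceed 1.0 when a hypothesis contains many insertions; that is one reason results are compared on fixed benchmark sets.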
No contracts, no volume minimums. Pay only for what you process.
Start transcribing in under 5 minutes. Full docs included.
No commitment. No sales call. Scales to hundreds of hours of audio.