Modulate · Transcription

Speech-to-Text API for Real-World Audio.

Most transcription APIs are trained on clean, structured recordings. Modulate is trained on 500M+ hours of real conversations — noisy calls, crosstalk, accents, and all.

"This transcription model might be the best stuff I've seen. I've never seen another model with realtime diarization (that works?!)"

Nick Leonard

CEO, VoiceRun

Free tier · no card needed

Get immediate API access

400 hours free. Start transcribing in minutes.

400 hours free · $0.03/hr after

one API

Everything you need.
One API.

Batch and real-time streaming on the same API key. Capabilities that usually require separate models are built in — no extra calls, no extra cost.

✓ Real-time diarization: speaker-separated transcripts out of the box

✓ 70+ languages and dialects supported

✓ Emotion and accent detection: 20+ emotions, 20+ accents, direct from audio

✓ PII/PHI redaction included, with no add-on or post-processing step

✓ REST batch and WebSocket streaming: same key, same endpoint pattern

Explore the docs →

bash

$ curl -X POST \
  https://api.modulate.ai/v1/transcribe \
  -H "Authorization: Bearer $API_KEY" \
  -F "audio=@call.wav" \
  -F "model=modulate-transcribe" \
  -F "diarize=true" \
  -F "emotion=true"

200 OK · JSON response · 312ms

{
  "transcript": "Thanks for calling, how can I help—",
  "speakers": 2,
  "emotion": "neutral",
  "accent": "southern_us",
  "duration_ms": 4210,
  "cost_usd": 0.000035
}
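For callers working in Python rather than the shell, the curl request above can be approximated with the standard library alone. This is a minimal sketch, not the official SDK: the helper names `encode_multipart`, `build_transcribe_request`, and `transcribe` are hypothetical, the URL and form fields are copied from the curl example as shown, and the `audio/wav` content type on the file part is an assumption.

```python
import json
import os
import urllib.request
import uuid

API_URL = "https://api.modulate.ai/v1/transcribe"  # copied from the curl example


def encode_multipart(fields, file_field, filename, file_bytes):
    """Encode text fields plus one file as multipart/form-data (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    file_header = (
        f'--{boundary}\r\nContent-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'  # content type is assumed
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(api_key, audio_bytes, filename="call.wav",
                             diarize=True, emotion=True):
    """Build (but do not send) a POST mirroring the curl flags shown above."""
    fields = {
        "model": "modulate-transcribe",
        "diarize": str(diarize).lower(),  # "true" / "false", as in the curl form fields
        "emotion": str(emotion).lower(),
    }
    body, content_type = encode_multipart(fields, "audio", filename, audio_bytes)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


def transcribe(api_key, path):
    """Send the request and decode the JSON body (this performs a network call)."""
    with open(path, "rb") as f:
        request = build_transcribe_request(api_key, f.read(), os.path.basename(path))
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it possible to inspect headers and form fields without touching the network; calling `transcribe(os.environ["API_KEY"], "call.wav")` would issue the actual request.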

built for production

Flexible. Private. Scalable.

Designed to fit your stack today and grow with your usage tomorrow.

Flexibility

REST and WebSocket on one key
Batch and real-time streaming share the same API key and endpoint pattern.

Enrichments are per-request flags
Diarization, emotion, accent, and PII tagging are all optional. Pay only for what you enable.

Speed-cost tradeoffs built in
English Fast at $0.025/hr for high-throughput pipelines; Multilingual at $0.03/hr for full feature support.

SDKs in Python and Node.js
With async and concurrent patterns documented for production workloads.
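To make the speed-cost tradeoff concrete, the per-hour prices can be pro-rated over a clip's length. The sketch below assumes billing is linear in audio duration with no minimum charge or rounding increment, which the page does not state; the tier keys and the function name `estimate_cost_usd` are hypothetical labels for the two prices quoted above.

```python
# Per-hour batch prices quoted above; the dictionary keys are hypothetical labels.
RATES_USD_PER_HOUR = {
    "english-fast": 0.025,
    "multilingual": 0.03,
}

MS_PER_HOUR = 3_600_000


def estimate_cost_usd(duration_ms, rate_usd_per_hour):
    """Pro-rate an hourly price over a clip length given in milliseconds.

    Assumes linear billing with no minimum charge (not confirmed by the page).
    """
    return duration_ms / MS_PER_HOUR * rate_usd_per_hour


sample_duration_ms = 4210  # "duration_ms" from the sample JSON response
for tier, rate in RATES_USD_PER_HOUR.items():
    print(f"{tier}: ${estimate_cost_usd(sample_duration_ms, rate):.6f}")
```

At $0.03/hr the 4,210 ms sample clip comes to about $0.000035, which agrees with the `cost_usd` field in the sample response.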

Privacy

Audio deleted after 35 days
Permanently and automatically, with no manual cleanup required.

No data sold. Ever.
Audio processed via the API is used solely to deliver the service.

Enterprise opt-out from model training
Annual commitment customers can opt out entirely.

DPA available
GDPR, UK GDPR, and Standard Contractual Clauses for EU/UK data transfers.

Scale

Limits raiseable on request
Concurrency and monthly caps can be increased for high-volume workloads.

Real-time usage dashboard
Monitor audio hours and concurrent connections against your limits at any time.

Built for batch pipelines
Semaphore and connection pool patterns documented for processing large file backlogs concurrently.

100 MB per batch request
With automatic silence trimming and format normalization.
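The semaphore pattern mentioned for batch pipelines can be sketched with `asyncio`. This is an illustration under stated assumptions rather than the documented pattern itself: `transcribe_stub` is a placeholder that sleeps instead of calling the API, `process_backlog` is a hypothetical name, and the 100 MB limit is interpreted as 100 MiB, which should be confirmed against the docs.

```python
import asyncio

MAX_BATCH_BYTES = 100 * 1024 * 1024  # assumes "100 MB" means 100 MiB; confirm in the docs


async def transcribe_stub(name, size_bytes):
    """Placeholder for a real API call; sleeps briefly instead of uploading."""
    await asyncio.sleep(0.01)
    return {"file": name, "bytes": size_bytes}


async def process_backlog(files, max_concurrency=4, transcribe=transcribe_stub):
    """Transcribe (name, size_bytes) pairs with at most max_concurrency in flight.

    Files above the per-request size limit are skipped and reported separately.
    """
    semaphore = asyncio.Semaphore(max_concurrency)
    skipped = [name for name, size in files if size > MAX_BATCH_BYTES]

    async def worker(name, size):
        async with semaphore:  # waits while max_concurrency calls are already running
            return await transcribe(name, size)

    tasks = [worker(name, size) for name, size in files if size <= MAX_BATCH_BYTES]
    results = await asyncio.gather(*tasks)  # results keep the input order
    return results, skipped


if __name__ == "__main__":
    backlog = [(f"call_{i}.wav", 5_000_000) for i in range(10)]
    backlog.append(("too_big.wav", 200 * 1024 * 1024))
    done, skipped = asyncio.run(process_backlog(backlog))
    print(len(done), "transcribed; skipped:", skipped)
```

Setting `max_concurrency` at or below the account's concurrency cap keeps a large backlog from exceeding the limit; swapping `transcribe_stub` for a real client call leaves the rest of the structure unchanged.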

independent benchmarks

Where accuracy meets cost.

WER vs. cost/hr across Earnings-22 and VoxPopuli. Lower-left is better.

Modulate transcription pricing

Transparent. On-demand. No lock-in.

No contracts, no volume minimums. Pay only for what you process.

Batch · REST API (USD / hour processed)

Provider                Model                Price
Modulate (lowest cost)  modulate-transcribe  $0.03
xAI                     grok-stt             $0.10
AssemblyAI              universal-3 Pro      $0.21
ElevenLabs              scribe v2            $0.22
Deepgram                nova-3               $0.31
OpenAI                  gpt-4o-transcribe    $0.36

Streaming · WebSocket (USD / hour live audio)

Provider                Model                Price
Modulate (lowest cost)  modulate-transcribe  $0.06
xAI                     grok                 $0.20
Speechmatics            enhanced             $0.24
Deepgram                nova-3               $0.35
OpenAI                  gpt-4o-transcribe    $0.36
AssemblyAI              universal-3-pro      $0.45
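The listed rates can be turned into a quick comparison for a given workload. The sketch below copies the per-hour prices from the batch and streaming lists above; the function names are hypothetical, and `modulate_batch_bill` assumes the 400 free hours are a one-time allowance applied before the $0.03/hr batch rate, which the page does not spell out.

```python
# USD per hour, copied from the batch and streaming price lists above.
BATCH_USD_PER_HOUR = {
    "Modulate": 0.03, "xAI": 0.10, "AssemblyAI": 0.21,
    "ElevenLabs": 0.22, "Deepgram": 0.31, "OpenAI": 0.36,
}
STREAMING_USD_PER_HOUR = {
    "Modulate": 0.06, "xAI": 0.20, "Speechmatics": 0.24,
    "Deepgram": 0.35, "OpenAI": 0.36, "AssemblyAI": 0.45,
}


def workload_cost(rates, hours):
    """Return {provider: cost in USD} for the given hours, cheapest first."""
    costs = {name: round(rate * hours, 2) for name, rate in rates.items()}
    return dict(sorted(costs.items(), key=lambda item: item[1]))


def modulate_batch_bill(hours, free_hours=400, rate=0.03):
    """Bill after the free allowance (assumed here to be a one-time credit)."""
    return round(max(hours - free_hours, 0) * rate, 2)


for provider, cost in workload_cost(BATCH_USD_PER_HOUR, 1_000).items():
    print(f"{provider:<12} ${cost:>8.2f}")
```

For 1,000 batch hours at list price this gives $30.00 for Modulate against $360.00 for the most expensive entry, before any free allowance is applied.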

get started free

400 free hours.
No credit card required.

Start transcribing in under 5 minutes. Full docs included.

Get API Key Free → · Explore the Docs

No commitment. No sales call. Scales to hundreds of hours.