Modulate Pricing

Trained on over half a billion hours of real conversations, Modulate's Velma analyzes voice natively to deliver fast, explainable insights without fine-tuning or LLM overhead.


Enterprise Platform

Starting at $60,000

All tiers include access to the Modulate platform for uploading audio, configuring insights, integrating into your call stack, and reviewing conversation analytics—plus an annual bundle of usage credits.

Tier          Price          Included calls  Support SLA    Slack channel  Customer Success  Best for
Starter       $60,000        250K–1M         Business days  Not available  Add-on            Pilots & early deployments
Growth        $120,000       500K–1M         Business days  Not available  Add-on            Scaling teams
Professional  $240,000       1M–4M           Business days  Not available  Included          Production rollouts
Business      $360,000       2M–7M           Business days  Add-on         Included          Multi-team deployments
Enterprise    $600,000       3M–13M          Business days  Add-on         Included          Large-scale deployments
Custom        Talk to Sales  13M+            Business days  Included       Included          Highest volume & custom needs

Included calls vary based on call length and analysis requirements.

Most popular: Business
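As a rough, hypothetical illustration of how the tiers scale, the sketch below divides each tier's listed price by the low and high ends of its included-call range. It assumes the listed price buys exactly that call bundle; because included calls vary with call length and analysis requirements, treat the results as ballpark figures, not quotes. The `TIERS` table and `per_call_range` helper are illustrative names, not part of any Modulate API, and the Custom tier is left out because it has no listed price.

```python
# Hypothetical sketch: figures are copied from the tier table; names are illustrative.
# Assumption: each tier's listed price covers its whole included-call bundle.
TIERS = {
    # tier: (price_usd, min_included_calls, max_included_calls)
    "Starter": (60_000, 250_000, 1_000_000),
    "Growth": (120_000, 500_000, 1_000_000),
    "Professional": (240_000, 1_000_000, 4_000_000),
    "Business": (360_000, 2_000_000, 7_000_000),
    "Enterprise": (600_000, 3_000_000, 13_000_000),
}


def per_call_range(tier: str) -> tuple[float, float]:
    """Return (lowest, highest) effective USD price per included call."""
    price, min_calls, max_calls = TIERS[tier]
    # More included calls spread the same price thinner, so max_calls
    # gives the lowest per-call price and min_calls gives the highest.
    return price / max_calls, price / min_calls


if __name__ == "__main__":
    for name in TIERS:
        low, high = per_call_range(name)
        print(f"{name:<13} ${low:.3f} to ${high:.3f} per call")
```

Under these assumptions, Starter works out to roughly $0.06 to $0.24 per call.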

Included in every Platform plan


Platform access

Upload audio, review analytics, and manage insights in one place.

Workflow flexibility

Mix and match use cases across teams and channels.

Integrations-ready

Connect to your call stack and operational tools.

Explainable outputs

Time-aligned signals tied to moments in audio.

Built for scale

Support real-time and post-interaction analysis.

Team controls

Manage access, workflows, and usage visibility.

Build With Velma

The Velma API offers industry-leading models for transcription, emotion detection, deepfake detection, and more, at a price designed for scale.

Pay as you go (most popular)

World-class transcription starting at 3¢/hour

No minimums, no commitments

API credits do not expire

Up to 400 hours in free credits

Custom API access

Talk to Sales

Cutting-edge models at enterprise scale

Priority access to new endpoints and models

Bulk usage discounts

Dedicated support options available

Speech to Text Pricing Breakdown

Velma Transcribe Batch

Uses Dynamic Ensemble Block technology to combine multiple models seamlessly for top-quality results.

Included features: Diarization, Emotion Detection, Accent Identification, PII/PHI Tagging, Multilingual Support

Credit rate: 3 credits/hr

Dollar rate: $0.03/hr

Velma Transcribe Streaming

Uses Dynamic Ensemble Block technology to combine multiple models seamlessly for top-quality results.

Included features: Diarization, Emotion Detection, Accent Identification, PII/PHI Tagging, Multilingual Support

Credit rate: 6 credits/hr

Dollar rate: $0.06/hr

Velma Transcribe Batch English Fast

Directly access our fastest submodel for lightning-quick results.

Included features: Real-time factor of up to 200x; English only

Credit rate: 2.5 credits/hr

Dollar rate: $0.025/hr
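Reading "real-time factor of up to 200x" as a speed-up relative to the audio's own duration (an assumption; some tools define the real-time factor the other way round, as processing time divided by audio duration), the best-case processing time for one hour of audio with Velma Transcribe Batch English Fast works out as:

```latex
t_{\text{process}} = \frac{t_{\text{audio}}}{\mathrm{RTF}} = \frac{3600\ \text{s}}{200} = 18\ \text{s}
```

Because 200x is an upper bound, real turnaround for a given file may be longer.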

See How We Compare

Real-World Accuracy
Modulate: Leading accuracy
Deepgram: 88% more errors
AssemblyAI: 66% more errors
ElevenLabs: 76% more errors

Speaker Diarization
Modulate: Included
Deepgram: $0.12/hr
AssemblyAI: $0.02/hr (batch) or $0.12/hr (streaming)
ElevenLabs: Unavailable

Emotion Detection
Modulate: Included
Deepgram: Text sentiment only, at extra cost
AssemblyAI: $0.02/hr
ElevenLabs: Unavailable

Accent Identification
Modulate: Included
Deepgram, AssemblyAI, ElevenLabs: Unavailable

Multilingual Support
Modulate: 57 languages
Deepgram: 51 languages
AssemblyAI: 99 languages
ElevenLabs: 90 languages

PII/PHI Tagging
Modulate: Included
Deepgram: $0.12/hr
AssemblyAI: $0.08/hr
ElevenLabs: Unavailable

Free Credits Included
Modulate: Up to 400 hrs
Deepgram: Up to 360 hrs
AssemblyAI: Up to 300 hrs
ElevenLabs: 2.5 hrs

Deepfake Detection Pricing Breakdown

Velma Deepfake Detect Batch

Use the world’s leading synthetic voice detector to prevent fraud or deception.

Included features: Segment-based probability scores over 4-second segments with a 2-second overlap; requires as little as 3 seconds of audio.

Credit rate: 25 credits/hr

Dollar rate: $0.25/hr
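The segment-based scoring described above can be sketched as a sliding window, assuming 4-second windows that advance by 2 seconds (so consecutive windows overlap by 2 seconds), a final window clipped at the end of the audio, and rejection of clips shorter than the 3-second minimum. How Velma actually handles the tail of a clip is not stated here, and `segment_windows` is a hypothetical helper, not part of the Velma API.

```python
def segment_windows(
    duration_s: float,
    window_s: float = 4.0,
    hop_s: float = 2.0,
    min_audio_s: float = 3.0,
) -> list[tuple[float, float]]:
    """Return hypothetical (start, end) times in seconds for each scored segment."""
    if duration_s < min_audio_s:
        raise ValueError(f"need at least {min_audio_s} s of audio, got {duration_s} s")
    if hop_s <= 0 or hop_s > window_s:
        raise ValueError("hop_s must be positive and no larger than window_s")
    windows = []
    start = 0.0
    while True:
        # Clip the last window so it never runs past the end of the audio.
        end = min(start + window_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:
            break
        # 4 s windows that start 2 s apart overlap by 2 s.
        start += hop_s
    return windows


if __name__ == "__main__":
    for start, end in segment_windows(10.0):
        print(f"{start:4.1f}s to {end:4.1f}s")
```

Under these assumptions a 60-second clip yields 29 windows, the last covering 56 to 60 seconds; stepping by half a window means every moment after the first 2 seconds is scored twice.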

See How We Compare

Top Model
Modulate: Modulate-VELMA-2-Synthetic
Resemble AI: Detect-3B Omni
Reality Defender: Real Suite
Sensity AI: Sensity Deepfake Detection Hub

Pricing
Modulate: $0.25/hr
Resemble AI: $144/hr, with maximum discounts down to $28/hr
Reality Defender, Sensity AI: Not listed

Self-Serve API
Modulate: Yes
Resemble AI, Reality Defender, Sensity AI: No, talk to sales

Accuracy (higher is better)
Modulate: 98.9% (#1 on the primary leaderboard)
Resemble AI: 97.4%
Reality Defender: Unavailable
Sensity AI: 95–98%

Equal Error Rate (lower is better)
Modulate: 1.1% (#1 on the primary leaderboard)
Resemble AI: 2.57%
Reality Defender, Sensity AI: Unavailable

Audio Required for a Result
Modulate: 3 seconds
Resemble AI: Not listed
Reality Defender: 6 seconds
Sensity AI: 30 seconds

Optimized for…
Modulate:
• Enterprise fraud prevention & financial security
• Contact center routing and authentication
• Media/journalism disinformation detection
• Adversarial deepfakes generated by expert actors
Resemble AI:
• Enterprise fraud prevention & financial security
• Multimodal (audio + video + image) threat detection
Reality Defender:
• Video conferencing user verification
• Brand & executive protection
• Media/journalism disinformation detection
Sensity AI:
• Law enforcement & judicial/forensic investigations
• Government & defense intelligence

Additional Models Available
Modulate: Transcription, Emotion, PII Redaction, Conversation Analytics
Resemble AI: Speaker Identification + Voice Biometrics
Reality Defender: Video & Image Deepfake Detection
Sensity AI: Court-admissible forensic reports with chain-of-custody logs and NIST/ENFSI-aligned documentation

Free Credits
Modulate: Up to 40 hrs
Resemble AI: None
Reality Defender: 50 audio scans per month
Sensity AI: None (must contact sales)

FAQ

What is a credit?

Credits are a simple unit that maps to audio processing usage across workflows and API capabilities. Different workflows consume credits at different rates.
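In every pricing table on this page, the dollar rate equals the credit rate divided by 100 (for example, 3 credits/hr and $0.03/hr), so one credit appears to correspond to $0.01. That mapping is an inference from the tables, not a published rate. Under that assumption, estimating a mixed workload is a weighted sum of audio hours times each workflow's credit rate. The model keys and the `estimate_cost` function below are hypothetical, not Velma API identifiers.

```python
# Hypothetical estimator: credit rates are copied from the pricing tables;
# the keys and function name are illustrative, not Velma API identifiers.
CREDITS_PER_HOUR = {
    "transcribe_batch": 3.0,
    "transcribe_streaming": 6.0,
    "transcribe_batch_english_fast": 2.5,
    "deepfake_detect_batch": 25.0,
}

# Assumption inferred from the tables: 1 credit == $0.01.
USD_PER_CREDIT = 0.01


def estimate_cost(hours_by_model: dict[str, float]) -> tuple[float, float]:
    """Return (total_credits, total_usd) for a mix of workloads."""
    total_credits = 0.0
    for model, hours in hours_by_model.items():
        if hours < 0:
            raise ValueError(f"negative hours for {model!r}")
        # A KeyError here means the model name is not in the rate table.
        total_credits += CREDITS_PER_HOUR[model] * hours
    return total_credits, total_credits * USD_PER_CREDIT


if __name__ == "__main__":
    credits, usd = estimate_cost(
        {"transcribe_batch": 1_000, "deepfake_detect_batch": 10}
    )
    print(f"{credits:,.0f} credits, about ${usd:,.2f}")
```

The sample mix of 1,000 hours of batch transcription and 10 hours of deepfake detection comes to 3,250 credits, or about $32.50 under the $0.01-per-credit assumption.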

Can I mix workflows across teams and use cases?

Yes. Platform plans are designed to support multiple workflows, so you can apply voice intelligence where it matters most.

Do you offer volume discounts?

Yes. Larger platform tiers include bigger credit bundles, and API pricing supports volume-based discounts. Talk to Sales to learn more.

Can we start small and scale?

Yes. Most customers start with pay-as-you-go pricing and expand as usage grows. Upgrading to a plan with volume discounts is quick and painless.

How do you integrate with our stack?

Use Modulate's enterprise platform and integrations, or access Velma programmatically through the API.

Get pricing tailored to your conversation volume

Talk with our team to estimate usage, compare platform tiers, and design a plan that fits your workflows and scale.
