Your voice data is full of risks you're not hearing.

AI voice intelligence that listens and understands like a human.

Velma listens to audio conversations and surfaces fraud, customer churn, compliance violations, and other risks before they become incidents. Plug it into your audio pipeline and start solving problems immediately. You make the rules. Velma understands your business.

Uncover problems

204 alerts today · 37 critical · 44 escalated
Just now

Potential Customer Churn · Churn Risk · Confidence 97%

Renewal customer cited cost as barrier and asked about lower-tier options during retention call.

2 min ago

Unresolved Billing Dispute · Compliance · Confidence 88%

Customer referenced unresolved billing dispute from prior call; fix never applied to account.

Escalated
12 min ago

Threat-Based Harassment · Agent Safety · Confidence 96%

Caller repeated physical threat toward support agent after refund denial; warning issued, threats continued.

Call terminated
62 min ago

Authorization Fraud Attempt · Deepfake · Confidence 94%

Deepfake caller pushed the agent to bypass verification and requested a $47,500 wire on Account #8821.

Call terminated
60 min ago

Unauthorized Data Disclosure · Compliance · Confidence 92%

Agent confirmed billing address and payment method before identity verification was complete.

Review
52 min ago

Identity Verification Fraud · Fraud · Confidence 91%

Caller's DOB, address, and PIN did not match across three verification prompts.

Manual Verification

Velma

An audio-native understanding engine for enterprises and developers.

563,124,894

Hours of conversations improved

40M+

Unique business risks detected

#1

Accuracy in Conversation Understanding

"By using Modulate to identify disruptive behavior in real time, we're building a foundation that supports enforcement while sustaining the fun that defines great multiplayer experiences."
Natasha Tatarchuk, SVP & CTO, Activision

"This voice model might be the best stuff I've seen. I've never seen another model with realtime diarization (that works?!)"
Nick Leonard

THE MODULATE DIFFERENCE

Words are just the surface.
Velma hears the full picture.

Word-based transcription keeps the words but discards the acoustic signals that carry much of a conversation's meaning.
Velma analyzes those acoustic signals to understand conversations the way a human does.

THE INDUSTRY STANDARD

Transcription + LLM pipeline
Voice signals discarded: tone, emotion, hesitation, stress, speaker dynamics, intent, sarcasm, and more
WHAT TRANSCRIPTION CAPTURES: 1 layer
Words · Captured · The literal transcript (what was said)
Intent and behavior · Lost · Misunderstands intent and vulnerability
Tone and emotion · Lost · Loses anger, frustration, fear, joy, and sarcasm
Prosody · Lost · Ignores pauses and distinctive delivery
Speaker dynamics · Lost · Overlooks interruptions and side comments
Deception and stress cues · Lost · Misses hesitation and vocal anxiety
Acoustic authenticity · Lost · Cannot catch deepfakes or spoofing

VELMA BY MODULATE

Voice-native AI
Voice signals analyzed: tone, emotion, intent, rhythm, context, accents, deepfakes, sarcasm, vocal biomarkers, and more
WHAT VELMA CAPTURES: 7 layers
Words · Captured · Best-in-class transcription accuracy
Intent and behavior · Captured · Any behavior, detectable in real time
Tone and emotion · Captured · 20+ emotions from the acoustic signal
Prosody · Captured · Pitch, rhythm, emphasis, pacing
Speaker dynamics · Captured · Multi-speaker diarization and patterns
Deception and stress cues · Captured · Vocal stress, lying, and coercion signals
Acoustic authenticity · Captured · #1 deepfake detection on Hugging Face

With Velma, you can listen to and understand conversations like a human.

VELMA USE CASES

Enterprise teams are reducing voice risks with Velma

Fraud Detection & Prevention

Detect deepfakes, social engineering, impersonation, and vishing in real time across contact centers and fintech.

AI Agent Guardrails

Monitor human and AI agents alike. Evaluate behavior, flag risky interactions, and maintain trust at scale.

Trust & Safety

Proactive voice moderation for harassment, hate speech, grooming, and toxic behavior in live social and gaming environments.

Customer Retention

Detect at-risk customers, spot dissatisfaction signals, and intervene before they leave.

Human Agent Welfare

Protect agents from abuse, detect burnout signals, and identify coaching opportunities.

Compliance & Risk Monitoring

Always-on detection for policy violations, inappropriate content, and compliance risk across voice channels.
And dozens more...

Velma ships with 150+ key behaviors detected instantly

Vishing
Account Impersonation
Return Fraud Attempt
Feigned Ignorance
Deepfake Detection
Bargaining Manipulation
Coercion Manipulation
AI Agent Manipulation
Identity Spoofing
Synthetic Voice Attack
Social Engineering
Caller ID Spoofing
Credential Harvesting
Payment Diversion
Insurance Claim Fabrication
Warranty Fraud
Voice Cloning
Account Takeover
Unauthorized Transfer
Phishing via Voice
Inappropriate Speech
Inappropriate AI Content
Off-topic Discussion
Unaddressed Question
Unclear Speech
Issue Not Resolved
Issue Resolved
Action Plan Created
Script Adherence
Tone Mismatch
Empathy Failure
Knowledge Gap
Hold Time Violations
Missed Verification Steps
Escalation Needed
First Call Resolution
Hallucination Detected
Unauthorized Commitment
Safety Boundary Breach
Threat-based Harassment
Sexual Harassment
Harassment
Child Safety Violation
Hate
Suicidal Ideation
Self-Harm Glorification
Violent Graphic Material
Sexually Graphic Material
Grooming
Stalking Behavior
Doxxing Threats
Radicalization
Intimidation
Domestic Abuse Indicators
Extortion
Blackmail
Cyberbullying
Impersonation for Harm
Predatory Behavior
Complaints
Service Churn
Customer Gratitude
Cancelled Order
Refund or Credit Issued
Personal Vulnerability
Escalation Request
Repeat Caller
Billing Dispute
Account Closure
NPS Detractor Signal
Win-back Opportunity
Price Sensitivity
Competitive Mention
Loyalty Signal
Upgrade Interest
Downgrade Intent
Unresolved Follow-up
Service Recovery Moment
Renewal Risk
Contract Expiration Warning
Rapport Building
Social Connection
Encouragement
Teaching/Mentorship
Social Etiquette
Boundary Setting
Storytelling
Future Planning
Active Listening
Empathy Expression
Objection Handling
Closing Signal
Buying Intent
Budget Discussion
Decision Maker Identified
Next Steps Agreed
Discovery Questions
Upsell Opportunity
Agent Fatigue Detected
Abusive Caller Flagged
Regulatory Disclosure Missing
Recording Consent Violation
PCI Data Exposure
HIPAA Violation
Unauthorized Promise
Misleading Claims
Script Deviation
Prohibited Language
Data Privacy Breach
Policy Violation
Consent Not Obtained
Unapproved Fee Waiver
Incomplete Documentation
Audit Trail Gap
Supervisory Escalation Missed
Call Recording Tampering
Unauthorized Account Access
And many more, spanning dozens of industries and use cases

Every business is different.
Infinite behavior customizations.

Describe what matters. Velma uses every audio signal — not just words — to surface it accurately.
Saved: Post-Commitment Pushback Risk

You can also upload SOPs, compliance docs, or playbooks to specify exactly what Velma should catch.

Explore Velma

FOR ENTERPRISES

Uncover risks and solve problems

Use Velma to detect and prevent problems specific to your use cases, products, and users. Everything is customizable through natural-language prompts or document uploads.

Request platform demo

FOR DEVELOPERS

Build using Modulate's APIs

Access voice intelligence APIs, including Velma, Deepfake Detection, and more. Use natural language to describe the problems you're trying to solve.

Explore the APIs

Velma is the #1 model
for Conversation Understanding

Conversation Understanding Benchmark: Accuracy vs. Cost
Evaluates a model's ability to identify conversation types, topics, speaker roles, and key behaviors.
[Chart: accuracy score vs. inference cost, annotated "Highest accuracy, lowest cost". Models compared: velma-2-fast, velma-2, grok-4.1-fast-non-reasoning, grok-4.1-fast-reasoning, gemini-2-flash-lite, deepseek-v3.1, gemini-2-flash, deepseek-v3.2, gemini-3-flash-min, deepseek-r1, gemini-3-flash-med, gemini-2.5-pro, gemini-3-pro, grok-3, nova-3-intelligence, scribe-v2, grok-4-heavy, gpt-5-mini, gpt-5.2-pro, gpt-5.2.]

Explore Modulate's other
leading voice models

Highest accuracy, lowest cost — designed to drop into your stack.

Speech-to-Text API

Best-in-class accuracy on real-world audio. Starting at $0.03/hr.

Deepfake Detection API

#1 on Hugging Face. Detect synthetic audio with 98.9% accuracy.

PII / PHI Redaction API

Detect and redact sensitive PII and PHI from live voice streams.

Music Detection API

Identify hold music and non-speech segments to keep analysis focused.

ENTERPRISE READY

Built for Enterprise Scale and Compliance

Compatible with key technology partners:
Slack, Zoom, Five9, Microsoft Teams, Zendesk, Genesys, SIP

Follows ISO 27001 security processes and HIPAA-compliant practices. Built to operate within GDPR, CCPA, and EU AI Act requirements, so enterprise compliance and security teams can say yes on day one.

Fits into your existing stack — monitor conversational AI voice agents without a rip-and-replace, and route signals into case management, risk engines, and coaching tools via APIs and webhooks

Your data stays yours — Modulate never trains on your conversations; you control how your voice AI audio is used

Transparent and auditable by design — every flag traces to a specific moment in the call, with built-in bias controls for high-risk compliance requirements

Battle-tested at scale — nearly a decade, hundreds of millions of sensitive voice conversations, zero breaches