Velma Is Now Available via API

June 3, 2026
Mike Pappas
(HE/HIM/HIS)

When we launched Velma as part of our enterprise platform, the response confirmed what we'd believed for a while: enterprises need AI that properly understands voice. Not transcripts handed off to a text-based model, but something built from the audio signal up.

Today, we're making Velma available directly through the Modulate API. That means anyone can start building with it - no enterprise contract required.

The problem Velma solves

Think about what you actually need to know from a voice conversation. Is the caller about to churn? Is your agent going off-script in a way that creates liability? Is this person who claims to be a longtime customer showing vocal patterns consistent with fraud? 

Most "voice AI" in production today simply transcribes audio and passes the resulting text to an LLM. That works well enough for a lot of things. But the answers to questions like the ones above don't live cleanly in a transcript alone. The words might look fine. The audio tells a different story.

With Velma, you describe what you care about in plain language, just as you would with an LLM. Unlike an LLM, Velma then looks for it using an ensemble of over 100 voice-native models that analyze every dimension of the voice conversation, so it catches what transcripts and LLMs miss. And although it uses so many models, each is hyper-optimized, so Velma costs the same as or less than running transcripts through an LLM, while producing much better accuracy.

See the full meaning of each conversation

Velma surfaces prioritized detections of the behaviors you care about - and backstops its claims with structured information including emotions, speaker dynamics, topic categories, summaries, and more. If you’re sick of prompt engineering, Velma also includes 150+ pre-designed prompts for common business needs like fraud indicators, churn risk, compliance violations, and escalation patterns. 

But the part worth dwelling on is how it does this, because it's genuinely different from what most teams are building today.

What you can build with it

The Velma API is designed to drop into an existing voice stack. You send audio, and you get back structured JSON. From there, the use cases are pretty wide: smarter routing and response logic for voice agents, real-time coaching tools that react to how a call is going rather than just what's being said, fraud detection that picks up on audio cues a human might miss, emotion-driven personalization, and compliance monitoring that doesn't rely on keyword matching alone.
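To make the "structured JSON in, routing decision out" idea concrete, here is a minimal Python sketch of how a voice stack might act on a response. This is not the actual Velma response schema: the field names (`detections`, `label`, `priority`, `summary`), the sample payload, the queue names, and the threshold are all assumptions invented for illustration, so check the API documentation for the real shape.

```python
import json

# Hypothetical response body. The real Velma schema may differ;
# every field name and value here is an assumption for illustration.
SAMPLE_RESPONSE = """
{
  "detections": [
    {"label": "churn_risk", "priority": 0.82},
    {"label": "fraud_indicator", "priority": 0.35}
  ],
  "summary": "Caller asks to cancel after a billing dispute."
}
"""

# Hypothetical mapping from detection labels to internal queues.
ROUTES = {
    "fraud_indicator": "fraud_review_queue",
    "churn_risk": "retention_team",
}


def choose_route(response_text, threshold=0.5, default="standard_queue"):
    """Pick a queue from the highest-priority detection above the threshold."""
    payload = json.loads(response_text)
    detections = payload.get("detections", [])
    # Keep only detections we know how to route and that are strong enough.
    strong = [
        d for d in detections
        if d.get("label") in ROUTES and d.get("priority", 0.0) >= threshold
    ]
    if not strong:
        return default
    top = max(strong, key=lambda d: d.get("priority", 0.0))
    return ROUTES[top["label"]]


if __name__ == "__main__":
    print(choose_route(SAMPLE_RESPONSE))
```

The routing logic is kept separate from the network call on purpose: the same function can be exercised with saved responses before it is wired into a live call flow.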

We've seen enterprise customers do genuinely creative things with Velma's platform, but there's a ceiling when access is gated behind a sales conversation. Opening it up via API is about removing that ceiling.

Try it

Free API access is available now. You can grab a key, hit the playground, and see Velma understand a real conversation - not a demo we curated, an actual call you bring to it.

If you've been frustrated by the limits of transcript-based approaches, I think you'll find this interesting. I look forward to seeing what you build with truly audio-native understanding!