Real-Time Audio PHI & PII Redaction: How Velma Catches More Sensitive Data Than Any Other AI

For a long time, audio has been difficult for machines to understand. But as companies like Modulate have made it easier and easier for digital systems to listen, we've encountered a new problem. There are things mentioned in that audio that we were never meant to hear.
The credit card number a customer read out in 2022. The diagnosis someone mentioned while disputing an insurance claim. The salary figure that slipped out in an HR call. The one-time passcode, the routing number, the Social Security number. Every sensitive detail that has ever been spoken over your infrastructure, sitting in storage, intact, waiting.
Good tools exist to prevent this issue in text. There are plenty of options for scanning documents, databases, emails, etc. But voice got a watered-down version: transcribe first, run a text PII filter on the resulting transcript, hope for the best. The audio itself? Usually untouched.
I've been bothered by this for a long time. Today, we're doing something about it.
Detect, Redact, Protect
We're launching the Velma PII/PHI Redaction model with two modes, and which one you need depends on what you're actually trying to accomplish.
If you're already running Velma Transcribe, you can enable PII/PHI detection as an add-on. Velma will flag sensitive entities within the transcript in real time, labeling what was said, where, and what category it falls into, without altering the content. You get the full picture, clearly marked, and you decide what to do downstream.
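To make that concrete, here's a minimal sketch of what consuming those annotations could look like on the client side. The event shape and every field name below are illustrative assumptions for the sketch, not Velma's published API:

```python
from dataclasses import dataclass

@dataclass
class PIIEntity:
    category: str   # e.g. "CREDIT_CARD_NUMBER", "DIAGNOSIS", "SALARY"
    text: str       # the flagged span, exactly as transcribed
    start_ms: int   # offset into the audio where the span begins
    end_ms: int     # offset where it ends

def handle_transcript_event(event: dict) -> None:
    """Route flagged entities downstream; the transcript itself is untouched."""
    for raw in event.get("pii_entities", []):
        entity = PIIEntity(**raw)
        # Downstream policy decides what happens: mask in the agent UI,
        # exclude from analytics, trigger a shorter retention window, etc.
        print(f"[{entity.category}] {entity.start_ms}-{entity.end_ms}ms: {entity.text!r}")

# A hypothetical event, shaped the way a detection add-on might emit it.
handle_transcript_event({
    "text": "Sure, my member ID is 4471 and I'm calling about my metformin refill.",
    "pii_entities": [
        {"category": "HEALTH_INSURANCE_ID", "text": "4471", "start_ms": 1800, "end_ms": 2400},
        {"category": "PRESCRIPTION", "text": "metformin", "start_ms": 4100, "end_ms": 4700},
    ],
})
```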
The second mode is the one I'm more excited to talk about. The new Velma PII/PHI Redaction models don't just annotate. They do the actual removal in real time...and in audio, not just text. You get back a cleaned transcript *and* a redacted audio stream, live, as the conversation happens.
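In the same illustrative spirit (the class and method below are stand-ins, not a documented interface), a redaction session hands back both outputs together, and only the cleaned audio is ever written to disk:

```python
import wave

class RedactionSession:
    """Stand-in for a live redaction session; purely for illustration."""
    def events(self):
        """Yield (redacted_audio_frame, cleaned_transcript_delta) pairs."""
        yield b"\x00" * 3200, "my card number is "
        # The spoken digits are muted in the audio and masked in the text.
        yield b"\x00" * 3200, "[CREDIT_CARD_NUMBER]"

transcript = []
with wave.open("redacted_call.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)       # 16-bit PCM
    out.setframerate(16000)
    for frame, text_delta in RedactionSession().events():
        out.writeframes(frame)        # only redacted audio reaches storage
        transcript.append(text_delta)

print("".join(transcript))
```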
There are few others who have truly grappled with redacting audio directly. So let's unpack how and why we built this.
Everyone else is solving the wrong problem
Most PII tools treat voice as a text problem. The workflow: convert speech to text, run an NLP pipeline over the transcript. This works fine if your goal is a cleaner written record. But it leaves you with a redacted document and an unredacted recording — and in regulated environments, that's often not good enough. The audio is what gets stored, transferred, potentially breached. So the audio is what needs to be redacted.
Real audio redaction means you can't bolt a text filter onto the back end of a transcription system. You need to understand the spoken content and act on it *in the stream*, before it ever reaches storage. Most PII vendors aren't set up to do this because they started from text. They operate after the fact, on the transcript.
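One plausible mechanism for acting in the stream (our sketch, not a description of Velma's internals) is a short lookahead buffer: hold a few hundred milliseconds of audio so that a detection landing mid-utterance can still mute the frames that preceded it before anything is released to storage:

```python
from collections import deque

FRAME_MS = 20    # assume audio arrives in 20 ms frames
LOOKAHEAD = 25   # hold ~500 ms so late detections can still mute their span

def redact_stream(frames, detect, sink):
    """Minimal in-stream redaction loop (illustrative only).

    `frames` yields raw PCM frames; `detect(frame)` is a placeholder for a
    model call and returns how many trailing frames (including this one)
    belong to a sensitive span, or 0; `sink` receives only cleaned audio.
    """
    held: deque = deque()
    for frame in frames:
        held.append(bytearray(frame))
        span = detect(frame)
        # Detection can lag the audio, so mute backward into the buffer.
        for i in range(1, min(span, len(held)) + 1):
            held[-i][:] = b"\x00" * len(held[-i])
        if len(held) > LOOKAHEAD:
            sink(bytes(held.popleft()))   # release only once it's safely clean
    while held:
        sink(bytes(held.popleft()))
```

The tradeoff in a scheme like this is latency: the longer the buffer, the later a detection can arrive and still be honored before the audio leaves the pipeline.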
We could build this because of how Velma was architected from the beginning. We've never treated voice as a transcription problem; we've spent years building models that understand audio directly, not as a stepping stone to text. The same architecture that lets Velma pick up emotion, prosody, and speaker dynamics is what lets us detect and redact sensitive content in the stream.
Let's cover *all* sensitive information properly
When we started building this, we made a deliberate choice: ask "what sensitive information actually gets disclosed in real conversations?" rather than "what does the compliance checklist say?"
Those turn out to be pretty different questions.
The standard market answer covers names, contact info, SSNs, credit card numbers, health insurance IDs, and a handful of national ID formats. Useful, but it's essentially a checklist of things that are easy to detect with pattern matching. Structured data. Fixed formats.
Real conversations don't stay in that lane. Consider what we detect that most of the field doesn't:
**Financial depth.** Account balances. Transaction history. Credit scores. Bankruptcy records. Investment account details. Cryptocurrency wallet addresses. When someone in a collections call or a wealth management interaction starts explaining their actual financial situation — not just reading off a card number — we catch it. No other commercial tool does this consistently.
**Health data that actually covers PHI.** Most competitors flag your member ID. We flag prescriptions, diagnoses, treatment plans, mental health records, lab results, disability status, vaccination history, substance use, and genetic information. The difference matters enormously for any product operating in or near healthcare.
**Employment specifics.** Salary. Bonuses. Equity grants. Performance reviews. Immigration status. Manager names. These leak constantly in HR calls, recruiting conversations, and benefits support, and they carry real legal exposure when they do.
**Security questions.** This one surprises people. One-time passcodes and PINs get some coverage across the market. But the actual security question answers — mother's maiden name, name of your first pet, high school, childhood best friend — are almost entirely undetected by existing tools, despite being the exact vectors social engineering attacks are built around. We cover all of them.
**Insurance.** Policy numbers, claim details, beneficiary information. None of our commercial competitors detect these. It's one of those gaps that seems obvious in retrospect.
Total coverage: 94 entity types, versus 62 for the next closest competitor, and 30-47 for most of the field.
To be clear, while it's exciting to see the numbers, the point here isn't breadth for breadth's sake. The reason the coverage matters here is more specific: sensitive information doesn't follow format rules. A diagnosis doesn't announce itself with 16 digits. A salary disclosure doesn't come with a checksum. Detecting these things requires understanding what someone is *communicating* in context, not just matching patterns in what they're *saying*. Building for the harder cases forced us to build something that actually understands conversation, and that makes us more accurate across the board.
What makes Velma different
For structured data, detection is in some sense just a transcription problem: once the words are right, card numbers, SSNs, IBANs, and routing numbers fall out of pattern recognition, which we use like everyone else, and it works just fine.
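As a concrete example of that structured side, a spoken card number can be pattern-matched and then validated with the standard Luhn checksum once it's been transcribed. A minimal sketch:

```python
import re

def luhn_valid(digits: str) -> bool:
    """Standard Luhn checksum used by payment card numbers."""
    total, parity = 0, len(digits) % 2
    for i, ch in enumerate(digits):
        d = int(ch)
        if i % 2 == parity:   # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def find_card_numbers(transcript: str):
    """Pattern-match candidate card numbers in a transcript, then checksum."""
    for match in re.finditer(r"\b(?:\d[ -]?){13,19}\b", transcript):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            yield match.span(), digits

print(list(find_card_numbers("my card is 4111 1111 1111 1111, expiring next May")))
```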
But as mentioned above, not all sensitive info comes in a predictable format. So the real difference with Velma is in the semantic layer, where we're identifying disclosures contextually. We've trained and tuned this extensively on real enterprise voice data: noisy environments, diverse accents, overlapping speech, the specific ways sensitive information surfaces in authentic phone conversations rather than being read aloud from a form. It performs meaningfully better in the messy, contextual conditions that real contact center audio actually presents.
Who should care
The obvious candidates are regulated industries: financial services (PCI-DSS, GLBA), healthcare and health-adjacent products (HIPAA), HR technology, and contact centers doing account verification. If that's you, the compliance case makes itself.
But protecting sensitive info isn't just a matter of compliance. The companies I find most interesting to work with are the ones who've already sat with their own voice data, understood what's in it, and decided they want to do something about it proactively rather than waiting for a regulation to force their hand. Companies like that, who value redaction for protecting both themselves and their customers, are the ones who will get the most out of what we've built.
If you want to understand what your recordings actually contain, we're happy to run a pilot against your real audio and show you. The data usually makes a stronger case than I can.


