Deepfake fraud prevention has a fatal flaw. And fraudsters already know it.
Most of the world's major banks, insurance carriers, and contact centers now screen calls for deepfake voices. That's the good news.
Here's the bad news: most of them only run a single check during the early moments of the call.
Sophisticated fraudsters know that once they're past that check, they're home free. So they open calls with a real voice — their own, a colleague's, a quick recording — and switch to the AI clone once they're past the gate. The system flags nothing. The fraud proceeds.
This isn't a theoretical attack vector. It's an active exploit of a fundamental economic constraint: deepfake detection at scale is too expensive for continuous monitoring.
Or at least, it used to be.
The number that explains everything
The standard metric for deepfake detection accuracy is Equal Error Rate — the point at which false positives and false negatives are equally likely. Lower is better.
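To make the metric concrete, here is a toy sketch of how EER is computed from a detector's scores (illustrative only, with made-up score distributions; not Modulate's evaluation code):

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Sweep thresholds to find where false accepts and false rejects cross.

    Convention assumed here: higher score = more likely genuine.
    """
    thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
    best_gap, eer = np.inf, None
    for t in thresholds:
        far = np.mean(spoof_scores >= t)    # spoofed audio accepted as real
        frr = np.mean(genuine_scores < t)   # real audio rejected as fake
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Toy scores: genuine clips cluster high, spoofed clips cluster low.
rng = np.random.default_rng(0)
genuine = rng.normal(0.8, 0.1, 1000)
spoof = rng.normal(0.2, 0.1, 1000)
print(equal_error_rate(genuine, spoof))  # near zero for well-separated scores
```

The better the detector separates the two score distributions, the lower the crossing point, which is why lower EER is better.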
Modulate just topped the Hugging Face Speech DeepFake Arena leaderboard — the most comprehensive independent benchmark for audio deepfake detection, covering 14 datasets and 18 systems — with an average EER of 1.104%.
To put that in context: the second-place model (Resemble's 3-billion-parameter model, nearly 10x our size) scores 2.570%. That means we're catching nearly 60% of the deepfakes that others miss, with under half as many false positives, and doing it all on a 10x smaller model.
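The arithmetic behind those claims follows directly from the two EER figures (at the equal-error operating point, the miss rate and the false-positive rate both equal the EER):

```python
ours, theirs = 1.104, 2.570  # average EER percentages from the leaderboard

# Fraction of deepfakes the second-place model misses that we would catch.
missed_now_caught = (theirs - ours) / theirs
print(f"{missed_now_caught:.0%} of previously missed deepfakes caught")

# Ratio of our false-positive rate to theirs at the EER operating point.
false_positive_ratio = ours / theirs
print(f"{false_positive_ratio:.2f}x their false positives")
```

That works out to roughly 57% of previously missed deepfakes caught, at about 0.43x the false-positive rate.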
We can deliver this performance because of our unique experience with noisy voice data and the half-billion hours of real audio we've built our models on. The "giveaway" for a synthetic voice isn't always in the same place: sometimes it's in vocal tone, detectable in tiny snippets. Sometimes it's in rhythm or pronunciation — patterns that only emerge over longer segments. Instead of betting on one signal, our architecture uses a learned layer weighting mechanism that draws on both, catching synthetic voices however they reveal themselves, at whatever point in the conversation they do.
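The core idea of a learned layer weighting can be sketched in a few lines. This is an illustrative toy, not Modulate's architecture (which is not public): different encoder layers tend to capture different timescales, and softmax-normalized learnable weights let training decide how much each layer contributes to the final decision.

```python
import numpy as np

def combine_layers(layer_features, layer_logits):
    """Fuse per-layer features with softmax-normalized learned weights.

    layer_features: (num_layers, dim) array, one embedding per encoder layer.
    layer_logits:   (num_layers,) learnable parameters; the softmax turns
                    them into a convex combination, so training can lean on
                    short-range cues (tone) or long-range cues (rhythm).
    """
    w = np.exp(layer_logits - layer_logits.max())
    w /= w.sum()
    return w @ layer_features  # weighted sum -> (dim,) fused embedding

# Hypothetical example: 12 encoder layers, 64-dim features per layer.
rng = np.random.default_rng(1)
features = rng.normal(size=(12, 64))
logits = np.zeros(12)              # uniform weighting before any training
fused = combine_layers(features, logits)
print(fused.shape)  # (64,)
```

With all-zero logits the fusion is a plain average across layers; training shifts the logits to emphasize whichever layers carry the most discriminative signal.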
Doing this with 10x fewer parameters matters for another reason: smaller models are cheaper to run. And Modulate's economics are even better than you'd naively expect. We're experts at running voice-processing models at large scale, having cut our teeth managing voice analysis for tens of millions of monthly hours for games like Call of Duty. Our prices aren't 10x lower than the competition's; they're 100x lower.
What cheap detection actually unlocks
Our cost structure doesn't just make existing use cases more efficient. It makes a fundamentally different product possible.
Check the whole call. Not just the opening 10 seconds. Not spot checks. Every segment, every speaker, every transition — continuously, in the background, adding zero friction to the call. Every two seconds, get a new score, immediately highlighting when a synthetic voice appears.
The fraudster who opens with a real voice and switches mid-call? Found in an instant.
The multi-party call where a synthetic voice joins late? No problem.
The attack that every expensive solution misses by design becomes the attack you catch by default.
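Continuous monitoring of this kind boils down to scoring overlapping windows of the call audio. Here's a minimal sketch of such a pipeline; the stream interface, window sizes, and `score_fn` are all hypothetical, not Modulate's API:

```python
def monitor_call(audio_stream, score_fn,
                 window_s=4.0, hop_s=2.0, sr=16000, threshold=0.5):
    """Score overlapping windows so a mid-call voice switch is caught quickly.

    audio_stream: iterable of sample chunks (hypothetical interface).
    score_fn:     model returning P(synthetic) for a window of samples.
    Yields (time_s, score, flagged) every hop_s seconds.
    """
    window, hop = int(window_s * sr), int(hop_s * sr)
    buf, t = [], 0.0
    for chunk in audio_stream:
        buf.extend(chunk)
        while len(buf) >= window:
            score = score_fn(buf[:window])
            yield t, score, score >= threshold
            del buf[:hop]          # slide the window forward by the hop
            t += hop_s

# Toy demo: a "call" that switches to a synthetic voice halfway through.
def toy_score_fn(samples):
    return 0.9 if sum(samples) / len(samples) > 0.5 else 0.1  # stand-in model

real = [[0.0] * 16000] * 8         # 8 s of "real" audio chunks
fake = [[1.0] * 16000] * 8         # 8 s of "synthetic" audio chunks
events = list(monitor_call(iter(real + fake), toy_score_fn))
print([(t, flagged) for t, _, flagged in events])
```

In the toy run, the early windows score clean and the detector starts flagging as soon as a full window lands inside the synthetic segment — exactly the mid-call switch that a single check at the top of the call misses.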
Who this is already happening to
A Hong Kong finance firm wired $25 million after a deepfake CFO appeared on a video call. A crypto vishing operation was paying contractors $20,000 a month to run AI voice scams. The FBI has issued guidance specifically on AI voice impersonation targeting government officials.
This isn't edge-case threat modeling. It's the current operating environment for any organization that conducts business over voice.
The question isn't whether your organization will encounter a deepfake voice attempt. It's whether your detection infrastructure was built to catch what's being deployed today — or to combat fraud techniques from two years ago.
Ideal fraud detection runs silently on every call, catches the switch mid-conversation, and flags intent rather than just voice type. Don't settle for half-baked solutions. Let us show you what #1 on the leaderboard really means in practice.