AI Music Detection API

Detect AI-generated vocal and instrumental music

Built for platforms, distributors, and rights holders — catch AI-generated tracks before they reach monetization, distribution, or licensing.
How It Works

Clip-level verdicts built from per-window evidence.

Send audio, get back a primary verdict, per-window scores, and confidence values. Batch or real-time streaming — same structured output either way.

Batch API

Send a complete audio file, receive a clip-level primary_verdict plus a per-window breakdown. Supported formats: .aac, .flac, .m4a, .mp3, .mp4, .ogg, .opus, .wav — up to 100 MB.
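Checking a file against the stated limits before uploading avoids a wasted request. The sketch below is illustrative only: `check_upload` is a hypothetical helper, not part of the Modulate API, and it assumes "100 MB" means 100,000,000 bytes, which the text does not specify.

```python
from pathlib import Path

# Formats listed in the text above.
SUPPORTED_EXTENSIONS = {".aac", ".flac", ".m4a", ".mp3", ".mp4", ".ogg", ".opus", ".wav"}
# Assumption: "100 MB" is read as 100,000,000 bytes (the conservative reading).
MAX_BYTES = 100_000_000


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

Returning a list of problems rather than raising lets an ingestion pipeline log every reason a file was rejected in one pass.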

Streaming API (WebSocket)

Connect over WebSocket and receive per-window vocal AI verdicts as audio arrives. Instrumental AI detection and the final clip-level verdict are delivered in the done message at end of stream.
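Streaming implies sending the audio in pieces rather than as one payload. A minimal sketch of the client-side chunking follows; `iter_chunks` and the 32 KiB default are assumptions for illustration, since the text does not state a required or recommended chunk size.

```python
import io
from typing import BinaryIO, Iterator


def iter_chunks(stream: BinaryIO, chunk_size: int = 32_768) -> Iterator[bytes]:
    """Yield successive byte chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Small in-memory demo: ten bytes split into chunks of four.
demo = list(iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4))
```

Each yielded chunk could then be passed to the WebSocket connection's send call, with the file opened in binary mode.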

Structured output

Every response includes primary_verdict, vocal_ai_percentage, vocal_ai_confidence, instrumental_ai_percentage, instrumental_ai_confidence, and a full per-window breakdown.
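One way to consume these fields is to load them into a typed record. This is a sketch under assumptions: the five field names come from the text, but their types, and the `windows` key used for the per-window breakdown, are hypothetical and not confirmed here.

```python
from dataclasses import dataclass, field
from typing import Any

# Field names as listed in the text above.
REQUIRED_FIELDS = (
    "primary_verdict",
    "vocal_ai_percentage",
    "vocal_ai_confidence",
    "instrumental_ai_percentage",
    "instrumental_ai_confidence",
)


@dataclass
class DetectionResult:
    primary_verdict: str
    vocal_ai_percentage: float
    vocal_ai_confidence: float
    instrumental_ai_percentage: float
    instrumental_ai_confidence: float
    # Hypothetical key name for the per-window breakdown.
    windows: list[dict[str, Any]] = field(default_factory=list)


def parse_result(payload: dict[str, Any]) -> DetectionResult:
    """Build a DetectionResult, failing loudly if a documented field is missing."""
    missing = [key for key in REQUIRED_FIELDS if key not in payload]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    return DetectionResult(
        primary_verdict=str(payload["primary_verdict"]),
        vocal_ai_percentage=float(payload["vocal_ai_percentage"]),
        vocal_ai_confidence=float(payload["vocal_ai_confidence"]),
        instrumental_ai_percentage=float(payload["instrumental_ai_percentage"]),
        instrumental_ai_confidence=float(payload["instrumental_ai_confidence"]),
        windows=list(payload.get("windows", [])),
    )
```

Failing on a missing field, rather than defaulting to zero, keeps a malformed response from being mistaken for a clean track.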

Tunable thresholds

Adjust the precision/recall tradeoff without retraining. Set vocal and instrumental thresholds independently to match your use case.
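To illustrate what independent thresholds mean in practice, the sketch below applies a separate cutoff to each score locally. It is not the API's mechanism: how thresholds are passed to Modulate is not shown here, and the `Thresholds` type, the default values, and the 0 to 100 score scale are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # Hypothetical defaults on an assumed 0-100 scale.
    vocal: float = 50.0
    instrumental: float = 50.0


def flag_window(vocal_score: float, instrumental_score: float, t: Thresholds) -> dict[str, bool]:
    """Flag each detection path independently against its own threshold."""
    return {
        "vocal_ai": vocal_score >= t.vocal,
        "instrumental_ai": instrumental_score >= t.instrumental,
    }


# A stricter vocal threshold flags fewer windows (higher precision, lower recall)
# without changing how instrumental scores are treated.
lenient = flag_window(62.0, 62.0, Thresholds(vocal=50.0, instrumental=50.0))
strict_vocal = flag_window(62.0, 62.0, Thresholds(vocal=80.0, instrumental=50.0))
```

Raising a threshold trades recall for precision on that path only, which is the tradeoff the paragraph above describes.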
AI Music Detection Capabilities

Two detection paths. One API.

Most AI music detectors break down on hybrid tracks, multi-part productions, and anything where AI was only used for part of the composition. Modulate runs two independent models, one for vocals, one for instrumentals, scoring each 4-second window separately, so you get a confident result grounded in evidence.

Vocal detection

Modulate identifies AI-generated singing, rap, or any vocal performance within a 4-second window — including AI vocals over organic human music. Each window is scored independently, returning vocal_ai_percentage and vocal_ai_confidence.

Instrumental detection

Modulate identifies AI-generated instrumental content when there is not enough vocal content present, including fully AI-generated tracks with no vocals at all. Results are delivered per window and aggregated into the clip-level verdict.
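To map a flagged window back to a position in the track, it helps to know each window's time range. The sketch below assumes consecutive, non-overlapping 4-second windows with a shorter final window; the text states the window length but not the alignment, so treat the boundaries as illustrative.

```python
import math

WINDOW_SECONDS = 4.0  # window length stated in the text


def window_bounds(duration_seconds: float) -> list[tuple[float, float]]:
    """Split a clip into consecutive 4-second windows; the last may be shorter."""
    if duration_seconds <= 0:
        return []
    count = math.ceil(duration_seconds / WINDOW_SECONDS)
    return [
        (i * WINDOW_SECONDS, min((i + 1) * WINDOW_SECONDS, duration_seconds))
        for i in range(count)
    ]


def window_index(timestamp_seconds: float) -> int:
    """Return the index of the window that contains the given timestamp."""
    if timestamp_seconds < 0:
        raise ValueError("timestamp must be non-negative")
    return int(timestamp_seconds // WINDOW_SECONDS)
```

With these helpers, a reviewer could jump straight to the seconds of a hybrid track where the AI-flagged window sits.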
Modulate vs. industry standard AI Music Detection

A side-by-side comparison for teams evaluating AI music detection tools.

| | Modulate | Industry Standard |
| --- | --- | --- |
| Detection paths | Vocal + instrumental, independent | Single combined score per track |
| Accuracy | Reliably detects only AI-generated music or instrumentals | Frequent false positives on digital alterations including autotune, compression, and other common techniques |
| Output granularity | Per-4-second window + clip-level verdict | Clip-level only |
| Speech signals | Per-4s window for vocal AI; instrumental AI at end-of-stream | Single score at end of file |
| Hybrid content handling | Window-level visibility into mixed tracks | Not supported |
| Threshold tuning | Yes, no retraining needed | Fixed or not offered |
| Streaming support | Real-time WebSocket streaming | Batch only |
| Additional Velma models | Transcription, Deepfake Detection, PII Redaction, Emotion, Accent | Music detection only |
| Self-serve API | Yes, sign up and start in minutes | Varies; some require enterprise partnership negotiation |
| Cost | $0.07/hr | Not publicly disclosed or per-track pricing |
How Modulate can help you address AI-generated music detection

It's time to get ahead of the AI music challenges your team is already dealing with.

AI-generated music is evolving faster than manual review can handle. AI Music Detect by Modulate gives you a scalable, self-serve detection layer, so you can enforce policies, protect rights, and stay compliant.

Protect royalty payouts

AI-generated tracks are being uploaded at scale to farm streaming royalties. Modulate flags them at ingestion before they dilute payouts for legitimate human artists.

Reduce false positives

Window-level scoring and tunable thresholds mean you're not stuck with a fixed cutoff, so you can cut false flags on legitimate tracks without letting real violations slip through.

Enforce platform policies at scale

Spotify, Apple Music, YouTube Music, and most major DSPs require AI disclosure or restrict AI monetization. Manual review at ingestion volume isn't viable. Modulate gives you automated detection that enforces policy without adding headcount.

Know what you're licensing

Purely AI-generated works without meaningful human authorship aren't copyrightable under US Copyright Office guidance. Velma lets licensors, sync agencies, and clearance teams verify content before contracts are issued.

Screen content before it reaches platforms

DSPs are pushing distributors to screen uploads upstream. Velma gives distributors comparable to DistroKid, TuneCore, and CD Baby a drop-in detection layer to get ahead of platform requirements before they become mandates.

Stay ahead of AI disclosure regulations

The EU AI Act requires disclosure of AI-generated content in covered contexts. Velma gives platforms operating in affected jurisdictions a reliable way to identify non-disclosing uploads before they create compliance exposure.
Detecting AI Music with Modulate

Clip-level verdicts built from per-window evidence.

1. Send audio
Point Velma AI Music Detect at a file or a live stream. One endpoint, no pipeline to assemble. Supported formats: .aac, .flac, .m4a, .mp3, .mp4, .ogg, .opus, .wav.

2. Get per-window verdicts
Every 4-second window is scored independently for vocal AI and instrumental AI content. Streaming returns vocal results in real time; instrumental results and the clip-level verdict arrive at end of stream.

3. Receive structured output
Get back a primary_verdict of ai-vocal-music, ai-instrumental, or not-ai-music, plus confidence scores and a full per-window breakdown ready to plug into your pipeline.
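Once a primary_verdict comes back, a pipeline typically maps it to a policy action. The sketch below uses the three verdict values named above; the action names and the `route` helper are hypothetical placeholders for whatever your platform does with a flagged track.

```python
# Verdict values come from the text; the actions are hypothetical examples.
VERDICT_ACTIONS = {
    "ai-vocal-music": "hold_for_review",
    "ai-instrumental": "hold_for_review",
    "not-ai-music": "accept",
}


def route(primary_verdict: str) -> str:
    """Map a clip-level verdict to a policy action, rejecting unknown values."""
    try:
        return VERDICT_ACTIONS[primary_verdict]
    except KeyError:
        raise ValueError(f"unexpected primary_verdict: {primary_verdict!r}") from None
```

Raising on an unrecognized verdict, instead of silently accepting the track, keeps a future change in verdict values from bypassing policy.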
Streaming, start to finish:

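# 1 · open a connection 2 · stream audio 3 · read results
import json
from websockets.sync.client import connect  # assumption: the third-party `websockets` package

with connect("wss://platform.modulate.ai/api/velma-2-ai-music-detection-streaming?api_key=…") as ws:
    ws.send(audio_chunk)  # stream your audio (audio_chunk: bytes you supply)
    ws.send("")  # signal end of stream
    for message in ws:  # window verdicts, then the final done message
        event = json.loads(message)  # assumption: messages are JSON text
        if "primary_verdict" in event:  # only the final message carries the clip-level verdict
            print(event["primary_verdict"])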
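When reading the stream, it can be useful to keep the per-window events separate from the final result. The sketch below works on already-received messages and rests on two assumptions not confirmed by the text: that each message is a JSON object, and that only the final done message carries a primary_verdict key. `split_events` is a hypothetical helper.

```python
import json
from typing import Any, Iterable, Optional


def split_events(messages: Iterable[str]) -> tuple[list[dict[str, Any]], Optional[dict[str, Any]]]:
    """Separate per-window events from the final message that carries the verdict."""
    windows: list[dict[str, Any]] = []
    final: Optional[dict[str, Any]] = None
    for raw in messages:
        event = json.loads(raw)
        if "primary_verdict" in event:
            final = event  # assumed to be the done message
        else:
            windows.append(event)
    return windows, final
```

A `final` value of `None` after the stream closes would indicate the connection ended before the done message arrived.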

Start building with Modulate.

Want to see how easy it is to detect AI music? Grab an API key or try the playground to see Velma analyze a real track.