ToxMod was born to understand all the nuance of voice. It goes beyond transcription to consider emotion, speech acts, listener responses, and much more.
ToxMod becomes an expert in your game's code of conduct and escalates what matters most with high priority to your team.
All user data is anonymized and protected to ISO 27001 standards. Modulate will not sell or rent your data, ever.
ToxMod ships with a variety of plugins for different combinations of game engine and voice infrastructure. You can integrate in less than a day.
ToxMod provides the reports - your Trust & Safety team decides which action to take for each detected harm.
Review an annotated back-and-forth between all participants to understand what drew ToxMod’s attention and who instigated things.
ToxMod uses sophisticated machine learning to recognize when conversations begin to take a bad turn, at which point it generates automated reports to be analyzed by human moderators.
No. Transcription and keyword detection are part of the puzzle, but ToxMod uses a variety of factors, including emotion and contextual phrase detection, when analyzing a conversation. ToxMod listens not only to what is being said, but also how it is said and how other players respond to it.
Definitely in addition! It’s crucial to maintain player reports as a way for your players to flag to you when they have a problematic experience, and know that you’ll engage materially with them and your community. That said, relying only on those reports means you’ll miss out on a lot of players who need your support, so ToxMod aggregates both player reports and its automated ones to ensure you’re not leaving any victims of hate, harassment or other toxicity to fend solely for themselves.
The sad reality is that more than 90% of harms online don’t get reported by players. Further, the worst harms like child grooming and violent radicalization tend to involve victims who are either unaware that they are in danger, or who lack the wherewithal to defend themselves. And ultimately, relying only on player reports puts the responsibility on the victims to protect themselves, when players and studios alike agree that platforms should be more proactive in preventing these kinds of online harms.
No! While ToxMod processes all of your player audio, so does your voice chat system. What really matters is whether that data is ever seen by a human or used in any automated actions or decisions. And the answer is that ToxMod - just like a player report - will trigger if and only if it detects high risk of toxicity or harm, and only then will send the relevant data to your team of moderators to respond to.
While ToxMod does observe all voice chats across your platform, it can perform its initial analysis on-device, meaning that data would not reach our servers unless it was flagged as relevant to a harmful behavior. This is an optional setting, and some studios may prefer to send all audio to ToxMod's servers, but even in this case, ToxMod immediately analyzes the audio to separate the relevant audio from the irrelevant audio. The irrelevant audio is archived temporarily in case it is later determined to be relevant to a harmful behavior, and is deleted after no more than 30 days. This irrelevant data is never viewed by a human - including Modulate employees - and is never made accessible to the customer, even through automated APIs.
Absolutely not. ToxMod only flags harms that are happening live - it will never try to flag players based only on a guess that they’ll misbehave in the future. ToxMod does track player history - i.e. how many times they’ve committed offenses - in order to help prioritize repeat or escalating offenders, but this only determines how urgently it flags new offenses after the player misbehaves again.
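To make the distinction concrete, here is a minimal sketch of how history could raise the urgency of a detected offense without ever producing a flag on its own. The `Detection` struct, the `flag_urgency` function, and the weights are all hypothetical and chosen for illustration; they are not ToxMod's actual scoring.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical inputs; neither the fields nor the weights come from ToxMod.
struct Detection {
    bool offense_detected;  // did the live analysis find harm in this clip?
    double severity;        // severity of the current offense, 0..1
};

// Sentinel meaning "do not flag this clip at all".
const double kNoFlag = -1.0;

// Returns an urgency score in 0..1, or kNoFlag when no offense was detected.
// History is consulted only after a live detection, so it can raise urgency
// but can never create a flag by itself.
double flag_urgency(const Detection& d, int prior_offenses) {
    assert(prior_offenses >= 0);
    assert(d.severity >= 0.0 && d.severity <= 1.0);
    if (!d.offense_detected) {
        return kNoFlag;
    }
    // Assumed weighting: each prior offense adds 0.1, capped at five offenses.
    double boost = 0.1 * std::min(prior_offenses, 5);
    return std::min(1.0, d.severity + boost);
}
```

In this sketch, a player with a long history but no new offense always gets `kNoFlag`, while a repeat offender who misbehaves again is surfaced more urgently than a first-time offender with the same severity.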
When you first deploy it, ToxMod does not take any immediate action on its own. Instead, it watches for harmful behavior and escalates it promptly to your moderators - typically within 45 seconds from when the offense began. This means your moderators can often mute the offender or take more extreme action if needed; and as ToxMod learns about your ecosystem, you can also begin using its automation features to take these actions directly for the most overt offenses without needing to loop your moderation team in first.
Some players surely will, but the majority are actually quite supportive. Major platforms such as Riot and Sony have announced voice moderation in recent years with the general response being positive, with notes of “it’s about time.” Modulate has also found that providing clarity on how ToxMod works is extremely important. While many players express initial concerns regarding privacy or the risk of false positives, more than half have converted to supporters after a conversation offering additional detail about how ToxMod actually works - especially when it’s clarified that human moderators will make the final decisions based on AI recommendations, rather than AI acting directly.
ToxMod is designed specifically to understand this kind of subtlety. It takes advantage of a wide range of signals like emotions (of both the speaker and respondent), speech behaviors like interruption or awkward silence, and speech modes like laughter or crying, to recognize whether someone has been harmed or it is just hearing coarse language among friends.
This might sound like a grand claim - after all, everyone knows AI doesn’t understand nuance! - but the numbers back us up here. Within 2-3 weeks of a standard deployment, ToxMod can already correctly separate harm from other behaviors with an accuracy of 80% (compared to player reports, which tend to have accuracy closer to 15-30%!); and that number can be increased to 95% or above as ToxMod continues to automatically improve.
That data will be sent to Modulate’s secure servers for deeper contextual analysis, after which it is shared through a secure web console or API for your moderators to examine and respond to. Our servers are secured with industry best practices in an isolated AWS environment, and can be linked through VPC Peering with your existing AWS environment to further minimize any security risks related to the sending or receiving of data. If absolutely necessary, we are also able to deploy ToxMod on-premise within your own environment, though this may result in increased costs and alterations to support responsiveness.
To start, we don’t collect any PII about your users; just the relevant voice chat information and an anonymized user id. ToxMod only stores data to aid your moderators in making their decision. Moderators will be able to see saved audio, transcriptions, and other conversational context (for harmful conversations only, of course!) for a set number of days before that data is deleted. And of course, all data is stored and transferred using industry-standard encryption. Modulate also conducts regular penetration and security tests and is certified with full ISO 27001 compliance.
Modulate never shares your player data with any third parties. Some studios authorize Modulate to use portions of their data (after additional anonymization is done) to improve ToxMod’s core services, but by default, each customer’s data is used only within their own ToxMod ecosystem.
The data is associated only with a User ID (or other identifier specified by the client) and all data is automatically deleted after 30 days. Of course, we also support data subject requests (as defined in e.g. GDPR), and can delete data for users immediately upon request - though we require that request to be validated by you first, since we don’t have any data on our side to tie any Modulate user ID to a specific real person. For more, please see the Privacy and Data section of our website.
ToxMod is designed to scale to many millions of simultaneous chats without issue. This is made possible by our revolutionary triaging technology, which uses multiple algorithmic “gates” to quickly identify non-toxic chats which don’t require moderation.
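The gating idea can be sketched as a cascade in which cheap checks run first and most chats are dismissed before any expensive analysis happens. This is only an illustration of the general technique: the `ClipFeatures` fields, the gate names, and every threshold below are made up and do not describe ToxMod's real gates.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical summary of one voice clip; these fields are illustrative only.
struct ClipFeatures {
    double speech_ratio;   // fraction of the clip containing speech, 0..1
    double keyword_score;  // cheap lexical risk estimate, 0..1
    double context_score;  // costlier contextual risk estimate, 0..1
};

// A gate returns true when the clip still looks risky and should continue on.
struct Gate {
    std::string name;
    std::function<bool(const ClipFeatures&)> still_risky;
};

struct TriageResult {
    bool escalate;           // true only if every gate kept the clip
    std::string stopped_at;  // name of the gate that dismissed it, or empty
};

// Run the gates in order (cheapest first) and stop at the first dismissal.
TriageResult triage(const ClipFeatures& clip, const std::vector<Gate>& gates) {
    for (const Gate& gate : gates) {
        assert(gate.still_risky && "every gate needs a predicate");
        if (!gate.still_risky(clip)) {
            return {false, gate.name};
        }
    }
    return {true, ""};
}

// Example pipeline with made-up thresholds.
std::vector<Gate> example_gates() {
    return {
        {"speech", [](const ClipFeatures& c) { return c.speech_ratio > 0.1; }},
        {"keywords", [](const ClipFeatures& c) { return c.keyword_score > 0.3; }},
        {"context", [](const ClipFeatures& c) { return c.context_score > 0.7; }},
    };
}
```

Because a silent or clearly benign clip exits at the first gate, the costlier gates only see the small share of clips that earlier gates could not rule out, which is what lets this style of pipeline scale.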
When you integrate with our SDK, we strongly recommend you send us individual audio streams for each user rather than a mixed stream. If this is impossible, ToxMod can still function while processing the mixed audio, but its performance and accuracy will be decreased compared to the single-stream-per-speaker approach.
We actively train our models to incorporate commonly used slang and gaming terminology. Additionally, we collaborate with our customers to ensure any vocabulary that is specific to their game is included in our models.
ToxMod is currently trained only on English, but our model architecture supports any language straightforwardly. Support for additional languages, including Korean, German, Spanish, Mandarin, and French, is planned on our near-term roadmap, and we are happy to work with customers on any additional language support needs.
Platforms: Windows 7+, macOS, Android (including Oculus), iOS, PS4, PS5, Xbox One
ToxMod can be integrated with any game engine and VoIP solution fairly seamlessly; but we additionally offer example plugins to expedite integration for some of the most common setups.
The exact values vary a bit depending on your platform, but typically between 8 and 16 MB of memory, and about 0.5% CPU usage.
Yes. By default, ToxMod data is visible on your ToxMod Web Console, but this data is supplied through a straightforward HTTP API. If you already have a moderation platform you wish to continue using, we’re happy to work with you to connect our API to that system.
Absolutely! ToxMod automatically learns from your moderators’ behavior what should and shouldn’t count as disruptive, but if you’d like to do more, we give you additional levers related to different types of offenses (such as racial offenses vs. religious hostility). You can set your tolerance for each category individually, so if your game has substantial violence built in, you might only moderate the most severe violent dialogue, compared to a game that caters to a younger audience. These settings can be adjusted live from your ToxMod Web Console at any time.
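As a rough illustration of per-category tolerances, the sketch below keeps one threshold per offense category and flags a detection only when its severity meets that threshold. The `TolerancePolicy` class, the category names, and the 0..1 severity scale are assumptions made for this example; the real settings live in the ToxMod Web Console and may be modeled quite differently.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-category tolerance: the minimum severity (0..1) that is flagged.
class TolerancePolicy {
public:
    explicit TolerancePolicy(double default_threshold)
        : default_threshold_(default_threshold) {
        assert(default_threshold >= 0.0 && default_threshold <= 1.0);
    }

    // Raise the threshold to tolerate more of a category, lower it to tolerate less.
    void set_threshold(const std::string& category, double threshold) {
        assert(threshold >= 0.0 && threshold <= 1.0);
        thresholds_[category] = threshold;
    }

    // Categories without an explicit setting fall back to the default threshold.
    bool should_flag(const std::string& category, double severity) const {
        std::map<std::string, double>::const_iterator it = thresholds_.find(category);
        double threshold = (it != thresholds_.end()) ? it->second : default_threshold_;
        return severity >= threshold;
    }

private:
    double default_threshold_;
    std::map<std::string, double> thresholds_;
};
```

Under these assumptions, a studio with a violent game could call `set_threshold("violent_speech", 0.9)` so that only the most severe violent dialogue is flagged, while leaving every other category at the default.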
ToxMod is designed with a core SDK which can integrate smoothly into any game regardless of your specific game engine, VoIP solution, or platform, as well as some convenience wrappers to speed up the integration further for certain common setups (such as Unreal Engine + Epic Voice Chat + PC).
If you’re looking for an engineer on your team to integrate one of these convenience “plugins”, you’ll need someone who meets the following criteria:
- They are a game developer who knows how to use a game engine (Unity / Unreal), and they know how to install a plugin for that game engine
- They have integrated (or are integrating) voice chat into your game (presumably through a plugin)
- They do NOT have to know audio programming or the ins and outs of how voice chat works.
- They SHOULD know where in your code a player joins and leaves a voice chat room, and how you communicate that to your voice chat framework
- They are comfortable with the primary programming language for the game engine you are using
- They don’t need to know how to work with DLLs or shared libraries, but it can sometimes aid debugging if your game already utilizes similar dependencies to those our plugin will include (particularly libopus)
If you’ll instead be using the C++ Core SDK directly, you’ll want someone with the above expertise as well as a few other points of knowledge. The specific expertise you need will depend on whether you’ll be implementing our Server-Side SDK (Enterprise customers only) or our Client-Side SDK.
Client-Side SDK prerequisites
- They’re capable of managing state, creating/destroying resources “responsibly”, and working with structs, pointers, and C arrays
- If your game (or more precisely, your voice chat framework’s interface with your game) is in another language, they are comfortable writing a wrapper for that language around our C interface
- They know audio programming - they know not to allocate memory in the audio thread, they know how to do format conversion (e.g. shorts to floats), etc.
- They know how your voice chat framework works - they know where the callback to get raw audio is, they know where the information on player and session identities lives, etc.
- They know how to deal with incorporating and maintaining shared libraries on your platform
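For a sense of the audio programming knowledge mentioned in the list above, here is a minimal sketch of converting 16-bit samples ("shorts") to floats without allocating in the audio thread. The `pcm16_to_float` function and `FrameConverter` class are hypothetical helpers written for this example, not part of the ToxMod SDK, and the 1/32768 scaling is one common convention among several.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert signed 16-bit PCM samples to floats in [-1.0, 1.0).
// Writes into a caller-provided buffer, so nothing is allocated here.
void pcm16_to_float(const std::int16_t* in, float* out, std::size_t count) {
    assert(in != nullptr && out != nullptr);
    const float kScale = 1.0f / 32768.0f;
    for (std::size_t i = 0; i < count; ++i) {
        out[i] = static_cast<float>(in[i]) * kScale;
    }
}

// Owns a scratch buffer that is sized once, outside the audio thread.
class FrameConverter {
public:
    explicit FrameConverter(std::size_t max_frame_samples)
        : scratch_(max_frame_samples) {}

    // Intended for the audio callback: reuses the preallocated buffer.
    // Returns nullptr instead of growing the buffer if the frame is too large.
    const float* convert(const std::int16_t* in, std::size_t count) {
        if (count > scratch_.size()) {
            return nullptr;
        }
        pcm16_to_float(in, scratch_.data(), count);
        return scratch_.data();
    }

private:
    std::vector<float> scratch_;
};
```

The converter would be constructed during setup with the largest frame size your voice chat framework can deliver, so the callback itself only reads and writes memory that already exists.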
Server-Side SDK prerequisites
- All Client-Side SDK prerequisites, plus…
- They know how to interact with raw Opus packets from your voice chat stream on the server-side
- They are comfortable with multithreading (voice chat servers may receive many packets on distinct threads at once)
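The multithreading point above can be illustrated with a small mutex-guarded queue that lets many network threads hand off packets to a single consumer. The `OpusPacket` struct and `PacketQueue` class are hypothetical and are not part of the ToxMod SDK; a production server might prefer a lock-free or per-speaker design instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical container for one encoded packet from one speaker.
struct OpusPacket {
    std::uint64_t speaker_id;
    std::vector<std::uint8_t> payload;  // raw encoded bytes, left undecoded
};

// A simple thread-safe FIFO: every access to the deque happens under the mutex.
class PacketQueue {
public:
    // May be called from any number of receiving threads.
    void push(OpusPacket packet) {
        std::lock_guard<std::mutex> lock(mutex_);
        packets_.push_back(std::move(packet));
    }

    // Moves the oldest packet into `out`; returns false if the queue is empty.
    bool try_pop(OpusPacket& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (packets_.empty()) {
            return false;
        }
        out = std::move(packets_.front());
        packets_.pop_front();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return packets_.size();
    }

private:
    mutable std::mutex mutex_;
    std::deque<OpusPacket> packets_;
};
```

Receiving threads would call `push` as packets arrive, and a worker would drain the queue with `try_pop` before passing each packet along, which keeps ordering within the queue consistent even when packets arrive on distinct threads at once.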