How ToxMod Learned to Listen

December 15, 2021

At Modulate, we often talk about how ToxMod is the first of its kind - an AI system that actually listens to the nuance of voice chats, rather than just reacting to simple keywords or phrases. But we also understand that it’s a hard claim to swallow. Plenty of AI tools brag about their sophisticated understanding - and few of them ever actually live up to the hype.

In this blog post, we’d like to “pop the hood” of ToxMod a bit, to help demonstrate where our confidence is coming from - and to give you an understanding of what we mean when we say that ToxMod can tell the difference between friendly trash talk and genuine harm.

Before we begin, it’s important to understand that ToxMod doesn’t speak the language of good or bad. It doesn’t make those kinds of binary judgements. Rather, ToxMod speaks the language of risk. If ToxMod assigns a conversation a low risk score, it means that it sees virtually no reason to suspect that anything harmful is occurring. When it assigns a high risk score, that means it’s seen multiple clues that something problematic is underway. What this means is that there is no word or phrase you can say which is guaranteed to be flagged by ToxMod. Certain words like curse words have higher risk factors than others, but ToxMod will always incorporate the other context from the conversation before it decides whether to surface any event to a studio’s moderators.
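
To make the idea of "risk, not keywords" a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not ToxMod's actual implementation: the word list, the weights, the threshold, and the function names are all assumptions invented for this example. The point it shows is that the same words can fall below or above the flagging line depending on context.

```python
# Hypothetical illustration only: every name and number here is invented.
KEYWORD_RISK = {"damn": 0.2, "idiot": 0.4}  # assumed per-word risk factors
FLAG_THRESHOLD = 0.7                        # assumed cutoff for surfacing to moderators


def risk_score(words, context_multiplier):
    """Scale the riskiest keyword by conversational context, capped at 1.0."""
    base = max((KEYWORD_RISK.get(w.lower(), 0.0) for w in words), default=0.0)
    return min(1.0, base * context_multiplier)


def should_surface(words, context_multiplier):
    """No word flags on its own; only the combined score can cross the threshold."""
    return risk_score(words, context_multiplier) >= FLAG_THRESHOLD
```

In this toy version, `should_surface(["you", "idiot"], 1.0)` stays below the threshold, while the same words with a context multiplier of 2.0 would be surfaced.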

So rather than asking “which specific things get banned by ToxMod”, it may be more effective to ask “what kinds of things is ToxMod examining?” These things are what we call “signals”, and there are a variety of types.

The first signals ToxMod considers have to do with the individuals in the conversation. Are there kids involved, for instance? Do some of the participants have a history of getting banned? Remember, ToxMod won’t ever flag you just because you’ve misbehaved in the past - but another offense after you’ve already had a second chance may be deemed more problematic than a first-time offense.

Next, ToxMod examines what it knows about the relationships between the participants. Is this a public chat between strangers, or a private chat between friends who have a long history? If one of the speakers was previously banned, was it in response to a report submitted by someone else in the same chat now? Are many of the participants friends, but one or two players are new and therefore liable to be singled out? By understanding the nature of the group, ToxMod can adjust the risk factor, so that groups of friends get more leeway in what they call each other, but you might be held to a higher standard among strangers who are easier to offend.
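
As a rough sketch of how relationship context might adjust a risk factor, consider the following Python. The `GroupContext` fields and every multiplier are hypothetical, chosen only to mirror the questions above; they are assumptions, not values taken from ToxMod.

```python
from dataclasses import dataclass


@dataclass
class GroupContext:
    """Hypothetical summary of who is in the chat (all fields are assumptions)."""
    is_public: bool          # public chat between strangers vs. private chat
    friend_fraction: float   # share of participants who are established friends, 0..1
    has_newcomer: bool       # one or two players are new to the group
    prior_report_in_group: bool  # a past ban stemmed from a report by someone present


def context_multiplier(ctx):
    """Return a factor that relaxes or tightens the risk assigned to a statement."""
    m = 1.0
    if ctx.is_public:
        m *= 1.5   # strangers are easier to offend
    if ctx.friend_fraction >= 0.8 and not ctx.has_newcomer:
        m *= 0.5   # long-time friends get more leeway
    if ctx.has_newcomer and ctx.friend_fraction > 0.5:
        m *= 1.25  # a newcomer among friends is liable to be singled out
    if ctx.prior_report_in_group:
        m *= 1.5   # history between these specific participants
    return m
```

With these made-up numbers, a private chat among long-time friends halves the risk of a given remark, while a public chat among strangers raises it by half.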

Having incorporated this context, we now get into the conversation itself, which is rife with additional signals. This includes the obvious - ToxMod is capable of transcribing the audio and analyzing the text, both for keywords/keyphrases, and through more sophisticated subject and sentiment analysis. But ToxMod also utilizes a host of other models, considering whether the speaker sounds angry - or happy, sad, confused, worried, malicious, icy, determined, and more; whether they are laughing, crying, whispering, shouting, singing, or speaking normally; when and how frequently they are interrupting others; whether they are playing music or disruptive sound effects; and even whether they have become unusually quiet or passive in a way that implies they feel hurt or unsafe. All of these factors influence ToxMod’s risk assessment of any individual statement from one of the participants.

But even then, ToxMod is far from done, because determining harm isn’t just about what you said - it’s about how others felt about what you said. So ToxMod will also consider how each of the other participants responds to your comment. Are they all laughing along and having a good time, or has your comment incited rage or fear? If one of them responds with something toxic, should they be flagged as the instigator, or did they just get pulled into responding aggressively to a comment of yours that was already far beyond the pale? Heck, are they muting you or sharply changing the topic, suggesting that your earlier comment was indeed damaging?
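
One simple way to picture this step is as an adjustment applied to a statement's risk after the fact, based on how listeners reacted. The reaction labels and adjustment values below are assumptions made up for illustration; the sketch only shows the shape of the idea.

```python
# Hypothetical reaction labels and adjustments, invented for illustration.
REACTION_ADJUSTMENT = {
    "laughing": -0.2,
    "neutral": 0.0,
    "changed_topic": 0.1,
    "angry": 0.2,
    "fearful": 0.3,
    "muted_speaker": 0.3,
}


def adjust_for_reactions(statement_risk, reactions):
    """Shift a statement's risk by the average listener reaction, clamped to [0, 1]."""
    if not reactions:
        return statement_risk  # nobody reacted, so nothing to learn from
    average = sum(REACTION_ADJUSTMENT.get(r, 0.0) for r in reactions) / len(reactions)
    return max(0.0, min(1.0, statement_risk + average))
```

Here a remark that everyone laughs along with ends up less risky than it started, while the same remark followed by fear and muting ends up more risky.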

We’re nearly there, but have one more step to consider. The signals we’ve mentioned so far do a good job detecting most direct, immediate harm, like active harassment or bullying. But what about more insidious harms? For instance, what about a closed group of friends...plotting a violent hate crime against those with a different skin color? Or a conversation without any emotional outbursts or offensive language...but which is a one-on-one between a preteen girl and a much older man, and which contains just enough discussion of the child’s family life to be concerning? Or what about someone simply describing their attempts at self-harm, hinting at a need for help but without anyone responding to support them? The direct harm from these conversations may only manifest clearly much later - and by then, the damage has all too often already been done. So ToxMod attempts to mitigate this by building additional signals around behavioral patterns common among these types of offenders or victims. This includes recognizing that the would-be predator has been speaking in repeated one-on-ones with younger participants, and constantly asking them to switch platforms; noticing that the player considering self-harm is escalating the intensity of their comments; or recognizing the difference between a (still bad) one-off racist comment and a group that is amplifying each other’s sense of rage or entitlement. These situations are also often much more complex than more direct harms, and require subtler intervention, so Modulate is working closely with our studio partners to explore softer ways to engage in these situations - such as surfacing support materials to someone considering self-harm, or pointing potential grooming victims towards safer communities of friends even if it’s not yet clear whether the adult in the conversation could actually be a predator.
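
The "escalating intensity" pattern mentioned above can be sketched as a check over a speaker's recent comments rather than any single one. This is a deliberately simplified, hypothetical example: the per-comment intensity scores, the window size, and the function name are all assumptions, and a real system would need something far more robust.

```python
def is_escalating(intensities, window=3):
    """Return True if the last `window` intensity scores are strictly increasing.

    `intensities` is a hypothetical list of per-comment scores in time order.
    """
    recent = intensities[-window:]
    if len(recent) < window:
        return False  # not enough history to call it a pattern
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))
```

The design point is that no single comment triggers the signal; it only fires when the trend across several comments points the same way.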

Combining all of these factors together, ToxMod truly does have the ability to comprehend nuance and complexity in conversations in a way that’s never been seen before - not in text, where the lack of emotion makes it difficult even for humans to understand what’s really intended, and certainly not in voice, where we’ve had to design new ways to process this data cost-effectively and reliably. But it’s also important to acknowledge that, even with all of these tools at its disposal, ToxMod is still imperfect. Language is ever-evolving, especially in realms like gaming - even if ToxMod could catch every single offense today, tomorrow folks will find a new way to insult each other. So the last piece of the puzzle for ToxMod is how we put all those signals together into behavioral models which can actually learn and evolve to keep up with the changes in vocabulary.

The beauty of this is that the solution is, in some ways, built into the problem. ToxMod brings potential offenses to the attention of a studio’s moderators, who decide whether the situation truly requires intervention. This means that the moderators are constantly providing ToxMod with a steady stream of real feedback about what counts as problematic today, even if that’s new since yesterday. What’s more, as players report offenses that ToxMod may have missed, our studio partners can also share that context with us to keep ToxMod aware of new offensive patterns. We also augment this with a team of real people at Modulate who constantly provide additional feedback to our AI system and re-check how its models are doing.
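
To illustrate the feedback loop in miniature, the sketch below nudges a phrase's risk factor toward each moderator verdict using a simple moving average. This is an assumed stand-in technique chosen for brevity, not a description of how ToxMod's models are actually trained; the function name and learning rate are hypothetical.

```python
def update_risk(current_risk, moderator_confirmed, learning_rate=0.1):
    """Move a phrase's risk toward 1.0 if moderators confirmed harm, else toward 0.0."""
    target = 1.0 if moderator_confirmed else 0.0
    return current_risk + learning_rate * (target - current_risk)


# A brand-new insult starts with no known risk and climbs as moderators confirm it.
risk = 0.0
for _ in range(10):
    risk = update_risk(risk, moderator_confirmed=True)
```

Even in this toy form, a phrase that was unknown yesterday accumulates risk as moderators keep confirming it, and a phrase that moderators keep dismissing drifts back down.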

When a studio deploys ToxMod, they aren’t just grabbing a hammer out of a toolbox. ToxMod, and our team at Modulate that is constantly working to improve it, is more akin to a general contractor. It understands the whole landscape, adjusts its process to your unique situation - and over time combines all of its tools just right, to build a safer space for you and your community that will stay that way for a long, long time.