Player Risk Categories: Going Beyond Toxic Language Detection

Since its launch, ToxMod has been detecting specific instances of toxic language (what we call “utterances”) in real time. Being able to tell an “f you!” from an “f yeah!” has been incredibly helpful for moderation teams looking to quickly and accurately distinguish bullying from enthusiastic gameplay, but we wanted to provide even deeper insight into especially harmful behaviors going on in voice chats. In other words, we wanted to better understand whether a player says something inflammatory in voice chat just to ruffle feathers, or whether they are likely engaging in a concerted effort to, for example, radicalize others or groom underage players.

We recently updated ToxMod to detect specific Player Risk Categories: violent radicalization and child grooming. We chatted with J.D., our Machine Learning Manager, to learn more about this new detection feature in ToxMod, why it was built, and why it’s important to understand players’ behavior over time in addition to being able to detect specific toxic utterances in real time.

So what exactly are Player Risk Categories?

Player Risk Categories are a new class of detection algorithms that identify repeated patterns of behavior over time – specifically, patterns that indicate a significant risk of violent radicalization or child grooming. ToxMod can provide a risk score for each of these Player Risk Categories to give moderation teams insight into which players they may need to take action against for violating a game’s Code of Conduct, or even for acting unlawfully.
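
To make the idea concrete, here’s a minimal sketch of how per-utterance flags could roll up into a longitudinal player risk score. This is an illustrative assumption, not ToxMod’s actual algorithm; the record format, decay half-life, and lookback window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record of a single flagged utterance (illustrative only).
@dataclass
class UtteranceFlag:
    player_id: str
    category: str          # e.g. "extremist_language"
    severity: float        # 0.0 - 1.0, from the utterance-level model
    timestamp: datetime

def player_risk_score(flags: list[UtteranceFlag],
                      category: str,
                      now: datetime,
                      window_days: int = 90,
                      half_life_days: float = 14.0) -> float:
    """Roll utterance-level flags up into a single player-level risk score.

    Recent flags count more than old ones (exponential decay), so a burst of
    severe utterances last week outweighs a stray remark months ago.
    Assumed scoring scheme, not ToxMod's actual algorithm.
    """
    score = 0.0
    for flag in flags:
        if flag.category != category:
            continue
        age_days = (now - flag.timestamp).total_seconds() / 86400
        if age_days > window_days:
            continue
        decay = 0.5 ** (age_days / half_life_days)
        score += flag.severity * decay
    # Squash into 0-1 so moderators can compare players on one scale.
    return score / (1.0 + score)

# Usage: repeated flags over several weeks push the score well above a one-off.
now = datetime(2023, 3, 1)
flags = [
    UtteranceFlag("player_42", "extremist_language", 0.8, now - timedelta(days=2)),
    UtteranceFlag("player_42", "extremist_language", 0.6, now - timedelta(days=10)),
    UtteranceFlag("player_42", "extremist_language", 0.7, now - timedelta(days=30)),
]
print(player_risk_score(flags, "extremist_language", now))
```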

What’s the difference between this and the Utterance Risk Categories that ToxMod already had?

Whereas Utterance Risk Categories flag specific instances of toxic speech, the Player Risk Categories give a more holistic view of individual players and their behavior over time. Since there is no single keyword that would indicate these types of problematic behavior, this new framework allows ToxMod to take into account repeated player actions over a longer period of time rather than a single instance.

You mention looking at behavior over time in order to gauge a player’s risk. Why take that more longitudinal approach?

When looking to provide a risk assessment on behaviors like violent radicalization and child grooming, there’s really no single “smoking gun,” so to speak. So we’ve developed the Player Risk Categories to factor in longer timescales, which gives a much better indicator of intent.

One example might be a player who uses extremist language: they may be using this language only to get a rise out of other players in voice chat. While individual instances of extremist language would be flagged by ToxMod’s Utterance Risk Categories, a single instance makes it difficult to infer intent. Looking at a player’s pattern of behavior over time, we’re able to better understand that intent. If the player continues to use extremist language over the course of many weeks, or is even increasing their use of extremist language, the violent radicalization Player Risk Category will flag this pattern for moderation teams to assess.
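
As a rough illustration of the “pattern over weeks” idea, here’s a sketch of one way escalating use could be surfaced from weekly counts of flagged utterances. It’s a deliberately simple heuristic with an assumed threshold, not ToxMod’s actual behavioral model.

```python
def is_escalating(weekly_flags: list[int], min_weeks: int = 3) -> bool:
    """Return True when a player's weekly count of flagged utterances trends upward.

    `weekly_flags` holds the number of flagged utterances in each consecutive
    week, oldest first. The check is a simple least-squares slope over those
    counts; an illustrative heuristic only.
    """
    n = len(weekly_flags)
    if n < min_weeks:
        return False
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(weekly_flags) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, weekly_flags))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var if var else 0.0
    return slope > 0.5   # assumed threshold: roughly one extra flag every two weeks

# A single flag does not escalate; a week-over-week climb does.
print(is_escalating([1]))           # False - one instance tells us little about intent
print(is_escalating([1, 2, 4, 6]))  # True  - repeated and increasing use gets surfaced
```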

Why add this feature into ToxMod?

We’d been researching and developing this feature knowing how severe and urgent the issues of violent radicalization and child grooming are in video game spaces. We prioritized these first two Player Risk Categories because they are among the most nefarious and harmful behaviors that continue to spread online, and in video games in particular.

How do you see Player Risk Categories developing in the coming months?

This framework of understanding behavior over time could definitely be applied to other Player Risk Categories, like bullying or hate speech, to show moderators which players are at the highest risk of continued in-game bullying. In the future, I hope to continue developing both the Player Risk Categories and the Utterance Risk Categories to give an even wider and more precise view into toxicity in voice chats.