Online Gaming VoIP Primer

It’s time to add voice chat to your game. The benefits for players and engagement are clear, but figuring out how to introduce voice chat in a way that is simple, cost-effective, and safe can be complicated. At Modulate, we’ve worked with developers across a range of platforms, genres, and scales to augment their voice chat, so we’ve become quite familiar with the ins and outs of the different voice chat systems. In doing so, we’ve repeatedly heard the same concerns: that VoIP is a huge engineering investment, that the concepts are complex and unfamiliar, and that the costs are unmanageable in the long run. So we wanted to do our part by putting together everything we’ve learned about how to implement VoIP reliably, cost-effectively, and quickly.

We’ll break this guide down into three parts. In the first section, we’ll introduce you to the most common VoIP providers. We’ll then present a quick feature comparison between those offerings for your reference. And finally, we’ll go a bit deeper into how you actually plug these VoIP solutions into your game and walk through the most fundamental concepts in more detail.

Let’s dive in!

Voice Chat Providers

There’s a huge number of voice chat providers out there, but in this blog post we’ll focus on the three most common low-level libraries: Epic Online Services Voice Chat, Vivox, and Agora Voice. These three offer good coverage for most online games, and more specialized libraries (like High Fidelity for high-definition spatial audio, or Photon Voice for a more abstracted interface) build on many of the same concepts.

Epic Voice Chat

Epic is the most recent provider to offer a VoIP solution, though the infrastructure had been thoroughly battle-tested within Fortnite itself before Epic opened it up to other developers in July 2021. It has also rapidly gained popularity among indie developers, in large part because Epic Online Services (EOS) Voice is completely free for any number of players.

Since EOS Voice was originally used internally within Fortnite, its core functionality is reliable and well tested, but the supporting game engine plugins are new and still in active development. Most notably, the Unity EOS plugin does not yet offer full Voice support or complete platform support, although progress on both fronts is rapid and both can be expected to be completed soon.

(We at Modulate have also built our own game engine plugins for both Unreal and Unity, which showcase the use of EOS Voice together with Modulate’s platform for our partners.) On the plus side, integrating with EOS can also open the door to many other Epic services, such as multiplayer, achievements, matchmaking, and anti-cheat. The interoperability between the EOS Lobbies service and the Voice service can be especially appealing, since it can simplify the Voice integration for some titles.

It’s important to note that EOS Voice comes with some service limitations, most notably that voice chat rooms can hold no more than 16 people. EOS Voice also lacks certain features offered by longer-standing competitors like Vivox, such as positional audio, though we expect Epic to keep extending the EOS Voice solution. Finally, because it’s a new product, EOS Voice is not yet available for older game engine versions. If you’re looking to integrate EOS Voice into an existing game, make sure you’re using Unreal 4.27+ or Unity 2020.1.x!

Vivox

Vivox (which Unity purchased in 2019) offers a traditional approach to voice chat. It has been used in large games such as League of Legends and Valorant and includes quite a few features: in addition to voice chat, Vivox supports text chat, text-to-speech (TTS), and speech-to-text (STT). This breadth of features can be quite powerful, but it comes with tradeoffs. Some developers have expressed confusion when navigating Vivox’s different features or searching for the right documentation, especially when they only need a relatively straightforward voice chat setup that bypasses the more advanced options.

Vivox provides game engine plugins for both Unity and Unreal, as well as example projects that show how to use the plugins for one use case. (As with EOS Voice, we at Modulate have also built our own game engine plugins that demonstrate the use of Vivox on both engines in combination with our own tools.) Finally, Vivox charges for the use of its VoIP systems in any game with more than 5,000 peak concurrent users, so the choice between Vivox and EOS Voice often depends heavily on whether Vivox’s additional features provide enough value to offset that added cost.

Agora

While EOS Voice and Vivox are multi-platform VoIP tools that started in the PC space, Agora is a mobile-native VoIP solution. Agora isn’t specifically targeted at the gaming space, but is used in a variety of social mobile apps like Clubhouse as well as by some game developers like Super Evil Megacorp in their mobile MOBA Vainglory.

Like Vivox, Agora charges customers who exceed a free tier, but it uses a pay-per-minute structure: Agora’s free tier is 10,000 minutes per month, a different approach from Vivox’s peak-concurrent-user threshold. As mentioned above, Agora is also optimized for mobile, using a peer-to-peer architecture to scale efficiently. On the other hand, its game engine support is a bit sparser due to its broader focus: Agora provides a Unity plugin and a raw C++ SDK, but does not currently offer Unreal Engine support.
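
Because the two free tiers are measured differently, it can help to run your own rough numbers. The Python sketch below is a back-of-the-envelope estimator, not a billing calculator: the 5,000 peak-concurrent-user and 10,000-minutes-per-month thresholds are the figures quoted above, while the assumption that minutes are counted per connected participant, and all of the usage numbers, are illustrative only. Check each provider’s current pricing before relying on it.

```python
# Rough free-tier check. Thresholds are the figures quoted in this article; the
# usage model (minutes counted per connected participant) is an assumption,
# not either provider's actual billing formula.
VIVOX_FREE_PEAK_CCU = 5_000             # Vivox: charges above 5,000 peak concurrent users
AGORA_FREE_MINUTES_PER_MONTH = 10_000   # Agora: 10,000 free minutes per month


def monthly_participant_minutes(daily_voice_users: int,
                                minutes_per_user_per_day: float,
                                days: int = 30) -> float:
    """Total minutes spent in voice by all participants over a month."""
    return daily_voice_users * minutes_per_user_per_day * days


def free_tier_report(peak_ccu: int, monthly_minutes: float) -> dict:
    """Report whether the estimated usage stays inside each free tier."""
    return {
        "vivox_within_free_tier": peak_ccu <= VIVOX_FREE_PEAK_CCU,
        "agora_within_free_tier": monthly_minutes <= AGORA_FREE_MINUTES_PER_MONTH,
    }


if __name__ == "__main__":
    # Hypothetical small game: 50 players use voice for 20 minutes a day.
    minutes = monthly_participant_minutes(daily_voice_users=50, minutes_per_user_per_day=20)
    print(minutes)  # 30000
    print(free_tier_report(peak_ccu=40, monthly_minutes=minutes))
    # {'vivox_within_free_tier': True, 'agora_within_free_tier': False}
```

Under these assumptions, even a small community can use up a per-minute free tier long before it approaches a 5,000-user concurrency threshold, which is why it is worth plugging in your own expected usage.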

Finally, in terms of features, Agora offers a strong suite of tools, including high-fidelity sound quality, spatial audio, and noise cancellation. Agora also supports video streaming (at a higher price). Most games won’t need this, but if you want video calls between players, this feature could set Agora apart from the other options.

Provider Feature Comparison

Each VoIP provider offers a few unique features, but there are also a few standard components that most game developers look for. Positional audio, for instance, helps immerse players more deeply in the game environment, and players are notoriously intolerant of poor voice quality or noisy channels interfering with their ability to chat. The chart below compares these standard features at a glance to help you identify the best provider for your game.

[Figure: VoIP Feature Chart]

The Integration Process

Once you pick your provider, you’ll still need to actually integrate voice chat into your game. This process can be intimidating, especially because VoIP providers use new or inconsistent vocabulary, but once you’re comfortable with the basic components of each system, it can be done in a matter of days or even hours. Below, we walk through the basic steps you’ll need to follow for any provider and clarify the terminology and slightly different framings each provider uses for those steps.

Authenticating With Your Provider

When initializing VoIP within your game, you’ll need to authenticate with the VoIP provider before you can begin transmitting audio over its network. This authentication is typically done per player, so it is best handled by your game client whenever a player chooses to participate in voice.

Each provider has a slightly different authentication methodology, but in practice they all look fairly similar - you’re given an API key of some kind, you send that up to the provider's servers along with player information, and you get back a per-user token if everything worked.
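
On the client side, this usually comes down to fetching a token when a player opts in to voice and reusing it until it is close to expiring. The Python sketch below assumes a hypothetical `fetch_token(user_id)` callable that asks your token service for a token and returns it with its lifetime in seconds. It does not use any provider’s real SDK, and the 60-second refresh margin is an arbitrary choice. As a general security practice, the API key itself is best kept on a trusted server rather than shipped inside the game client.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical signature: fetch_token(user_id) -> (token, lifetime_seconds).
# In a real game this would call your own token service; no provider SDK is used.
TokenFetcher = Callable[[str], Tuple[str, float]]


class VoiceTokenCache:
    """Caches one voice token per player and refreshes it shortly before expiry."""

    def __init__(self, fetch_token: TokenFetcher, refresh_margin_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch_token = fetch_token
        self._refresh_margin_s = refresh_margin_s
        self._clock = clock
        self._tokens: Dict[str, Tuple[str, float]] = {}  # user_id -> (token, expires_at)

    def get(self, user_id: str) -> str:
        """Return a valid token, fetching only on first use or near expiry."""
        cached: Optional[Tuple[str, float]] = self._tokens.get(user_id)
        now = self._clock()
        if cached is None or now >= cached[1] - self._refresh_margin_s:
            token, lifetime_s = self._fetch_token(user_id)
            cached = (token, now + lifetime_s)
            self._tokens[user_id] = cached
        return cached[0]


if __name__ == "__main__":
    calls = []

    def fake_fetch(user_id):
        calls.append(user_id)
        return f"token-{user_id}-{len(calls)}", 300.0

    fake_now = [0.0]
    cache = VoiceTokenCache(fake_fetch, clock=lambda: fake_now[0])
    print(cache.get("alice"))  # token-alice-1 (fetched)
    print(cache.get("alice"))  # token-alice-1 (cached)
    fake_now[0] = 250.0        # within 60 s of the 300 s expiry
    print(cache.get("alice"))  # token-alice-2 (refreshed)
```

Injecting the clock keeps the refresh logic testable without waiting for real tokens to expire.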

For Vivox, this is pretty much exactly the process: you pass their server a key from your game and receive back a unique token for each authenticating player. Epic Voice provides two different authentication methods. The first is to use the EOS Lobbies SDK to manage rooms. In this case, the Lobbies SDK manages per-member tokens that Voice can use directly, effectively removing the need for extra authentication, but this only works if you’re already using the Lobbies SDK. The second method is closer to Vivox’s: instead of the Lobbies SDK, you run a dedicated trusted server that requests per-user tokens from the EOS backend and distributes them to players. Because this second approach is a bit more complex, it’s the one we demonstrate in the Modulate demo projects. Finally, Agora also uses tokens as its preferred authentication method, and provides example code in a number of languages for running your own token server.
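
To make the trusted-server pattern more concrete, here is a minimal Python sketch of a backend that holds a secret and issues short-lived, per-user, per-channel tokens, similar in spirit to the token servers in Agora’s example code. (With EOS, the trusted server instead requests tokens from the EOS backend.) The token format here, an HMAC-SHA256-signed JSON payload, is entirely hypothetical and is not the format used by EOS, Vivox, or Agora; in a real integration you would use the provider’s own token generation instead of `issue_token`.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

# Hypothetical token format for illustration only:
# base64url(JSON payload) + "." + base64url(HMAC-SHA256 signature).


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).decode("ascii")


def issue_token(secret: bytes, user_id: str, channel_id: str,
                ttl_s: int = 3600, now: Optional[float] = None) -> str:
    """Sign a token that lets `user_id` join `channel_id` until it expires."""
    issued_at = int(time.time() if now is None else now)
    payload = json.dumps({"uid": user_id, "ch": channel_id, "exp": issued_at + ttl_s},
                         separators=(",", ":"), sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, payload, hashlib.sha256).digest()
    return _b64(payload) + "." + _b64(signature)


def verify_token(secret: bytes, token: str, channel_id: str,
                 now: Optional[float] = None) -> Optional[str]:
    """Return the user ID if the token is authentic, unexpired, and for this channel."""
    try:
        payload_b64, sig_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong shape or bad base64 (binascii.Error is a ValueError)
        return None
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None  # forged or tampered token
    claims = json.loads(payload)
    current = time.time() if now is None else now
    if claims["ch"] != channel_id or current >= claims["exp"]:
        return None  # wrong channel or expired
    return claims["uid"]


if __name__ == "__main__":
    secret = b"keep-this-on-the-server"
    token = issue_token(secret, "alice", "match-m42-team-2", ttl_s=600)
    print(verify_token(secret, token, "match-m42-team-2"))  # alice
    print(verify_token(secret, token, "match-m42-team-1"))  # None
```

The client only ever sees the token, never the secret, and scoping each token to one channel and a short lifetime limits the damage if a token leaks.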

Determine Who Is Speaking To Each Other

Every voice chat system has some notion of separate groups of people talking among themselves. Before integrating, make sure you’ve thought about who these groups are in your game. Depending on the specifics of your title, they may be individual teams working together; everyone in a given match; anyone within a certain server or spatial in-game area; or just whichever friends decided they wanted to chat.
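
One way to make these decisions explicit is to derive each group’s ID deterministically from game state, so every client computes the same ID without extra coordination. The naming scheme in this Python sketch (`match-<id>-team-<n>`, `party-<id>`, and square grid cells for proximity chat) is a hypothetical convention, not something any provider requires; each provider has its own rules for what a valid group ID looks like.

```python
import math
from typing import List, Optional, Tuple


def match_group(match_id: str) -> str:
    """Everyone in a given match."""
    return f"match-{match_id}"


def team_group(match_id: str, team_id: int) -> str:
    """One team within a match."""
    return f"match-{match_id}-team-{team_id}"


def proximity_group(server_id: str, position: Tuple[float, float],
                    cell_size: float = 50.0) -> str:
    """Players standing in the same square grid cell of the map share a group."""
    cell_x = math.floor(position[0] / cell_size)
    cell_y = math.floor(position[1] / cell_size)
    return f"{server_id}-cell-{cell_x}_{cell_y}"


def groups_for_player(match_id: str, team_id: int,
                      party_id: Optional[str] = None) -> List[str]:
    """All groups a player should be in: their team, plus a friends' party if any."""
    groups = [team_group(match_id, team_id)]
    if party_id is not None:
        groups.append(f"party-{party_id}")
    return groups


if __name__ == "__main__":
    print(groups_for_player("m42", team_id=2, party_id="p7"))  # ['match-m42-team-2', 'party-p7']
    print(proximity_group("eu1", (120.0, -10.0)))              # eu1-cell-2_-1
```

A hard grid is the simplest possible proximity scheme, but players on either side of a cell boundary won’t hear each other, so for anything more than a prototype the positional or spatial audio features discussed above are usually a better fit.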

Providers use different terminology for these groups: Vivox and Agora call them Channels, while Epic Voice calls them Rooms. But the interface is fairly consistent across providers. You define a unique ID for each group, then fire “join” and “leave” events for each player as they enter or exit the group chat. Once a player has joined the group, you’ll usually make a single call to your provider’s API to start uploading audio from the player’s microphone, which your provider then distributes to everyone else in the same group.
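
Conceptually, the provider is maintaining something like the toy model below: group membership driven by join and leave events, with each speaker’s audio fanned out to the other members. This Python sketch is a hypothetical illustration of the concept with no networking, not any provider’s API; the optional `max_size` models a room cap such as the 16-person EOS Voice limit mentioned earlier.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class VoiceGroups:
    """Toy model of group membership and audio fan-out (no networking)."""

    def __init__(self, max_size: Optional[int] = None):
        self.max_size = max_size  # e.g. 16 to mirror the EOS Voice room cap
        self.members: Dict[str, Set[str]] = defaultdict(set)

    def join(self, player_id: str, group_id: str) -> None:
        group = self.members[group_id]
        if (self.max_size is not None and player_id not in group
                and len(group) >= self.max_size):
            raise RuntimeError(f"group {group_id!r} is full ({self.max_size} players)")
        group.add(player_id)

    def leave(self, player_id: str, group_id: str) -> None:
        group = self.members.get(group_id)
        if group is not None:
            group.discard(player_id)
            if not group:
                del self.members[group_id]  # drop empty groups

    def recipients(self, speaker_id: str, group_id: str) -> Set[str]:
        """Who hears audio that `speaker_id` transmits into `group_id`."""
        group = self.members.get(group_id, set())
        if speaker_id not in group:
            return set()  # only members may transmit into a group
        return group - {speaker_id}


if __name__ == "__main__":
    groups = VoiceGroups(max_size=16)
    groups.join("alice", "match-m42-team-2")
    groups.join("bob", "match-m42-team-2")
    print(groups.recipients("alice", "match-m42-team-2"))  # {'bob'}
```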

Implement Safeguards

Voice chat is powerful, but it can also be a medium for abuse. While the benefits of voice chat clearly outweigh the risks, there are a number of simple steps you can take to empower your players and reduce the risk of misbehavior harming your community.

First off, all three major VoIP providers let you mute a player not only for the whole group, but also for any individual member of the group. Make sure you expose UI elements that let your players mute individuals who are trolling, harassing, or otherwise harming others in the group. Some games tie muting to additional reporting functionality, which can be a powerful way to identify the most aggressive bad actors on your platform, but make sure you’re not adding extra hoops for your players to jump through in order to protect themselves.
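
The difference between the two kinds of mute is easiest to see as a delivery rule: a group-wide mute silences a speaker for everyone, while a personal mute only affects the listener who set it. The Python helper below is a hypothetical sketch of that rule, independent of any provider’s API; in practice you would call your provider’s own mute functions.

```python
from typing import Dict, Iterable, Set


class MuteRules:
    """Decides whether each listener should hear a given speaker."""

    def __init__(self) -> None:
        self.group_muted: Set[str] = set()             # muted for everyone (e.g. by a moderator)
        self.personal_mutes: Dict[str, Set[str]] = {}  # listener -> speakers they muted

    def mute_for_everyone(self, speaker_id: str) -> None:
        self.group_muted.add(speaker_id)

    def mute_for_me(self, listener_id: str, speaker_id: str) -> None:
        self.personal_mutes.setdefault(listener_id, set()).add(speaker_id)

    def unmute_for_me(self, listener_id: str, speaker_id: str) -> None:
        self.personal_mutes.get(listener_id, set()).discard(speaker_id)

    def audible_to(self, speaker_id: str, listeners: Iterable[str]) -> Set[str]:
        """Filter a speaker's recipients down to those who should actually hear them."""
        if speaker_id in self.group_muted:
            return set()
        return {listener for listener in listeners
                if speaker_id not in self.personal_mutes.get(listener, set())}


if __name__ == "__main__":
    rules = MuteRules()
    rules.mute_for_me("alice", "troll")
    print(sorted(rules.audible_to("troll", ["alice", "bob"])))  # ['bob']
```

Keeping the personal mute a single action, with any report flow offered alongside it rather than in front of it, is one way to avoid adding those extra hoops.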

Extending that logic, while muting is great for putting control back in the hands of your players, you also don’t want to make it their job to clean up your community. Some bad behavior (like child grooming or radicalization) involves victims who don’t realize they are in trouble until it’s too late. And even when your players are fully aware that an aggressor deserves to be punished, they may not have the energy, confidence, or motivation to take action, and may instead simply stop playing your game. So make sure you consider proactive steps you can take to prevent misbehavior in your community.

This is Modulate’s specialty, and we’ve worked with a variety of game studios to implement tools that keep voice chat safe and immersive for their whole communities. Our ToxMod service identifies misbehavior, as defined by your code of conduct, and flags it to your moderators even in the absence of player reports. Our VoiceWear service, meanwhile, levels the playing field in voice chat, giving players more control over how they express their identity online while opening up a wide range of possibilities for player immersion. Modulate’s demo projects demonstrate how to integrate both ToxMod and VoiceWear, and both services tie in smoothly to any VoIP system.

Test and Deploy

Once the pipes are all in place, take some time to test your voice chat features. It’s well worth testing on each hardware platform, since audio parameters (microphones, sample rates, buffer sizes, etc.) can vary in ways that are hard or impossible to catch from documentation or code alone. Verify basic functionality (audio comes through, with no stuttering or glitching) as well as the overall experience (speakers are all similar in volume, audio UI is consistent). Finally, just as you’d buckle your seatbelt before you start driving, make sure your safeguards are working as intended before you go live. If you’re working with Modulate, we’ll collaborate with you to test our integrations: for ToxMod, we’ll make sure it catches some mock toxic speech you say in a test chat; for VoiceWear, we’ll validate both the quality of the converted voice and that everyone else in the group hears the skinned audio rather than the original.
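
Checking that speakers are all similar in volume is one of the easier items on that list to automate. The Python sketch below computes each test speaker’s RMS level in dBFS from a recorded buffer of float samples in the range -1.0 to 1.0, then flags anyone more than a chosen tolerance away from the median. How you capture per-speaker buffers, and the 6 dB tolerance, are assumptions to adapt to your game; RMS is also only a rough stand-in for perceived loudness compared with a dedicated loudness measure such as ITU-R BS.1770 (LUFS).

```python
import math
import statistics
from typing import Dict, List, Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (a full-scale sine is about -3 dBFS)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")  # digital silence, e.g. a dead microphone
    return 10.0 * math.log10(mean_square)


def flag_volume_outliers(levels_dbfs: Dict[str, float],
                         tolerance_db: float = 6.0) -> Dict[str, float]:
    """Return {speaker: offset_from_median_db} for speakers outside the tolerance."""
    median = statistics.median(levels_dbfs.values())
    return {speaker: level - median
            for speaker, level in levels_dbfs.items()
            if abs(level - median) > tolerance_db}


def sine(amplitude: float, n: int = 48_000, freq_hz: float = 440.0,
         rate_hz: int = 48_000) -> List[float]:
    """Test tone standing in for a speaker recorded at a known level."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


if __name__ == "__main__":
    levels = {name: rms_dbfs(sine(amp))
              for name, amp in [("alice", 0.5), ("bob", 0.4), ("carol", 0.05)]}
    print(sorted(flag_volume_outliers(levels)))  # ['carol']
```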


Conclusion

Integrating voice chat can seem daunting at first, given the wide variety of providers, each using different terminology and offering a diverse set of features. But the good news is that, as you’ve hopefully seen, VoIP integration in most games boils down to a few simple steps that developers can complete in days or hours, not weeks or months. We hope this guide has helped you identify your ideal provider and get a sense of what the integration will actually require, and we encourage you to reach out if you’re interested in learning more or in checking out our demo projects to see what the code looks like in practice.

About the author: Zach Neveu is a Core Software Engineer at Modulate. His primary work includes leading Modulate’s efforts to build a wide array of game engine plugins and demo projects that demonstrate the use of Modulate’s software together with major VoIP systems. He has also worked closely with several of Modulate’s customers through their VoIP integration process, gaining valuable insights into the challenges that he hopes this article will help others bypass!