It’s a gamer’s worst nightmare. The shot is lined up, the monster’s treasure is calling, and then the action freezes. Kicked to the lobby or forced to reboot, the gamer loses a hard-played shot at points and booty.

The prospect of such negative experiences also keeps gaming developers and infrastructure planners up at night. Competition among massively multiplayer online role-playing games (MMORPGs)—from World of Warcraft to Fortnite—means unreliable server connectivity or latency-plagued action can easily spell doom. It destroys the immersive experience and directly affects gaming companies’ bottom line.

And today’s gaming market isn’t child’s play. It’s big business – over £98 billion, with more than 2.5 billion video gamers of all ages worldwide. Enthusiasts run the gamut from mobile to console to PC, with the last of these still comprising 57% of global revenues. Even Google is joining the movement, with 2019 plans to launch a new digital gaming platform, Stadia, which will stream games that used to rely on discs or downloads.

Demands on gaming infrastructure can be intense. In 2018, PlayerUnknown’s Battlegrounds (PUBG), touted as a Fortnite alternative, hit 550,000 players hourly on Steam, the online-only game store. Another popular multiplayer game, Defense of the Ancients 2 (DOTA 2), averaged 532,000 players hourly. Peak player numbers frequently reach into the millions, and delivering realistic action at these volumes is a challenge that companies tackle with a variety of strategies.

Gaming infrastructure design

Most online games use a client-server model. The client running on the user’s computer, console, or smartphone includes the playing board and user’s viewpoint, based on algorithms that dictate the representations of the game world and action that the gamer sees and hears. On the server side, the multiplayer universe is constructed, incorporating the various connected clients into an accurate depiction for all players.

Like most other client-server applications, MMORPGs rely on multiple servers to spread out workloads and, hopefully, provide adequate redundancy to minimise freezes and other issues. The functions are generally divided among distinct server clusters, each dedicated to a particular purpose. These include providing login capabilities, supplying access to account information and character statistics and gear, and representing the game’s physical environment. They can also include performing the calculations related to players’ actions and the associated in-world physics, delivering chats, or handling players’ voice traffic. There are usually patch servers as well, because software updates to eliminate outdated code are frequent.
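
As a rough illustration of that division of labour, the sketch below routes each request to the cluster that owns its function and rotates across the hosts in that cluster. The cluster names, host names and the ClusterRouter class are all hypothetical, and simple round-robin stands in for whatever load balancing a real game would use.

```python
from itertools import cycle

# Hypothetical layout: each function maps to the hosts in its dedicated cluster.
CLUSTERS = {
    "login": ["login-1", "login-2"],
    "account": ["account-1", "account-2"],
    "world": ["world-1", "world-2", "world-3"],
    "chat": ["chat-1"],
    "patch": ["patch-1", "patch-2"],
}


class ClusterRouter:
    """Round-robin requests across the hosts of the cluster that owns a function."""

    def __init__(self, clusters):
        # One endless rotation per cluster, so load spreads evenly over its hosts.
        self._rotations = {name: cycle(hosts) for name, hosts in clusters.items()}

    def route(self, function):
        """Return the next host that should serve a request for this function."""
        if function not in self._rotations:
            raise KeyError(f"no cluster handles {function!r}")
        return next(self._rotations[function])


router = ClusterRouter(CLUSTERS)
```

Because each function has its own rotation, a burst of chat traffic in this sketch never consumes capacity on the login or world hosts.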

Most information about gaming companies’ infrastructure is closely guarded. However, a 2006 disclosure by Vivendi to the U.S. Securities and Exchange Commission revealed that running World of Warcraft required about 9,000 servers worldwide. Of course, demand for more features and ever more realistic experiences has only increased hardware requirements in the years since.

Sources of problems

Most gamers understand that their home system must make the grade or their experience will suffer. Games are among the most demanding software out there, and to play the new releases, it’s often essential to invest in a machine with a fast processor and top-end graphics card. Internet connection speeds are also important. Although games tend not to have high-bandwidth requirements, bursts of small packets must arrive rapidly to ensure a realistic experience.

Assuming the client-side hardware and broadband performance meet or exceed a particular game’s requirements, there are other reasons why play can freeze or lag. For example, local internet or data centre outages are always a possibility.

Moreover, viral popularity is a real challenge. A largely unknown game with minimal web infrastructure can become wildly successful overnight. Gaming companies must determine where and how to invest up front, as failing in that leap to stardom can easily turn players off and kill the surge.

Fortnite, with its 125 million players and up to 3.2 million concurrent sessions, relies on hyperscale cloud services to deal with volume spikes and provide robust, global infrastructure. In fact, turning to Amazon Web Services (AWS) is becoming a standard tactic, in part because the talent required to run gaming-capable infrastructure is in short supply.

Even with such a partner, however, Fortnite has experienced multiple outages. Some have derived from distributed denial of service (DDoS) attacks launched at AWS systems, while others were caused by overall popularity taxing technical limits in certain regions. To be sure, hackers’ DDoS attacks have historically been a core problem for games, more so than viruses. Minor interruptions that wouldn’t fundamentally alter the Instagram experience, for example, are deadly in an MMORPG world, given the low-latency demands.

Another fundamental issue is code. Updates are frequent, as games offer a dynamic experience and need to eliminate redundant code. But like any other continuous deployment scenario, bugs are a risk, and strong testing and change management protocols are required to limit problems in production environments.

Keys to avoiding game outages and freezes

Few industries place as high a demand on infrastructure as MMORPGs, so it’s essential to follow best practices regarding data centre design and maintenance. Less intensive mobile games are more likely to be run on wholly owned equipment, but the high usage rates, reaching some 70% of the population, make it impossible to skimp on infrastructure.

Gaming enterprises operating their own servers will want high redundancy and, for MMORPGs and other action-style games, geographically dispersed facilities to minimise lag. Data centre managers will generally need a solution to track an immense quantity of IT hardware inventory and assets, including the length of time each piece of equipment has been active. This can help identify when a given server or other system may be at risk of failure, as well as list the support coverage and provider information to enable rapid response in case of failure. Properly planned architecture will help to ensure that a single server outage does not result in system-wide downtime, but it’s nonetheless best to make repairs with utmost speed.
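
The asset-tracking idea can be sketched in a few lines. The example below is a minimal illustration rather than a description of any particular inventory product: the Asset fields, the five-year age threshold and the function names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Asset:
    """One piece of tracked hardware (all fields are illustrative)."""
    asset_id: str
    in_service: date       # date the equipment went live
    support_ends: date     # last day of support coverage
    support_provider: str  # who to call when it fails


def years_active(asset, today):
    """Approximate time in service, in years."""
    return (today - asset.in_service).days / 365.25


def at_risk(assets, today, max_age_years=5.0):
    """Return assets past the age threshold or out of support, oldest first.

    The five-year default is an assumed placeholder, not an industry rule.
    """
    flagged = [
        a for a in assets
        if years_active(a, today) >= max_age_years or a.support_ends < today
    ]
    return sorted(flagged, key=lambda a: a.in_service)
```

Keeping the support provider on the same record means the list of flagged equipment doubles as a contact sheet when something does fail.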

Software updates and other basics of data centre maintenance will also take on great importance in a gaming scenario. Security-related patches from the original equipment manufacturer, for instance, should be monitored and installed. Scheduling such updates, as well as hardware replacement and upgrade activity, for off-peak gaming periods will reduce or eliminate impacts on players.
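
One simple way to pick an off-peak period is to look for the quietest stretch in a day of player counts. The sketch below assumes 24 hourly figures are available for a region; the function name and the data shape are illustrative assumptions.

```python
def quietest_window(hourly_players, window_hours):
    """Return (start_hour, total_players) for the least busy contiguous window.

    hourly_players holds 24 values, where index 0 covers 00:00 to 01:00.
    The window is allowed to wrap past midnight.
    """
    if len(hourly_players) != 24:
        raise ValueError("expected 24 hourly values")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_load = 0, None
    for start in range(24):
        # Sum the player counts for the hours this window would cover.
        load = sum(hourly_players[(start + i) % 24] for i in range(window_hours))
        if best_load is None or load < best_load:
            best_start, best_load = start, load
    return best_start, best_load
```

In practice the counts would be averaged over several weeks and computed per region, since the quiet hours in one time zone are peak hours in another.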

Beyond high-level data centre management and maintenance, various gaming companies are exploring additional measures. For example:

  • Breaking game delivery into zones is now common and can reduce lag and provide redundancy. The prominent gaming companies understand peak player times, and zones will “follow the sun” around the globe as players in different regions log on in droves. The strategy helps gaming enterprises ensure speedy interactions via geographically local servers. Should any zone experience problems, such as internet outages, other nearby zones can typically fill in.
  • Some MMORPGs, among them PUBG, are matching players with similar ping rates, so the action appears consistent for everyone in a session because their connection latencies are comparable. This can help provide better experiences in rural areas and less developed regions and ensure no one’s play is “brought down” by someone else’s slow connection.
  • Many gaming companies are tapping the cloud, whether as a primary provider of server capacity, or as fill-in infrastructure in zones where they would have difficulty operating their own equipment. They may also do so to help cover bursts of activity that challenge the company’s on-premises or colocated infrastructure and threaten the gaming experience.
  • Crowdsourcing beta releases is a viable option for field testing new code. The gaming industry enjoys a core audience of exceptionally enthusiastic customers. Much like the Windows 10 Insider group, it can exchange early access to features or other offerings for honest input and bug testing. Posting bug bounties, where active users are rewarded for identifying issues in the gaming experience, is another option for leveraging the gamer base in a crowdsourcing capacity.
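
The ping-matching measure in the list above lends itself to a short sketch. This is a minimal illustration, not how PUBG or any other game implements matchmaking: the 50 ms bucket width, the function name and the data shapes are assumptions.

```python
from collections import defaultdict


def build_matches(players, match_size, bucket_ms=50):
    """Group players into matches with others in the same ping bucket.

    players maps a player name to a measured ping in milliseconds.
    Returns (matches, waiting): full matches, plus the players left over
    because their bucket could not fill another match.
    """
    buckets = defaultdict(list)
    # Visit players from lowest to highest ping so each bucket is ordered too.
    for name, ping in sorted(players.items(), key=lambda kv: kv[1]):
        buckets[ping // bucket_ms].append(name)

    matches, waiting = [], []
    for key in sorted(buckets):
        group = buckets[key]
        full = len(group) - len(group) % match_size
        for i in range(0, full, match_size):
            matches.append(group[i:i + match_size])
        waiting.extend(group[full:])
    return matches, waiting
```

The bucket width is the main design trade-off: narrower buckets give more even sessions but leave more players waiting, which matters most in sparsely populated regions.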

Even for non-gamers, the challenges and solutions developed by gaming companies are important to monitor. We are rapidly approaching an age of augmented and virtual reality applications, which will take a gaming-style experience into consumers’ daily lives, for everything from shopping to news consumption. The technology industry has a great deal to learn from gaming companies’ success in achieving low-latency, resilient and immersive experiences capable of delivering hours on end of enjoyment, so we should all continue to pay attention and follow suit as their best practices develop.