Look at all the user reviews of Diablo 3 - most of them are negative. Why? Is it because the game sucks?
Not at all! It’s because people can’t log in to their Battle.net accounts to authenticate that they are valid users. Now, this post isn’t about the crappiness of DRM in gaming. The point that I’m trying to make is how critical solid operations are to the user experience of your product. In the world of the internet, all consumer-facing applications need to be able to handle web-scale user loads. The fact that no one can log in to play a single-player game on LAUNCH DAY is absolutely ridiculous - there’s nothing happening on the server side of Diablo 3 that is so special it has to differ from how Google and Facebook run their servers. Look at companies like Etsy or Facebook - they have close to zero downtime and deploy new versions of their applications multiple times per day. How many times per week do MMOs have to go down so the ops teams can deploy new versions of their code?
The reason I’m so fascinated by this problem goes back to some amazing conversations I had with Seth Thomas and Jerry Chen at Devopsdays Austin back in April. Both of them are or were release engineers in the gaming industry, and when I compare their stories to the stories of people in web ops, the difference is astounding. What I learned is that the ops groups at gaming companies are inexperienced, lack investment, and are very insular about how their operations work. On top of this, I am going to guess that the gaming industry doesn’t get much talent in the space of release and operations engineering - most of its workforce enters with experience in 3D technologies and/or networking. It is amazing (in a bad way) that situations like the Diablo 3 launch occur even though these web-scale issues were solved years ago by companies like Google, Amazon, Flickr, and others.