CTO Steve Parker Hard at Work During the Amazon S3 Outage
An Open Letter From LeagueApps Co-Founder and Chief Technology Officer Steve Parker
LeagueApps experienced a platform service disruption this past Tuesday, February 28th, 2017. It occurred when one of the basic cloud-based infrastructure services we rely on, Amazon’s S3 (Simple Storage Service), went down unexpectedly. Amazon has published an official summary of the S3 disruption; in short, a typo entered during a maintenance task caused the issue.
We first realized there was an issue with LeagueApps sites loading at around 12:45 pm EST. We immediately investigated and determined that the root cause was the S3 outage. Our platform relies on S3 to host and serve static assets, media files such as videos and images, and other content. These assets are essential to rendering our app pages properly.
Other sites, such as Netflix, Spotify, and BuzzFeed, were also affected. The outage didn’t quite break the Internet, but a 4-hour disruption at Amazon Web Services (AWS), Amazon’s cloud computing division, caused headaches for hundreds of thousands of websites across the United States.
We know that our partners expect 110% from all of us at LeagueApps, and we are committed to meeting that expectation. We quickly implemented a short-term change to the platform that let us bypass S3 and use a comparable cloud service from Google instead. As a result, our sites were affected for only an hour and a half, while S3 itself was down for over 4 hours.
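The letter does not describe how the switch to Google was made. As a minimal sketch, assuming the app builds static-asset URLs from a configurable origin, a health-checked fallback could look like this. The origin URLs, `choose_asset_origin`, `asset_url`, and the health check are all hypothetical placeholders, not LeagueApps code.

```python
from typing import Callable, Sequence

# Hypothetical origins: the letter does not name LeagueApps' real asset hosts.
ASSET_ORIGINS = (
    "https://assets-primary.example.com",    # stands in for the S3-backed origin
    "https://assets-secondary.example.com",  # stands in for the Google Cloud origin
)


def choose_asset_origin(
    origins: Sequence[str], is_healthy: Callable[[str], bool]
) -> str:
    """Return the first origin whose health check passes.

    If none pass, keep the primary so URLs stay stable rather than failing hard.
    """
    for origin in origins:
        if is_healthy(origin):
            return origin
    return origins[0]


def asset_url(path: str, origin: str) -> str:
    """Join an asset path onto the chosen origin with exactly one slash."""
    return f"{origin.rstrip('/')}/{path.lstrip('/')}"


# Simulate the primary being down; a real check might issue an HTTP request.
down = {ASSET_ORIGINS[0]}
origin = choose_asset_origin(ASSET_ORIGINS, lambda o: o not in down)
print(asset_url("/img/logo.png", origin))
# prints https://assets-secondary.example.com/img/logo.png
```

Keeping the health check as a passed-in function makes the switch easy to test offline and lets the same logic work whether the signal comes from an automated probe or a manual toggle.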
We have used S3 as part of the LeagueApps tech stack since the start, and this was the first significant issue we have had with it. In fact, S3 had been so reliable for so long that its failure appears to have caused some panic, with ripple effects across Twitter.
We take platform up-time and stability very seriously, so I want to share a bit of insight into how we think about and support them, and hopefully answer some questions you may have.
The LeagueApps Platform is supported by a coordinated set of independent Internet services, such as S3. For instance, Google Cloud hosts our database servers and is the main driver of our development and IT operations. Stripe, PayPal, and Authorize.net handle payments; SendGrid handles email; and Twilio handles text message processing.
Using these services lets our platform offer our partners the best-of-breed capabilities that each service provides. But it also brings complexity and risk that we must manage.
We continuously aim to increase the degree of redundancy within our platform. Redundancy, in simple terms, means having a duplicate instance of something in case the primary one goes down. Over time, we’ve executed various technology projects that have increased the redundancy within our platform. For example, our app layer and our database layer (the core parts of our platform running at Google) are each a “cluster” of nodes (servers), so that the failure of any single node does not affect the overall cluster’s up-time.
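To illustrate why a cluster tolerates single-node failures, here is a minimal sketch of the standard parallel-availability formula. It assumes nodes fail independently and that any one surviving node can serve traffic; the 99% per-node figure is hypothetical, not a measured LeagueApps number.

```python
def cluster_availability(node_availability: float, nodes: int) -> float:
    """Chance that at least one node is up, assuming independent failures.

    A cluster that keeps serving while any single node survives is down only
    when all n nodes are down at once, which happens with probability (1 - a) ** n.
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 1.0 - (1.0 - node_availability) ** nodes


# Hypothetical 99%-available nodes, not measured LeagueApps figures.
for n in (1, 2, 3):
    print(n, f"{cluster_availability(0.99, n):.6f}")
```

Under these assumptions, three nodes reach about 99.9999%. Real failures are often correlated (a shared network, a bad deploy, or an upstream dependency like S3), so the independence figure is an optimistic upper bound rather than a guarantee.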
Additionally, we maintain redundant backups of key pieces of our infrastructure for use in emergencies, and we run continuous automated data backups to redundant targets. These projects have paid off and are a large part of why we’ve been able to maintain 99.9% up-time on a consistent basis.
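For a sense of scale, a 99.9% up-time target leaves a fixed downtime budget. The sketch below does the arithmetic; the 365-day year and 30-day month are simplifying assumptions, and the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24    # 8,760 hours in a non-leap year
HOURS_PER_30_DAYS = 30 * 24  # 720 hours


def allowed_downtime_hours(uptime: float, period_hours: float) -> float:
    """Hours of downtime a given up-time fraction permits over a period."""
    return (1.0 - uptime) * period_hours


yearly = allowed_downtime_hours(0.999, HOURS_PER_YEAR)
monthly_minutes = allowed_downtime_hours(0.999, HOURS_PER_30_DAYS) * 60
print(f"{yearly:.2f} hours per year")                # 8.76 hours per year
print(f"{monthly_minutes:.1f} minutes per 30 days")  # 43.2 minutes per 30 days
```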
Building in greater redundancy is a complex and resource-intensive process, and ultimately even the biggest, most sophisticated platforms can suffer interruptions like the one Amazon had this week. We are continuously evaluating new ways to increase our platform stability and applying what we learn from events like this.
To that end, we will continue to plan out and prioritize the most appropriate projects to maintain our stability and ultimately ensure that our platform continues to support and empower your organization.
I’d like to thank you and all of our partners for your patience during the S3 outage on Tuesday. More broadly, the feedback you have all provided to LeagueApps has been instrumental in helping us create the absolute best product to help you run your sports organization and business.
I look forward to many more years of continued partnership with you.
Feel free to reach out with any questions.