Cloud computing promises increased flexibility, faster time to market, and drastically lower costs through better asset utilization and improved operational efficiency. The cloud further promises an environment that is fully redundant, readily available, and very secure. Who isn’t talking about the cloud and wanting what it promises?
Today, however, Amazon’s cloud suffered significant degradation in its Virginia data center after a nearly flawless record stretching back more than a year. Yes, the rain started pouring out of Amazon’s cloud at about 1:40 a.m. PT, when it began experiencing latency and error rates in its U.S. East Coast region.
The first status message about the problem stated:
1:41 AM PT We are currently investigating latency and error rates with EBS volumes and connectivity issues reaching EC2 instances in the US-EAST-1 region.
Seven hours later, as Amazon continued to feverishly work on correcting the problem, its update said:
8:54 AM PDT We’d like to provide additional color on what we’re working on right now (please note that we always know more and understand issues better after we fully recover and dive deep into the post mortem). A networking event early this morning triggered a large amount of re-mirroring of EBS volumes in US-EAST-1. This re-mirroring created a shortage of capacity in one of the US-EAST-1 Availability Zones, which impacted new EBS volume creation as well as the pace with which we could re-mirror and recover affected EBS volumes. Additionally, one of our internal control planes for EBS has become inundated such that it’s difficult to create new EBS volumes and EBS backed instances. We are working as quickly as possible to add capacity to that one Availability Zone to speed up the re-mirroring, and working to restore the control plane issue. We’re starting to see progress on these efforts, but are not there yet. We will continue to provide updates when we have them.
No! Say it’s not so! A cloud outage? The reality is that cloud computing remains the greatest disruptive force we’ve seen in the business world since the proliferation of the Internet. What cloud computing will do to legacy environments is similar to what GPS systems did to mapmakers. And when was the last time you picked up a map?
In the future, businesses won’t even consider hosting their own IT environments. It will be an automatic decision to go to the cloud.
So why is Amazon’s outage news?
Only because it affected the 800-pound gorilla. Amazon currently has about 50 percent of the cloud market, and its competitors can only dream of this market share. When fellow cloud provider Coghead failed in 2009, did anyone know? We certainly didn’t. But when Amazon hiccups, everybody knows it.
Yes, the outage did affect a number of businesses. But businesses experience outages, disruptions, and degradation of service every day, regardless of whether the IT environment is legacy or next generation, outsourced or insourced. In response, these businesses scramble, putting panicked recovery plans in place and having their IT folk work around the clock to get things fixed, but rarely do these service blips make the news. So, with the spotlight shining squarely on it because of its position in the marketplace, Amazon is scrambling, panicking, and working to get the problem fixed. And it will. Probably long before its clients would or could in their own environments.
Yes, it rained today, but really, it was just a little sprinkle. We believe the future for the cloud is so bright, we all need to be wearing shades.