Sunday, April 24, 2011

The Cloud Outage

O'Reilly Community: The AWS Outage: The Cloud's Shining Moment: if your systems failed in the Amazon cloud this week, it wasn't Amazon's fault. You either deemed an outage of this nature an acceptable risk or you failed to design for Amazon's cloud computing model....... two dueling architectural models of cloud computing applications: "design for failure" and traditional. ..... The Amazon model is the "design for failure" model. Under the "design for failure" model, combinations of your software and management tools take responsibility for application availability. The actual infrastructure availability is entirely irrelevant to your application availability. 100% uptime should be achievable even when your cloud provider has a massive, data-center-wide outage. ...... The advantage of the "design for failure" model is that the application developer has total control of their availability with only their data model and volume imposing geographical limitations. The downside of the "design for failure" model is that you must "design for failure" up front. ...... Physical redundancy encompasses all traditional "n+1" concepts: redundant hardware, data center redundancy, the ability to do vMotion or equivalents, and the ability to replicate an entire network topology in the face of massive infrastructural failure. ...... If you had redundancy across availability zones, you would have survived every outage suffered to date in the Amazon cloud. ...... If you had regional redundancy in place, you would have come through the recent outage without any problems except maybe an increased workload for your surviving virtual resources. ...... Cloud redundancy enables you to survive the complete loss of a cloud provider. ....... Applications built with "design for failure" in mind ..... will achieve uptimes you can't dream of with other architectures and survive extreme failures in the cloud infrastructure. ...... no humans, no 2am calls, and no outage! ..... Netflix, an AWS customer that kept on going because they had proper "design for failure" .. ? Try doing that in your private IT infrastructure with the complete loss of a data center.

I should have expected this, but I did not. Servers are known to go down. Heck, PCs crash. The browser freezes. Now the cloud went down. In a big way. What's next? Datacenters? I think that did happen once: one Google datacenter went down. Correct me if I am misremembering. What if Facebook's datacenter in Oregon went down for an hour?

So the cloud went down. And there has been much talk. Amazon Web Services is pretty much the cloud that most of us know. And you thought Jeff Bezos was in the business of selling books.

The cloud should not go down. The cloud cannot go down. It is like a power cut where the generator kicks in on its own immediately: there was a power cut, but you did not feel it. The cloud needs that mechanism. Otherwise it is not a proper cloud. The cloud is not like the rest of us. The cloud is not supposed to go down.
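
For the programmers reading: the generator idea, and the "design for failure" model quoted above, can be sketched in a few lines. This is only a rough illustration, not how Amazon or Netflix actually do it. The replica names, the `call_with_failover` helper and the `fake_request` function are all made up for the example. The idea is simply to try each copy of the service in turn, across zones and then regions, and move on when one fails.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllReplicasFailed(Exception):
    """Raised when every replica in the list has failed."""


def call_with_failover(replicas: Sequence[str], request: Callable[[str], T]) -> T:
    """Try each replica in order and return the first successful result."""
    errors = []
    for name in replicas:
        try:
            return request(name)
        except Exception as exc:  # any failure means: move on to the next replica
            errors.append((name, exc))
    raise AllReplicasFailed(errors)


# Hypothetical replica names: two zones in one region, then a second region.
REPLICAS = ["us-east-1a", "us-east-1b", "us-west-1a"]


def fake_request(replica: str) -> str:
    """Stand-in for a real network call; pretends the whole eastern region is down."""
    if replica.startswith("us-east"):
        raise ConnectionError(f"{replica} is unreachable")
    return f"served from {replica}"


print(call_with_failover(REPLICAS, fake_request))
```

In this toy run the two eastern zones fail and the request is answered by the western one, so the caller never notices the outage. A real setup would also need the data replicated to the surviving region, which is the expensive part.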

New York Times: Amazon’s Trouble Raises Cloud Computing Doubts: the companies that were apparently hit hardest by the Amazon interruption were start-ups that, analysts said, are focused on moving fast in pursuit of growth, and less apt to pay for extensive backup and recovery services..... Amazon set up a side business five years ago offering computing resources to businesses from its network of sophisticated data centers. Today, the company is the early leader in the fast-growing business of cloud computing. ...... to avoid the costs and headaches of running their own data centers — simply tap in, over the Web, to computer processing and storage without owning the machines or operating software. ...... Amazon has thousands of corporate customers, from Pfizer and Netflix to legions of start-ups, whose businesses often live on Amazon Web Services. Those reporting service troubles included Foursquare, a location-based social networking site; Quora, a question-and-answer service; Reddit, a news-sharing site; and BigDoor, which makes game tools for Web publishers. ...... Amazon has data centers around the world, but the current problems have come from its big center in Northern Virginia, near Dulles airport. ....... Corporate cloud computing is expected to grow rapidly, by more than 25 percent a year, to $55.5 billion by 2014 ..... the computing equivalent of an airplane crash. It is a major episode with widespread damage. But airline travel, he noted, is still safer than traveling in a car — analogous to cloud computing being safer than data centers run by individual companies.

ReadWriteWeb: How Shoddy DNS Management Can Kill Your Small Business: Whether it's caused by a service provider going down, a DDoS attack or a sudden surge in traffic, having your Website go down can knock the wind out of your business. Most bigger organizations can recover from an outage, but for small and medium-sized businesses, even an hour of downtime can cost you in terms of revenue, not to mention customer patience and loyalty. For smaller operations, a major outage could spell the end of the business.

VentureBeat: Amazon’s outage in third day: debate over cloud computing’s future begins: The outage is costing web sites such as Reddit and Quora considerable losses as users turn elsewhere to get their social media needs met..... a big hiccup for an industry that is supposed to grow to $55 billion by 2014 ...... The duration of the outage has surprised many, since Amazon has a lot of backup computing infrastructure. ...... Corporations will have to decide what computer operations to put on a cloud operated by external vendors and how much they should keep inside their own internal data centers. They will also have to figure out the right policies for backup and recovery services. And they will have to decide whether to allocate more money to backup data centers in multiple locations. ....... Netflix uses Amazon but it hasn’t gone offline because it fully uses Amazon’s redundant cloud backup infrastructure. For most startups, those are luxuries that are too expensive ...... the web site developers should have planned for this kind of outage and taken advantage of Amazon’s full backup capabilities ..... Eventually, the cloud will become like a utility. You can get as much computing power as you want with the flip of a switch and you won’t have to worry about outages as much over time. But we’re clearly not there yet.