Thursday, May 1, 2008

Tomorrow's Forecast: Cloudy Skies & Sunshine

Our managed hosting solution reached a tipping point yesterday. We hit an environment configuration bump that was going to eat 24-36 of our precious startup development hours. This was the second bump we'd hit with our hosted hardware provider, and it caused me to pick up the phone to evaluate a different vendor. With our provider in flux, we took off our shoes and socks and started to wade into Amazon Web Services (AWS). Programmatic access to server instances running the environment/configuration of our choosing (for the most part) became too tempting. Google's App Engine is way too high-level in its current state to have even been considered. Here's my first impression of AWS. The following assumes you have basic knowledge of EC2 and S3 as concepts.

Account Setup

For some reason Amazon thought they'd leverage their consumer shopping product UI for AWS. C'mon. I don't want to feel like I'm shopping for bathroom soap while setting up X.509 certificates for API use. After getting over the UI, account setup was pretty straightforward. A public key handshake here, a private key store there, a dash of PKI setup, and we had a full-fledged AWS account ready for accessing EC2 and S3.

Clustering and what-not

AWS is dirt simple if you don't have a complex clustered network topology with lots of services running across multiple machines. If you're hosting a simple shopping website, for example, you should be using AWS; no question. Amazon has no notion of the relative importance of machines in your cluster; all instances (logical machines) are treated equally. That's great for AWS, but not necessarily for you. EC2 doesn't support multicast between machines, so if your model needs self-discovery, you'll need to come up with a homegrown solution, or find one cobbled together on the net.
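
To make "homegrown solution" a little more concrete, here's a minimal sketch of one possible approach: every node periodically writes a heartbeat to a shared registry, and peers read the registry instead of listening for multicast announcements. This is not what we've built, just an illustration. The PeerRegistry class, its method names, and the 30-second timeout are all hypothetical, and the in-memory dictionary stands in for whatever shared store your instances can all reach.

```python
import time


class PeerRegistry:
    """Hypothetical in-memory stand-in for a shared store every node can reach."""

    def __init__(self, ttl_seconds=30.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # how long a heartbeat stays valid (assumed value)
        self._clock = clock          # injectable so the expiry logic can be tested
        self._last_seen = {}         # address -> time of the most recent heartbeat

    def heartbeat(self, address):
        """Record that the node at `address` is alive right now."""
        self._last_seen[address] = self._clock()

    def live_peers(self, exclude=None):
        """Return addresses with a recent heartbeat, minus the caller's own address."""
        now = self._clock()
        return sorted(
            addr
            for addr, seen in self._last_seen.items()
            if now - seen <= self._ttl and addr != exclude
        )
```

Each node would call heartbeat() with its own address on a timer and live_peers() whenever it needs the current membership. Nodes that die simply stop heartbeating and age out, so no explicit deregistration is needed.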

Queues

If you're using a queueing framework, you may want to consider replacing it with Amazon's version (Simple Queue Service). Doing so alleviates some of the clustering/multicast issues I mentioned. If you need machine-to-machine level performance, however, there are potentially significant downsides (note: we haven't finished load testing here, so the performance issues I outline are educated guesses, not based on empirical data):

  • There's an HTTP version. While this is nice and standard, per-message overhead is likely much larger than that of other queueing frameworks.
  • There's a SOAP version. While this is nice and standard...
  • Message movement in and out of AWS will cost you financially ($1 per 1M messages), though intra-EC2 communication is free.
  • Message movement in and out of AWS will cost you in latency (2-10 seconds to get a message on the queue; YIKES!)
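
To put the dollar figure in perspective, here's a back-of-the-envelope sketch. The $1 per 1M messages rate is the one quoted above; the function name and the 50 messages/second traffic level are made up for illustration and are not our actual load.

```python
def queue_cost_dollars(messages, dollars_per_million=1.0):
    """Estimated charge for moving `messages` in or out of AWS at the quoted rate."""
    return messages / 1_000_000 * dollars_per_million


# Hypothetical steady load: 50 messages per second for one day.
messages_per_day = 50 * 60 * 60 * 24          # 4,320,000 messages
daily_cost = queue_cost_dollars(messages_per_day)
print(f"{messages_per_day:,} messages/day costs about ${daily_cost:.2f}")
```

At that made-up rate the bill is a few dollars a day, so for anything that isn't extremely chatty the latency looks like the bigger worry.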

Persistent Storage

While S3 obviously provides persistent storage, the coupling with EC2 appears crude enough that instance-level storage requires some hacking. Amazon is wait-list-beta testing a more streamlined persistent storage model for EC2, and we're not in the beta yet, so I can't comment.

Ecosystem

There are already companies that provide nice automated instance provisioning services with the click of a UI element. If you don't use these (they can be pricey), you'll be building your own or writing scripts to set up and tear down machines on demand.

If I've missed something, or someone has better data, I'd love to hear about it. I have high hopes for cloud computing, and our initial experience is pretty good. I should disclose that we're building infrastructure software that has significant performance needs around message transmission, and a highly dynamic data set. We're not building websites.
