Redundancy, Performance and Geo-Optimization with S3 and CDNs

Disclaimer: This is a theory; I haven’t tried it out (anyone at EdgeCast want to confirm, or give me a free account to try it? 🙂)

Tonight I was looking at ways to store large volumes of user-uploaded resources (images) in a web environment and had a bit of an epiphany. I had defaulted to Amazon’s S3 web service as the storage platform, since I’ve worked with it before and love the idea of an “unlimited”, fully-redundant storage device. The main problem with S3, however, is that its latency and geographically-optimized delivery are less than stellar. On top of that, the bandwidth is relatively expensive.

So — what happens if you combine S3 with a Content Delivery Network of some sort?

I have recently done some work with the guys over at EdgeCast, who provide a nice, proxying CDN that’s really easy to use. Basically you CNAME a sub-domain of yours to point to their network, and then all requests to that sub-domain are mapped via their CDN nodes to your static resources. So normally, that’d look something like this (if you had the CDN configured to load files from www.example.com):

User Request (cdn.example.com/images/test.gif) --> edgecastcdn.net --> www.example.com/images/test.gif

The CDN node closest to the User will handle the request, loading the resource from the source domain (www.example.com in this example) if it doesn’t already have a copy, then passing it along to the User. That got me thinking: could you combine the performance of the CDN with the redundancy and reliability of S3? Sure, why not?
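
To make the proxying idea concrete, here is a minimal Python sketch of the mapping an edge node would have to do, assuming it simply keeps the path of the request and swaps in the configured origin. The function name `to_origin_url` is made up for illustration, and the real rewriting rules are whatever EdgeCast implements; the point is only that nothing in the mapping cares whether the origin is a web server or an S3 bucket.

```python
from urllib.parse import urlsplit, urlunsplit


def to_origin_url(request_url, origin_base):
    """Map a URL requested on the CDN hostname to the URL fetched from the origin.

    Hypothetical helper: assumes the edge keeps the request path and query
    and takes the scheme and host from the configured origin.
    """
    request = urlsplit(request_url)
    origin = urlsplit(origin_base)
    # Join any path prefix on the origin with the requested path.
    path = origin.path.rstrip("/") + request.path
    return urlunsplit((origin.scheme, origin.netloc, path, request.query, ""))


# Origin is the main web site, as in the diagram above.
print(to_origin_url("http://cdn.example.com/images/test.gif",
                    "http://www.example.com/"))
# Origin is an S3 bucket instead (bucket name taken from the example in this post).
print(to_origin_url("http://cdn.example.com/images/test.gif",
                    "http://cdn_s3.s3.amazonaws.com/"))
```

The first call yields http://www.example.com/images/test.gif and the second yields http://cdn_s3.s3.amazonaws.com/images/test.gif, so the same request path lands on either origin.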

Initially I was going to say that you’d want to use S3’s CNAME configuration to point a sub-domain of your main site at S3, but I’m going to take a stab in the dark and suggest that you can skip that step if you like. Instead, jump straight to EdgeCast and configure an “HTTP Customer Origin” that points to your bucket (let’s say it’s called “cdn_s3”). You would enter “http://cdn_s3.s3.amazonaws.com/” in your admin screen. In my experience you’ll also need to contact EdgeCast support and have them confirm that the “Customer Origin” is set up correctly on their side, pointing at your S3 bucket’s URL so that it proxies content directly from there. Finally, you will need a DNS entry (CNAME) for “cdn.example.com” pointing to (in my case) “edgecastcdn.net”, so that requests from your web pages for images in your S3 bucket go through the CDN.
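
With the origin pointing at the bucket, whatever headers S3 returns are what the CDN sees, so it probably pays to set them at upload time. Below is a rough sketch of the upload side, assuming the boto3 library and assuming the edge honours the origin’s Cache-Control header (I haven’t checked how EdgeCast treats it). The helper names and the one-day max-age are my own illustrative choices. The objects would also need to be publicly readable for the CDN to fetch them; how you grant that is left out here.

```python
import mimetypes


def build_upload_args(key, max_age_seconds=86400):
    """Build the headers S3 should store alongside the object.

    Hypothetical helper: the one-day default max-age is an arbitrary choice.
    """
    content_type, _ = mimetypes.guess_type(key)
    return {
        # Fall back to a generic type when the extension is unknown.
        "ContentType": content_type or "application/octet-stream",
        # Tells caches (the CDN, browsers) how long a copy may be reused.
        "CacheControl": f"public, max-age={max_age_seconds}",
    }


def upload_image(local_path, bucket, key):
    """Upload one file to the bucket with the headers built above."""
    # boto3 is a third-party package; it is imported here so that
    # build_upload_args can be used without it installed.
    import boto3

    s3 = boto3.client("s3")
    s3.upload_file(local_path, bucket, key, ExtraArgs=build_upload_args(key))


print(build_upload_args("images/test.gif"))
```

Calling `upload_image("test.gif", "cdn_s3", "images/test.gif")` would then put the file where the CDN expects to find it, given valid AWS credentials.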

Let’s say you had an <img> tag in a page that referenced “http://cdn.example.com/images/test.gif”. Theoretically, you should be able to upload “images/test.gif” to your S3 bucket; then, when that URL is used in a page, the request would be passed through to edgecastcdn.net (via the cdn.example.com CNAME), which would route it to the CDN node nearest to the requesting user. If that node had a “fresh” copy of the content, it would serve it immediately. If it didn’t, it would request the content via the “Customer Origin” configured at EdgeCast, which is to say it would load it directly from http://cdn_s3.s3.amazonaws.com/.
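
The fresh-or-fetch behaviour described above can be sketched in a few lines of Python. This is a toy model, not EdgeCast’s implementation: the `EdgeNode` class, the fixed TTL standing in for “freshness”, and the dictionary standing in for the S3 bucket are all assumptions made for illustration.

```python
import time


class EdgeNode:
    """Toy pull-through cache: serve fresh copies, otherwise fetch from origin."""

    def __init__(self, fetch_from_origin, ttl_seconds=3600, clock=time.monotonic):
        self.fetch_from_origin = fetch_from_origin  # callable: path -> bytes
        self.ttl_seconds = ttl_seconds              # how long a copy counts as fresh
        self.clock = clock                          # injectable for testing
        self.cache = {}                             # path -> (body, stored_at)

    def get(self, path):
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and now - entry[1] < self.ttl_seconds:
            return entry[0], "HIT"                  # fresh copy, served immediately
        body = self.fetch_from_origin(path)         # missing or stale: go to origin
        self.cache[path] = (body, now)
        return body, "MISS"


# A dictionary standing in for the S3 bucket behind the "Customer Origin".
bucket = {"/images/test.gif": b"GIF89a"}
origin_requests = []


def fake_origin(path):
    origin_requests.append(path)
    return bucket[path]


edge = EdgeNode(fake_origin, ttl_seconds=60)
print(edge.get("/images/test.gif")[1])  # MISS: first request goes to the origin
print(edge.get("/images/test.gif")[1])  # HIT: second request is served from the edge
print(len(origin_requests))             # 1
```

If this is roughly how the nodes behave, S3 only sees one request per object per node per freshness window, which is where the bandwidth savings would come from.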

Voilà – redundant, high-performance, geo-distributed and optimized content delivery. Right?