Wednesday, March 17, 2010

Amazon S3 file deletion FAIL

I'm a huge fan of the EC2 and S3 offerings from Amazon; they're the future. However, there's a severe deficiency in the S3 API: when you need to delete billions of files, it breaks down. While there are batch operations, they don't handle enough keys to be of use at very large file counts. And over the days it takes your script to run, it makes so many requests that the occasional errors you'd expect at that volume wind up crapping out the whole process. Rinse-and-repeat doesn't work when the mean time between failures is on the order of days.
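For what it's worth, here's roughly the shape of the workaround: wrap every single request in a retry with exponential backoff, so one transient error doesn't kill a multi-day run. This is only a minimal sketch, not what s3sync does internally. The `list_keys` and `delete_key` callables are hypothetical stand-ins for whatever S3 client you use, and the retry counts and delays are made-up defaults.

```python
import time


def with_retries(op, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call op(); on an exception, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * (2 ** attempt))


def delete_all(list_keys, delete_key, **retry_opts):
    """Delete every key that list_keys() yields, retrying each delete.

    list_keys and delete_key are hypothetical callables standing in for
    a real S3 client; they are not part of any actual library.
    """
    deleted = 0
    for key in list_keys():
        with_retries(lambda: delete_key(key), **retry_opts)
        deleted += 1
    return deleted
```

Since deleted keys drop out of the bucket listing, rerunning `delete_all` after a hard failure just picks up whatever is left. It doesn't make the job any faster, though; it's still one request per key.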

Yes, I'm running the operations on an EC2 instance. Yes, I'm using the almighty s3sync.

Amazon, you're printing money with AWS; please open up the deletion API so people can better manage their files. The current API's inability to a) delete non-empty buckets and b) handle larger batch requests feels a lot like greedy "lock-in."
