The Magical Fail Whale
June 24th, 2008

So I'm going to start off with a couple of caveats. I love Twitter; I've met the guys responsible and they're wonderful and smart as feck. A lot of people have come out and publicly bashed Twitter for its scaling woes, applying blame and all that horrible stuff. All I can say is that I'm betting the agile nature of how they developed, grew and launched didn't uncover the size of the problem until it was too late. It's what I like to call a "Cluster Fuck". Keeping Twitter running, with day-to-day operations, customer service and all of that stuff, must take time away from re-architecting.
This is personally a big deal for me. Being so fond of Ruby, I have to deal daily with the whole "Rails Doesn't Scale" FUD generated by Twitter's current reliability. It's exactly why I gave a presentation at Barcamp Belfast to try and address this. No matter how many hundreds (thousands?) of production Rails applications are out there working away like workhorses, the one kinda unreliable example (Twitter) is reused as troll bait all the time.
Mythbusters Style
- Rails & Ruby do scale.
- Anything scales, it just costs time, money and smarts.
- Ruby is such a good "glue" language that parts are easy to replace, which makes scaling easier.
- But it will always be really difficult trying to fundamentally change a working machine.
- The more scaling work that has already been applied, the more effort it takes to re-architect.
- Rails & Ruby are easily scalable with some open source loveliness for 98% of cases; Twitter is not one of those cases.
- There are endless techniques for solving performance problems!
What Twitter Should Be (and Probably Are) Doing
The problem with Twitter is that there is too much going on in the database and in Ruby for something as complex as message routing. I don't want to leap to conclusions, but as far as I can guess, when a user posts an update a trigger replicates the message for each of that user's followers. That works for a while, until you get people like Scoble joining. Imagine if Twitter was REALLY popular around the world, eek!
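To make that guess concrete, here is a minimal Python sketch of "replicate the message for every follower" against a single shared store. Everything in it (the `followers` and `timelines` structures, the `post_update` helper, the follower counts) is made up for illustration; it is not how Twitter actually does it.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for tables in ONE shared database.
followers = defaultdict(set)   # author -> names of that author's followers
timelines = defaultdict(list)  # follower -> copies of (author, text)

def follow(follower, author):
    followers[author].add(follower)

def post_update(author, text):
    """Copy the update into every follower's timeline; return rows written."""
    writes = 0
    for follower in followers[author]:
        timelines[follower].append((author, text))
        writes += 1
    return writes

# An ordinary user with a handful of followers...
for i in range(3):
    follow(f"friend{i}", "alice")
# ...and a very popular one (the count is an arbitrary example).
for i in range(20000):
    follow(f"fan{i}", "popular_user")

small = post_update("alice", "hello")         # 3 writes for one update
large = post_update("popular_user", "hello")  # 20000 writes for one update
```

One update costs one write per follower, so a single popular account can hammer the shared database all on its own.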
I'm going to bet that they approached building Twitter from a typical social network point of view; it's what I would have done. If, however, you look at it from the perspective of existing technology like message routing and e-mail, it leads to a different architecture.
Almost all of Twitter can stay in its current guise, but instead of having just ONE database (well, plus two slaves) and memcached for caching, think of it like e-mail: each user has their own database, like an inbox.
In the recent Q&A with TechCrunch they said they are working on an "elegant filesystem-based approach", which to me sounds very similar to what I'm thinking of when I say inbox. With this implementation in place, the load on Twitter's main database would be drastically reduced, with all those tweets sitting in an inbox for each user.
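Here is a rough Python sketch of the inbox idea, assuming each user's inbox is its own tiny store (an in-memory SQLite database stands in for it here). The `Inbox` class, the `route` function and the `following` table are all hypothetical names for illustration; I have no idea what the filesystem-based approach actually looks like.

```python
import sqlite3

class Inbox:
    """One small store per user (assumption: an in-memory SQLite DB)."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, author TEXT, body TEXT)"
        )

    def deliver(self, author, body):
        self.db.execute(
            "INSERT INTO messages (author, body) VALUES (?, ?)", (author, body)
        )

    def latest(self, limit=20):
        rows = self.db.execute(
            "SELECT author, body FROM messages ORDER BY id DESC LIMIT ?",
            (limit,),
        )
        return rows.fetchall()

inboxes = {}  # user -> Inbox, created on first use
following = {"bob": {"alice"}, "carol": {"alice", "bob"}}  # reader -> authors

def inbox_for(user):
    if user not in inboxes:
        inboxes[user] = Inbox()
    return inboxes[user]

def route(author, body):
    """Deliver a message to the inbox of every reader following the author."""
    for reader, authors in following.items():
        if author in authors:
            inbox_for(reader).deliver(author, body)

route("alice", "first")
route("bob", "second")
bob_view = inbox_for("bob").latest()      # only touches bob's own store
carol_view = inbox_for("carol").latest()  # only touches carol's own store
```

Reading a timeline never touches the main database or anybody else's inbox, which is the bit that should take the load off.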
There are also a load of developments going on in the Ruby community that could be applied (soon) to the Twitter architecture to make it scale even better.
- Faster, Better, Stronger... Virtual Machines from the Rubinius and MagLev teams
- Asynchronous processing with a little bit of Erlang love from Ezra (one smart cookie @ Engine Yard) called Vertebra
- Engine Yard are putting their money where their mouth is and funding a lot of these kinds of projects, karma.
Anyway, I wish the Twitter crew the best of luck. I can't wait till it's finished. Oh yeah and spread the word, Ruby does scale.


June 24th, 2008 at 05:54 PM
I wonder if the problems Twitter has (and some 37Signals apps) are more related to weaknesses in current implementations of Agile Methodologies than anything else?
JF said that Campfire is AJAX based and not Comet because they just wanted "to get things done", or be Agile. And a similar situation seems to be at the root of Twitter's problems.
Or maybe I'm on crack?
June 24th, 2008 at 05:58 PM
Twitter folk have admitted that they architected Twitter like a content management system when it should have been a messaging system.
As I have to work with Rendezvous every day and see millions of messages passing, I have to wonder....
June 24th, 2008 at 06:11 PM
Stephen,
The Twitter problem could be described as the result of an agile approach, but I can't really agree with it being a weakness.
From what I've heard, the original app was built and launched in 6 weeks. Now that's pretty agile.
Scaling anything requires three things: 1) the NEED to scale, 2) a scalable architecture, 3) $$$ to pay for that architecture.
Twitter was a fast, cheap job that took off, and took off big. The risk was minimal. A thousand Twitters could have launched on the same day and if they flopped, not a lot would have been lost.
Twitter did take off, and very soon, as Dave rightly says, they found themselves with a platform that wasn't designed to do what Twitter users ultimately started doing: sending messages, having conversations.
I'm willing to bet that, had they invested in scalable architecture first and built their system from the ground up as a distributed messaging platform, none of us would ever have got the service; they probably still wouldn't have launched and they wouldn't stand a chance.
June 24th, 2008 at 06:20 PM
Stephen, I don't think you're on crack. Agile Methodologies are great, but when you don't apply good Domain Modeling to the problem at the same time you end up with a solution that works... until you realise there's a problem, and in a lot of cases it may already be too late. In that vein, I like to spend a good while thinking about the simplest AND most sensible solution.
I don't think there are any other Rails apps with scaling problems as big as Twitter's; all of 37signals' products are "naturally sharded" by account/user structure. Campfire did require a part to be swapped out of Rails: the backend that responds to the Javascript polling. They (simply) rewrote it in C.
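By "naturally sharded" I mean something like the Python sketch below, where everything belonging to one account lives on one shard picked from the account name. The shard count and the `shard_for`, `save` and `load` helpers are made up for illustration; I'm not claiming this is how 37signals set things up.

```python
import zlib

NUM_SHARDS = 4  # hypothetical number of database servers
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one DB

def shard_for(account):
    """Pick a shard deterministically from the account name."""
    return zlib.crc32(account.encode("utf-8")) % NUM_SHARDS

def save(account, key, value):
    shards[shard_for(account)].setdefault(account, {})[key] = value

def load(account, key):
    return shards[shard_for(account)][account][key]

save("acme", "project:1", "Website relaunch")
save("acme", "project:2", "Newsletter")
save("globex", "project:1", "Reactor audit")

acme_project = load("acme", "project:1")
```

Every request for an account goes to exactly one shard and never needs data from another. Twitter's follower graph cuts across users, so it doesn't split up that neatly.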