Over on the 37signals blog, DHH writes "Mr. Moore gets to punt on sharding." His argument is basically that if you continually delay fixing your data storage and retrieval layer, Moore's Law will be there to save your ass--over and over again.
Bzzzt. Wrong answer.
Depending on future improvements to fix your own bad planning is a risky way to build an on-line service--especially one you expect to grow and charge money for.
It's easy to forget history in this industry (as Paul pointed out in the comments on that post). There was a point a few years ago when people still believed the clock speed of CPUs would be doubling roughly every 18 months for half the cost. Putting aside that Moore's Law is really about transistor density and not raw speed, we all ended up taking a funny little detour anyway.
Until recently, the sweet spot (in terms of cost and power use) was probably a dual CPU, dual core server with 16 or 32GB of RAM. But soon that'll be dual quads with 32 or 64GB of RAM. And then it'll be quad eight core CPUs with 128GB or whatever.
But notice that nowadays we're not all running 6.4GHz CPUs in our servers. Instead we're running multi-core CPUs at slower clock speeds. Those two are definitely not equivalent.
A funny thing happens as you add cores and CPUs. You begin to find that the underlying software doesn't always... get this... scale. That's right. Software designed in a primarily single or dual CPU world starts to show its age and performance limitation in a world where you have 8, 16, or 32 cores per server (and more if you're running one of those crazy Sun boxes).
You see, David is talking specifically about MySQL (and probably InnoDB), which is currently being patched by outside developers precisely because it has multi-core issues. Its locking is expensive and not granular enough to utilize all those cores. It's expensive in terms of memory use too. And there are assumptions built into the I/O subsystem that don't scale well in today's world of fast multi-disk RAID units, SSDs, and SANs. People are hitting these issues in the real world, and it's definitely becoming a serious bottleneck.
See Also: The New MySQL Landscape.
Moore's Law is no silver bullet here. A fundamental change has occurred in the hardware platform and now we're all playing catch-up in one way or another.
I'll discuss this a bit in my upcoming MySQL Conference Talk too. The world is not nearly as clear or simple as DHH is suggesting. Perhaps they can get by with constantly postponing the work of sharding their database, but that doesn't mean you should follow their lead.
Posted by jzawodn at January 07, 2009 07:10 AM
I'm not sure I understand the argument here. Are you suggesting that all services should be set up for sharding from day 1? Is that the most important thing they could be spending their time on? It sounds like it from what you write above, since not sharding is apparently something that needs to be "fixed" and constitutes "bad planning".
Progress in technology moves the barriers of when you need to deal with hard problems. There's nothing wrong with that. Everyone used to write C for everything because not doing so was too slow for the machines available. Today we get to write in incredibly inefficient languages like Ruby because the barriers have moved very far out. There are tons of problems that today can be solved faster by trading programmer productivity for computational efficiency.
As long as those barriers continue to outrun my ability to run into them, I'll happily punt on making things faster or more "scalable". Then, should the day arrive when the barriers no longer move fast enough for you to keep punting, you just deal with it then, instead of trying to guess in advance when that day might be.
Also, I'm sharing my particular experiences of how this has worked out great for us. That doesn't mean that it'll work out great for everyone. I'm sure Yahoo and others in that class need to shard. They also need to do a lot of other hard work that I thankfully don't have to deal with.
And there are plenty of web applications out there that are MUCH smaller than Basecamp. They should probably worry about getting to the size of Basecamp before worrying about getting to the size of Yahoo. The former is much more likely as well.
I think this is one of those cases where the 37signals guys have made an oversimplification. I enjoy reading their posts, Getting Real, etc., because most of what they have to say makes sense, but sometimes it just doesn't stand up to practicality. "Don't worry about scaling until you have to" is a recipe for failure. This Moore's Law database post is another example. I think it's unwise to say "don't shard" as a blanket statement. There are several use cases where it's the right decision from the jump.
DHH,
Jeremy isn't arguing that everyone needs to shard or "plan for the future now". His argument is that you shouldn't count on Moore's law.
Your article specifically said "So as long as Moore’s law can give us capacity jumps like that, we can keep the entire working set in memory and all will be good."
Jeremy's response is "don't bet on it."
I think both are good opinions worth thinking about. Certainly we should expect increases in technology, but if you're expecting your business to grow beyond your current abilities then perhaps having a growth plan that doesn't involve Moore's law would be a good idea.
@DHH
I think your points are valid to an extent. Not all projects need to shard. But I have to ask: why not just make ActiveRecord shard without making the user do any work? The beauty of evolving software development, and certainly of the Ruby community, has been steadily improving efficiency and productivity. It would save everyone headaches if RoR just had it baked in, so one day I wouldn't have to worry as much about 'dealing with it now'.
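To make that concrete, here's a minimal sketch of the kind of thing I imagine being baked in (the names are hypothetical, not a real Active Record API):

```ruby
# Hypothetical sketch of "sharding baked in": the framework picks a
# connection from the record's id and the user never thinks about it.
# SHARDS and shard_for are made-up names, not a real Active Record API.
SHARDS = [
  { host: 'db1.example.com', database: 'app_shard_0' },
  { host: 'db2.example.com', database: 'app_shard_1' },
].freeze

# Deterministic modulo hashing: a given user's rows always land on
# the same shard, so reads and writes agree on where to look.
def shard_for(user_id)
  SHARDS[user_id % SHARDS.size]
end

shard_for(42)  # => the db1 shard config
```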
Actually, he's right - but only if your service grows "slowly". :) As long as your service grows more slowly than hardware and software advances, you're fine.
It's a risky bet, and it works for some people - particularly the paid services, like 37signals. (I wish it worked for SmugMug - it feels like it *almost* does, but nonetheless, we're always scrambling...)
And it's certainly not a bad thing to manage your growth to stay at a pace you can keep up with. But obviously a free, explosive growth service like Facebook doesn't have the same luxury - and thus, can't rely on Moore's Law. 37signals should count their blessings that they have a profitable, growing business that can ignore sharding for now. That sounds like heaven. :)
I'd be happy to see Active Record shard "by default", if that's possible. It seems that there are a lot of variables in how people can and do shard, though. Which makes abstraction harder. But certainly something that could be done.
I tend to work on my own problems, though. Sharding isn't a problem of mine yet or perhaps ever. So best get someone involved who actually needs to shard and has the experience and incentive to work on a generic solution for that problem.
Since nobody has solved this yet for Active Record, it's probably another data point that very few applications actually need to scale.
Apart from all the other issues (including finances), the growth speed of a web site or business may not neatly fall within the bounds of the development of new hardware. It's betting on one's own business failing - rather weird.
@DHH:
I think you make a very valid point by saying "very few applications actually need to scale." Those of us working on applications that *do* need to scale often forget that the vast majority don't.
I don't think the point of the 37signals article was to say Moore's Law will always save you, but that you should focus on user enhancements, upgrading products, adding new features, creating new products, and so on. Don't spend all of your time optimizing the code, because for the most part you can throw more hardware at the problem. I do agree that it's a bad idea to rely on hardware improvements to save your ass, though.
I think there's some confusion here. Apps that don't need to scale don't need to scale. Period. Those folks really don't need to be in this discussion, aside from any passing interest in "how other people make it work."
Apps that *do* need to scale shouldn't rely on Moore's Law to save their ass. Why? Because the moment you fall behind the curve, the battle becomes dramatically harder to win. In fact, simply "staying afloat" becomes quite challenging.
I've seen this first hand more than once or twice. It's not pretty. Not at all.
I think this is a mis-statement of DHH's argument. His article simply says, "don't shard until you're big enough to need it." I have a hard time understanding why that's the least bit controversial.
I don't think it's about relying on hardware/software infrastructure improvements; it's more about being opportunistic should they occur.
If you do the work of sharding on day 1, you pay the cost of getting sharding into your app whether you really need it or not.
If you can defer the decision as long as possible, there's a chance you'll never need to do the work at all. And if you do end up needing it, you haven't paid the development and operations cost for the entire life of the project. There is an added integration cost, but a well factored code base will minimize it by making it easy to add the needed abstraction.
It's a simple economic rule that it pays to defer your costs. There may also be a price for deferring them, but there are ways to minimize that too.
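As a concrete illustration (a hypothetical sketch, not any particular app's code), funneling data access through a single seam is what keeps that later integration cost small:

```ruby
require 'active_record'

# Hypothetical sketch: route all data access through one seam so that
# sharding can be bolted on later without touching any callers.
# Assumes a connection has been established elsewhere (e.g. by Rails).
class AccountStore
  def find(account_id)
    connection_for(account_id).select_one(
      "SELECT * FROM accounts WHERE id = #{account_id.to_i}"
    )
  end

  private

  # The only method that changes on sharding day: instead of always
  # returning the single app-wide connection, pick a per-shard
  # connection based on account_id.
  def connection_for(_account_id)
    ActiveRecord::Base.connection
  end
end
```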
As DHH notes, most apps don't need to shard. But as others noted, when your app is a pay service, you get plodding, measured growth almost built in.
So sure, most apps won't ever get to Basecamp's size, let alone Facebook's. But what happens in the off chance you do get lucky? It's no secret that sharding can be abstracted away -- as evidenced by SQLAlchemy's excellent work -- so I'd ask: why choose a tool that you know will tie your hands if and when you get exactly where you expressly intend to end up?
Of course, by that logic, even with sharding an RDBMS will never be easy to scale out. So why not just start with something that can? Something like CouchDB? (And as a bonus, Erlang is designed to take full advantage of as many cores as you can throw at it, so you can scale up better too.)
My hat is off to you for a nice bit of Reddit-bait. From your sloppy misstatement of DHH's post ("save your ass" and "silver bullet", really?) and the deliberately confrontational and caustic "Bzzzt. Wrong answer" to the shameless self-promotion (linking to another blog post AND an invitation to hear you speak), this is a fine example of lazy and thoughtless blogging.
And you know what? I usually don't mind a good smack-around post if it is funny, insightful, and presents a well-reasoned argument.
Ah!
So *that* is why this is so popular on reddit.
Now it makes sense. Thanks.
(Too bad I was writing like this before reddit existed. When did reddit launch again?)
I don't have a ton of time right now to work on a big post.
Long story short: if DHH had sharded his DB, he could have bought two 64GB boxes or four 32GB boxes (though with one replica) and saved $5-20k.
The 4-8GB DIMMs needed to build a 128GB box are NOT cheap. The sweet spot is 1-2GB DIMMs.
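To put rough numbers on that premium (the per-GB prices below are entirely made up for illustration; substitute real quotes):

```ruby
# Illustrative arithmetic only -- both prices are hypothetical
# placeholders, not real 2009 quotes.
price_per_gb_commodity = 25   # $/GB using sweet-spot 1-2GB DIMMs
price_per_gb_big_dimm  = 150  # $/GB using the 4-8GB DIMMs a 128GB box needs

one_128gb_box   = 128 * price_per_gb_big_dimm      # => 19200
four_32gb_boxes = 4 * 32 * price_per_gb_commodity  # => 3200

puts "Premium for the single big box: $#{one_128gb_box - four_32gb_boxes}"
# => Premium for the single big box: $16000
```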
Kevin
Apologies if you judge this off-topic, but I really appreciate your insights and readable presentation of scalability-relevant issues.
Therefore: Is there any news on whether O'Reilly might publish a 2nd edition of your “High Performance MySQL”? Or have you got too many other fish to fry right now?
[ Would really like to see an update to cover MySQL 5.x. ...and maybe some of the wish-list things mentioned by reviewers at Amazon.com ]
Looking forward to it!
You need to understand how to partition your data.
Sharding is an application of partitioning. Processing, backup, upgrading, migration, etc. all benefit from it as well.
You run into problems with scaling hardware LONG before you get too big. Given that the cost of scaling UP rises faster than the cost of scaling OUT, it makes no sense NOT to understand this.
If this means architecting your code so that you can be able to shard, then I recommend you consider doing so.
Additionally, I recommend an architecture in which some users can be migrated to cells or shards running different versions of your codebase. Assuming you have an API in front of your database shards (for caching, aggregation, or other logic separation), you can easily migrate users from place to place. This lets you do live testing, run A/B tests, etc.
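Here's a sketch of the kind of mapping that makes those migrations cheap (hypothetical names, not tied to any particular framework):

```ruby
# Hypothetical sketch of a shard directory: a lookup table maps each
# user to a cell, so moving a user means copying their rows and then
# updating one mapping entry -- unlike pure modulo hashing, which
# pins every user in place.
class ShardDirectory
  def initialize
    @map = Hash.new('cell_default')  # user_id => cell name; persist this in real life
  end

  def cell_for(user_id)
    @map[user_id]
  end

  # In practice: copy the user's rows to new_cell, verify them, then
  # flip this pointer. Parking selected users on a cell running a
  # newer codebase is what enables the live testing and A/B runs
  # mentioned above.
  def migrate!(user_id, new_cell)
    @map[user_id] = new_cell
  end
end
```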
There is too much to be gained and too much to be lost to be too lazy here.
And I should know - not sharding delicious essentially killed it.
I had never heard of Moore's Law (and in fact had to follow the link to Wikipedia), but it makes sense that although this has helped, we can't depend on it.
On the one hand, I totally agree with Jeremy's statement:
"Depending on future improvements to fix your own bad planning is a risky way to build an on-line service--especially one you expect to grow and charge money for."
But on the other hand, I take issue with the interpretation that other commenters seem to be cooking up that sharding=good and not sharding=dumb.
You need to analyze your data and your growth and make decisions. Those decisions may include using a different database -- it may be cheaper to implement a different RDBMS rather than invest hundreds of man-hours into partitioning your data.
Just curious how other databases (such as SQL Server) handle this problem -- is there a current "best practice" or pattern that developers should look to if they think they're going to need to reach this sort of scale?