About a week ago, Nat posted Open Source NG Databases on O'Reilly Radar. That caught my interest because I'm playing with some "alternative" databases for some of our data at Craigslist. Don't get me wrong, MySQL is great. But MySQL isn't well suited to every use case out there either. (I'll talk more about this at the MySQL Conference.)
Meanwhile, I left a comment on that posting about CouchDB and have been playing with it a bit more since then--mostly loading in test data, figuring out the data footprint, performance, etc.
Overall, I'm impressed and encouraged. I agree with what Ben Bangert said. The simple API is great, but not having a schema to worry about is what really makes my life simple in this application. I don't have any initial plans for views, but writing them in JavaScript is an interesting idea. I can definitely appreciate the flexibility there. And having good replication built in solves one of my big needs.
I'm sure my thinking will have evolved after I've loaded a few hundred million documents in, but so far I'm really liking it. The CPAN modules in Net::CouchDb do a pretty good job and get you up and running quickly. I had a knee-jerk response to tweak a few things there but quickly realized that they're far from being the bottleneck anyway.
It seems that without any tuning or fancy work, I can get about 75-100 inserts/sec on my desktop-class Ubuntu box (Intel Core 2 Duo, 2.66GHz, 1GB RAM, single 80GB SATA disk). That's not bad for out-of-the-box performance. And doing the math on space used for a document set (after compaction), I'm seeing roughly 3KB/doc. That's a bit more than I expected but really not bad at all.
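To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. The insert rates and the 3KB/doc footprint come from the numbers above; the 100-million-document target is only an illustrative assumption, and the function names are made up for this example.

```python
# Back-of-the-envelope projection from the figures quoted above.
# ASSUMPTION: the 100-million-document target is illustrative only.

SECONDS_PER_DAY = 86_400


def load_time_days(doc_count, inserts_per_sec):
    """Days needed to insert doc_count documents at a steady rate."""
    return doc_count / inserts_per_sec / SECONDS_PER_DAY


def disk_footprint_gb(doc_count, kb_per_doc=3.0):
    """Approximate on-disk size in GB (decimal units, 1 GB = 1,000,000 KB)."""
    return doc_count * kb_per_doc / 1_000_000


if __name__ == "__main__":
    docs = 100_000_000  # hypothetical target, not a measured value
    for rate in (75, 100):
        print(f"{rate} inserts/sec: {load_time_days(docs, rate):.1f} days")
    print(f"disk footprint: {disk_footprint_gb(docs):.0f} GB")
```

If the per-document footprint holds at that scale, the data set would be several times larger than the single 80GB disk in the test box, which is part of why compression is on my mind.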
I wonder if there's a future for gzip compression in CouchDB. Or maybe we should just use ZFS...
Posted by jzawodn at February 10, 2009 10:58 AM
Try Tokyo Cabinet, it's incredible. We [soitu.es] are stressing it and, wow!, today, with good starting parameters, we got better times than memcached.
Hi,
well, if you mention Tokyo Cabinet, then you should also look at Lux IO: http://luxio.sourceforge.net/
But I think CouchDB is much more interesting than both of those: the simple API and the incremental map-reduce views are a killer feature.
Best regards,
Hi Jeremy,
look at the bulk update feature that lets you insert a bunch of documents at once. It should give you a little better insert rate. I don't know if that API is exposed through Net::CouchDb, but then, it is just an HTTP request. Look for "Modify Multiple Documents With a Single Request" on http://209.85.129.132/search?q=cache:Iu4KlOaDfHYJ:wiki.apache.org/couchdb/HTTP_Document_API+HTTP_Document_API&hl=en&ct=clnk&cd=1 (All ASF wikis are down at the moment).
It also results in tighter packed database files which might save you the compaction step.
Cheers
Jan
--
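For readers who want to try the bulk update suggestion above without going through Net::CouchDb, here is a minimal Python sketch of batching documents into POSTs against CouchDB's `_bulk_docs` endpoint. The server URL, database name, batch size, and helper names are all assumptions for illustration; the response is returned as parsed JSON without assuming its exact shape, since that has varied across CouchDB versions.

```python
import json
import urllib.request


def chunked(docs, size):
    """Yield successive lists of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def bulk_request(base_url, db, docs):
    """Build (but do not send) a POST to the database's _bulk_docs endpoint."""
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/{db}/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def bulk_insert(base_url, db, docs, batch_size=1000):
    """Send docs in batches and collect the parsed JSON response per batch.

    ASSUMPTION: batch_size=1000 is an arbitrary starting point, not a
    tuned or recommended value.
    """
    responses = []
    for batch in chunked(docs, batch_size):
        with urllib.request.urlopen(bulk_request(base_url, db, batch)) as resp:
            responses.append(json.load(resp))
    return responses
```

Something like `bulk_insert("http://localhost:5984", "testdb", docs)` would then load a list of dicts, assuming a local server on the default port and an existing database called `testdb`.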
Hi Jeremy,
My name is Doug Judd and I'm the original creator of Hypertable, a scalable database closely modelled after Google's Bigtable. It differs from traditional database technology in that the design emphasizes scalability on commodity hardware rather than assuming a single machine. Some of the trade-offs that have been made include no support for ACID transactions or the relational model. We do plan to add transactions at some point post-1.0 and have laid the groundwork by implementing MVCC: each cell has a revision number. We support several different on-disk compression schemes (e.g. zlib, lzo, bmz, quicklz) and, depending on the machine configuration, document size, etc., you could see an insert rate in the 10K to 30K inserts/s range (which scales linearly with cluster size). Our design philosophy has centered around optimum performance and, to that end, the system is developed in C++ as opposed to a garbage-collected language such as Java.
We're currently still in alpha and plan to go beta around summertime. If this technology interests you at all, we would be more than happy to give a Hypertable presentation to your engineering team.
- Doug
Hi Jeremy, good to see you playing with CouchDB. I set up the exact same thing, but shut it down for lack of time. I scraped the public timeline as well, stored it in CouchDB, and let users define their own views through: