Ever since I wrote Open Source Distributed Computing: Yahoo's Hadoop Support back in July, interest in Hadoop and Yahoo's work has been on the rise. So I started to get to know the Hadoop team at Yahoo a bit better and help them figure out how to tell more of the story.
We decided that it'd make sense to have a new blog on the Yahoo! Developer Network where we can collect & post news, tips, announcements, videos, and anything else related to Hadoop and distributed computing work.
To kick off the blog, which we're calling Hadoop and Distributed Computing at Yahoo!, I sat down with Eric Baldeschwieler ("Eric14") to do a video interview about Hadoop and Yahoo's involvement.
The more I dig into Hadoop and the other projects emerging around it, the more I'm reminded of the early days of MySQL and the maturing of the LAMP stack. It's an exciting time to get involved. You can expect to hear more from me on this topic, both here and on the YDN Hadoop blog.
Posted by jzawodn at November 14, 2007 11:31 AM
All I keep wondering is why on earth it's written in Java.
Getting a stable Java stack running on FreeBSD or Linux has always been a huge PITA.
Is that really still a problem in 2007?!
Yikes.
I know that FreeBSD has had issues in the past, but I figured that at least Linux was reasonable enough now.
On Ubuntu, Java is installed with:
sudo apt-get install sun-java6-jdk
FreeBSD has Java (that I've not used):
Why Java? Java's a good platform for complex, long-lived daemons, it has a good set of libraries, it's very portable, and it's fast. The performance of the Hadoop kernel is almost entirely governed by network and disk I/O. One can also get slightly better performance by giving Hadoop more memory. But Java performance is not a significant bottleneck.
That said, Hadoop client programs don't have to be written in Java. One can write Hadoop MapReduce programs using shell commands, C++, Python, etc.
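As a rough sketch of what a non-Java job can look like, here is a word count written for Hadoop Streaming, which runs any executable as the mapper or reducer and passes records over stdin and stdout as tab-separated key/value lines. The script name wordcount.py and the "map"/"reduce" argument are made up for this illustration, and the reducer assumes its input arrives sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_records(records):
    """Reducer: sum the counts for each word.

    Assumes records are already sorted by key, so equal words are adjacent.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical entry point: "python wordcount.py map" or "python wordcount.py reduce".
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_records
    for out in step(sys.stdin):
        print(out)
```

The same script can be tried without a cluster by imitating the sort between the two phases with a shell pipeline: cat input.txt | python wordcount.py map | sort | python wordcount.py reduce. On a cluster, the two commands would be handed to the streaming jar through its -mapper and -reducer options.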
Sorry to be nit-picky Jeremy, but you have a spelling mistake at the end of the first paragraph:
"how to tell more of he story."
Should be?
"how to tell more of the story."
I use Sun Java on Linux all of the time (Fedora and RHEL), while working on an open source system called DSpace, and I find it much more stable than Java on Windows. It can work really well, but I agree with David Ulevitch in that it can be a real pain to set up correctly. And as for building and deploying Java-based systems, don't get me started...
Pasting HTML code is easy? That doesn't sound easy to most people.