Ever since I wrote Open Source Distributed Computing: Yahoo's Hadoop Support back in July, interest in Hadoop and Yahoo's work has been on the rise. So I started to get to know the Hadoop team at Yahoo a bit better and help them figure out how to tell more of the story.

We decided that it'd make sense to have a new blog on the Yahoo! Developer Network where we can collect & post news, tips, announcements, videos, and anything else related to Hadoop and distributed computing work.

To kick off the blog, which we're calling Hadoop and Distributed Computing at Yahoo!, I sat down with Eric Baldeschwieler ("Eric14") to do a video interview about Hadoop and Yahoo's involvement.



The more I dig into Hadoop and the other projects emerging around it, the more I'm reminded of the early days of MySQL and the maturing of the LAMP stack. It's an exciting time to get involved. You can expect to hear more from me on this topic, both here and on the YDN Hadoop blog.

Posted by jzawodn at November 14, 2007 11:31 AM

Reader Comments
# David Ulevitch said:

All I keep wondering is why on earth it's written in Java.

Getting a stable Java stack running on FreeBSD or Linux has always been a huge PITA.

on November 14, 2007 12:59 PM
# Jeremy Zawodny said:

Is that really still a problem in 2007?!

Yikes.

I know that FreeBSD has had issues in the past, but I figured that at least Linux was reasonable enough now.

on November 14, 2007 01:01 PM
# Doug Cutting said:

On Ubuntu, Java is installed with:

sudo apt-get install sun-java6-jdk

FreeBSD has Java (that I've not used):

http://www.freebsd.org/java/

Why Java? Java's a good platform for complex, long-lived daemons, it has a good set of libraries, it's very portable, and it's fast. The performance of the Hadoop kernel is almost entirely governed by network and disk i/o. One can also get a bit better performance by giving Hadoop more memory. But Java performance is not a significant bottleneck.

That said, Hadoop client programs don't have to be written in Java. One can write Hadoop MapReduce programs using shell commands, C++, Python, etc.
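To make the non-Java route concrete, here is a minimal sketch of a word count in Python, assuming the job runs through something like Hadoop Streaming, where each stage reads text records on standard input and writes tab-separated key/value lines on standard output. The function names (map_lines, reduce_lines, run_locally) are hypothetical and chosen for illustration; they are not part of any Hadoop API.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_records):
    """Reducer: sum the counts for each run of identical keys.

    Assumes the records arrive sorted by key, which is what the
    framework's shuffle/sort phase is expected to provide.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in one process (no cluster needed)."""
    return list(reduce_lines(sorted(map_lines(lines))))
```

On a real cluster each function would presumably sit in its own small script that iterates over sys.stdin and prints each record; run_locally only imitates the sort step between the two stages so the logic can be checked on a laptop.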

on November 14, 2007 02:39 PM
# john said:

Sorry to be nit-picky Jeremy, but you have a spelling mistake at the end of the first paragraph:

"how to tell more of he story."

Should be?

"how to tell more of the story."

I use Sun Java on Linux all of the time (Fedora and RHEL), while working on an open source system called DSpace, and I find it much more stable than Java on Windows. It can work really well, but I agree with David Ulevitch in that it can be a real pain to set up correctly. And as for building and deploying Java-based systems, don't get me started...

on November 15, 2007 01:59 AM
# Jeremy Zawodny said:

Fixed.

on November 15, 2007 02:39 AM
# Rebecca Rachmany said:

Pasting HTML code is easy? That doesn't sound easy to most people.

on November 20, 2007 07:53 AM