Help

The quick start guide isn’t intensely helpful, but it’s something.

Interesting and useful perspective on Hadoop in HPC (or not) by an SDSC guy.

Dependencies

Hadoop itself

I got Hadoop from some mirrored place like this.

Just get the tar.gz. I messed around way too long with the RPM and God knows what mischief it got up to.

Java

I did a yum install java and seem to have received java-1.5.0-gcj. This could be a problem since the instructions ask for: "Java™ 1.6.x, preferably from Sun, must be installed."
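A rough way to see whether whatever java is on the PATH meets that 1.6 requirement. This is only a sketch: the function name java_minor_from_banner is made up for illustration, and it assumes the old "1.x" version numbering in the banner that java -version prints.

```shell
# Pull the minor version (the "6" in "1.6.0_29") out of a `java -version` banner line.
# Assumes the old 1.x numbering scheme; prints nothing if the line does not match.
java_minor_from_banner() {
    printf '%s\n' "$1" | sed -n 's/.*version "1\.\([0-9][0-9]*\).*/\1/p'
}

if command -v java >/dev/null 2>&1; then
    # java -version writes to stderr, hence the 2>&1
    banner="$(java -version 2>&1 | head -n 1)"
    minor="$(java_minor_from_banner "$banner")"
    if [ -n "$minor" ] && [ "$minor" -ge 6 ]; then
        echo "looks OK: $banner"
    else
        echo "probably too old or unrecognized: $banner" >&2
    fi
else
    echo "no java on PATH" >&2
fi
```

This only checks the version number, not whether the Java came from Sun.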

So over at the evil Oracle site the download on offer is:

jre-6u29-linux-x64-rpm.bin

which unzips to:

jre-6u29-linux-amd64.rpm

That RPM installs to a mess that doesn’t quite work as far as I can tell. Then I think you need to do something like this:

export JAVA_HOME=/usr/java/jre1.6.0_29/

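Note that on modern CentOS 6 systems (and probably others for all I know) there is a sketchy thing called alternatives. Find your Java’s real location with something like the following, which lists the installed Javas and prompts for a choice (press Enter to keep the current selection):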

alternatives --config java
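A non-interactive way to get at the same information is to follow the symlink chain from the java on the PATH. This is a sketch, not gospel: the helper names real_path_of and java_home_from_bin are invented here, readlink -f is the GNU coreutils version, and it assumes the real binary lives in a directory ending in /bin/java.

```shell
# Resolve a command name to the real file behind any symlinks.
real_path_of() {
    # command -v prints the path found on PATH; readlink -f follows the whole symlink chain
    readlink -f "$(command -v "$1")"
}

# Guess a JAVA_HOME from a java binary path by dropping the trailing /bin/java.
java_home_from_bin() {
    printf '%s\n' "${1%/bin/java}"
}

if command -v java >/dev/null 2>&1; then
    JAVA_HOME="$(java_home_from_bin "$(real_path_of java)")"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
else
    echo "java not found on PATH" >&2
fi
```

Source the snippet rather than executing it if the exported JAVA_HOME should stick around in the current shell.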

SSH

SSH needs to be present and working. If it’s not already, you probably have no business doing serious things with your computing hardware.
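A quick sanity check that SSH works without prompting might look like this. It is a minimal sketch: can_ssh is a made-up helper name, and testing against localhost is my assumption about what "working" needs to mean here.

```shell
# Return 0 if a non-interactive SSH login to the given host works, 1 otherwise.
can_ssh() {
    host="$1"
    command -v ssh >/dev/null 2>&1 || return 1
    # BatchMode=yes makes ssh fail instead of prompting for a password or passphrase
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true >/dev/null 2>&1 || return 1
    return 0
}

if can_ssh localhost; then
    echo "ssh to localhost works without a prompt"
else
    echo "ssh to localhost is not working non-interactively" >&2
fi
```

Because BatchMode refuses to ask anything, this also fails when the host key has not been accepted yet, so a first manual ssh localhost may be needed before it passes.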

Installing and Testing

Unzip and untar the Hadoop package. It should unpack OK and you can test right from that directory. It may not be ideal, but I put it in /usr/local/src/hadoop

So from some scratch directory (the commands below assume the unpacked tree is at /usr/local/src/hadoop-1.0.4), do this:

$ export JAVA_HOME=/usr/lib/jvm/jre-1.6.0-openjdk.x86_64/
$ cd ~/sometestdirectory
$ mkdir input
$ cp /usr/local/src/hadoop-1.0.4/conf/*xml input/
$ /usr/local/src/hadoop-1.0.4/bin/hadoop \
jar /usr/local/src/hadoop-1.0.4/hadoop-examples-1.0.4.jar \
grep input output 'dfs[a-z.]+'

If the Java is happy it should run and process an example. Hopefully there are no errors or sad exit codes. A directory called output should appear and "results" can be checked with:

$ cat output/*
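The test run above can be wrapped so it is repeatable. Hadoop refuses to start a job when the output directory already exists, so a second run in the same place fails unless output is removed first. This is a sketch under assumptions: the function names and the HADOOP_DIR variable are mine, and the paths are the ones used above.

```shell
# Where the tarball was unpacked; override by setting HADOOP_DIR beforehand.
HADOOP_DIR="${HADOOP_DIR:-/usr/local/src/hadoop-1.0.4}"

# Set up a scratch directory: input/ filled with the config XML files, no stale output/.
prepare_workdir() {
    workdir="$1"
    conf_dir="$2"
    mkdir -p "$workdir/input"
    cp "$conf_dir"/*.xml "$workdir/input/"
    # Hadoop will not overwrite an existing output directory
    rm -rf "$workdir/output"
}

# Run the bundled grep example from inside the scratch directory.
run_grep_example() {
    workdir="$1"
    ( cd "$workdir" && \
      "$HADOOP_DIR/bin/hadoop" jar "$HADOOP_DIR/hadoop-examples-1.0.4.jar" \
          grep input output 'dfs[a-z.]+' )
}

if [ -x "$HADOOP_DIR/bin/hadoop" ]; then
    prepare_workdir "$HOME/sometestdirectory" "$HADOOP_DIR/conf"
    run_grep_example "$HOME/sometestdirectory" && cat "$HOME/sometestdirectory"/output/*
else
    echo "hadoop not found under $HADOOP_DIR" >&2
fi
```

JAVA_HOME still has to be exported before this runs, the same as in the manual version.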

Apache Pig

Look for it somewhere like this repo.

Maven