Build your own Data Cloud – QMUL


MapReduce / Hadoop on Raspberry Pi

If your data is only going to be processed once, it may not be worth keeping it in HDFS (the Hadoop Distributed File System), since writing it in (with replication) is slow relative to a one-off job; worth checking for your use case.

Name Nodes store the metadata.
Data Nodes store the data; blocks are replicated a number of times (usually 3).
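
As a quick sketch of how data gets in and out once HDFS is up (run from the hadoop-2.6.0 directory; data.txt and /user/pi are just placeholder names):

  bin/hdfs dfs -mkdir -p /user/pi          # create a directory in HDFS
  bin/hdfs dfs -put data.txt /user/pi/     # copy a local file into HDFS (split into blocks across the Data Nodes)
  bin/hdfs dfs -ls /user/pi                # list what is stored
  bin/hdfs dfs -get /user/pi/data.txt .    # copy it back out to the local filesystem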

In hadoop-2.6.0/etc/hadoop/hadoop-env.sh (the configuration file; open it with nano), edit the Java implementation to use, changing
export JAVA_HOME=${JAVA_HOME}
to
export JAVA_HOME="/home/pi/ejdk1.8.0_33/linux_armv6_vfp_hflt/jre/" [or equivalent]
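
A quick check that Hadoop now picks up that JRE (assuming the hadoop-2.6.0 directory sits in /home/pi):

  cd /home/pi/hadoop-2.6.0
  bin/hadoop version    # should print the Hadoop version banner rather than a JAVA_HOME error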

Edit core-site.xml:

fs.defaultFS = hdfs://pi-0:9000 <--- pi-0 is the master node. You have to place the property inside the <configuration> element (a sketch of the file is below).
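
A minimal core-site.xml along those lines (pi-0 being whatever hostname your master node has):

  <configuration>
    <!-- default filesystem: the HDFS Name Node running on the master (pi-0), port 9000 -->
    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://pi-0:9000</value>
    </property>
  </configuration>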

Edit hdfs-site.xml:
dfs.replication = 1
dfs.namenode.name.dir = /home/pi/had-hdfs/
dfs.datanode.data.dir = /home/pi/had-hdfs/
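
As a sketch, hdfs-site.xml with those values (single replica, Name Node and Data Node directories both under /home/pi/had-hdfs/):

  <configuration>
    <!-- one copy of each block is enough on a small Pi cluster -->
    <property>
      <name>dfs.replication</name>
      <value>1</value>
    </property>
    <!-- where the Name Node keeps its metadata -->
    <property>
      <name>dfs.namenode.name.dir</name>
      <value>/home/pi/had-hdfs/</value>
    </property>
    <!-- where the Data Node keeps its blocks -->
    <property>
      <name>dfs.datanode.data.dir</name>
      <value>/home/pi/had-hdfs/</value>
    </property>
  </configuration>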

Edit mapred-site.xml <--- may already be correctly edited:
mapreduce.framework.name = yarn
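
A sketch of mapred-site.xml (in Hadoop 2.6.0 it may first need copying from mapred-site.xml.template in the same directory):

  <configuration>
    <!-- run MapReduce jobs on YARN rather than the local runner -->
    <property>
      <name>mapreduce.framework.name</name>
      <value>yarn</value>
    </property>
  </configuration>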

Yarn
– edit hadoop-2.6.0/etc/hadoop/yarn-site.xml

yarn.nodemanager.aux-services = mapreduce_shuffle
yarn.resourcemanager.hostname = localhost <---- localhost uses this machine as the master; for NME, ask ITS for the master IP on their Hadoop set-up (a sketch of the file is below).

You need to start the Hadoop (HDFS) processes from the master node, and you need to start the YARN processes from the master node [so both of these will be running at QMUL already]; see the start-up commands at the end.

This was a SHIT workshop.

To unzip tars on the command line: tar xzvf [filename]
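
For reference, a sketch of yarn-site.xml with the properties above (swap localhost for the master's hostname or IP on a multi-node set-up):

  <configuration>
    <!-- shuffle service so MapReduce jobs can run under YARN -->
    <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
    </property>
    <!-- which machine runs the ResourceManager (the master) -->
    <property>
      <name>yarn.resourcemanager.hostname</name>
      <value>localhost</value>
    </property>
  </configuration>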
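
And a sketch of bringing everything up from the master node, assuming Hadoop is unpacked at /home/pi/hadoop-2.6.0 (not needed on the QMUL set-up, where HDFS and YARN are already running):

  cd /home/pi/hadoop-2.6.0
  bin/hdfs namenode -format    # first run only: initialise the Name Node metadata directory
  sbin/start-dfs.sh            # start the Name Node and Data Node(s)
  sbin/start-yarn.sh           # start the ResourceManager and NodeManager(s)
  jps                          # list the running Java processes to check everything came up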