
Setup Hadoop-0.19.2 on a Multi-Node Cluster

2010.10.13 20:27
System Information
NameNode/JobTracker: punky
DataNodes/TaskTrackers: infi17, client

Required Software
># apt-get install ssh rsync
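If ssh to localhost fails after the install, check that the daemon is running (Debian/Ubuntu init script layout assumed):
># /etc/init.d/ssh status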

Preparation
Edit conf/hadoop-env.sh and set JAVA_HOME to the root of the Java installation:
># vim conf/hadoop-env.sh
 # The java implementation to use.  Required
export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.20
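To double-check the path, run the JVM from that location; it should report version 1.6.0_20:
># /usr/lib/jvm/java-6-sun-1.6.0.20/bin/java -version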


1. Setup passwordless SSH connection
ykoh@punky: ># ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
ykoh@punky: ># ssh-copy-id -i ~/.ssh/id_dsa.pub ykoh@punky
ykoh@punky: ># ssh-copy-id -i ~/.ssh/id_dsa.pub ykoh@infi17
ykoh@punky: ># ssh-copy-id -i ~/.ssh/id_dsa.pub ykoh@client
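Each of the following should now print the remote hostname without asking for a password:
ykoh@punky: ># ssh infi17 hostname
ykoh@punky: ># ssh client hostname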

2. Configuration
On the NameNode, edit conf/hadoop-site.xml as below:
ykoh@punky: ># vim conf/hadoop-site.xml
 <configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://punky:9000</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>punky:9001</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/ykoh/dfs/0.19.2/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/ykoh/dfs/0.19.2/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

3. Edit masters/slaves
># vim conf/masters
 punky

># vim conf/slaves
 infi17
 client

Copy conf/hadoop-site.xml to all datanodes.
* Since we do this quite often, it's better to create a script that pushes the configuration to all datanodes (usage shown after the script).
ykoh@punky: ># vim bin/update-conf.sh
#!/bin/bash
# Push the local Hadoop conf/ directory to every host listed in conf/slaves.

HADOOP_HOME="$HOME/hadoop-0.19.2-green"
SLAVES_LIST=$(cat "$HADOOP_HOME/conf/slaves")

echo "Updating conf..."

for slave in $SLAVES_LIST
do
    echo "Copying conf to $slave..."
    echo "-----------------------------------------"
    scp -r "$HADOOP_HOME/conf" "ykoh@$slave:$HADOOP_HOME"
    echo "-----------------------------------------"
    echo ""
done
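After saving the script, mark it executable and run it whenever the configuration changes:
ykoh@punky: ># chmod +x bin/update-conf.sh
ykoh@punky: ># bin/update-conf.sh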



Environment Customization
* To make it easier to run Hadoop every time, let's customize the shell environment.
># vim ~/.bashrc
 alias ha='cd ~/hadoop-0.19.2-green'

export PATH=$PATH:$HOME/hadoop-0.19.2-green/bin

># . ~/.bashrc

Execution (running the hadoop-0.19.2-examples.jar)
># hadoop namenode -format
># start-all.sh

Upload the input folder to DFS
># hadoop fs -put conf input
># hadoop fs -ls /user/ykoh/input
># hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
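When the job finishes, copy the output folder out of HDFS and inspect it (or read it in place with hadoop fs -cat 'output/*'):
># hadoop fs -get output output
># cat output/*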

Trouble-Shooting
1. When uploading a file/folder to HDFS, an error message appears on the screen and the upload fails. There can be many causes, but this is what fixed it in our setup.

Delete Hadoop's temp/storage folders (e.g. ~/dfs) on the namenode and all datanodes:
># rm -fr ~/dfs
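After wiping the storage folders, re-format HDFS and restart the daemons before retrying the upload:
># hadoop namenode -format
># start-all.sh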



Disable Checksum on Hadoop

2009.11.08 14:55
It's possible to disable verification of checksums by passing false to setVerifyChecksum() on FileSystem.

LineRecordReader.java:

FileSystem fs = file.getFileSystem(job);
   
// [FRANK] disable checksum verification on read
DistributedFileSystem dfs = (DistributedFileSystem) fs;
dfs.setVerifyChecksum(false);

FSDataInputStream fileIn = fs.open(split.getPath());
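Since setVerifyChecksum() is declared on FileSystem itself (as noted above), the cast is not strictly required; a minimal equivalent of the patch:

FileSystem fs = file.getFileSystem(job);
// skip checksum verification when reading the split
fs.setVerifyChecksum(false);
FSDataInputStream fileIn = fs.open(split.getPath());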

Hadoop Cluster Setup

2009.09.22 20:49

1. Edit /etc/hosts
192.168.252.134 namenode
192.168.252.136 datanode1
192.168.252.130 datanode2
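The same three entries must be present in /etc/hosts on every node. A quick check that the names resolve:
># ping -c 1 datanode1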

2. Setup passphraseless ssh
># ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
># cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
># ssh-copy-id -i ~/.ssh/id_dsa.pub ohf@datanode1
># ssh-copy-id -i ~/.ssh/id_dsa.pub ohf@datanode2
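Verify that each datanode can now be reached without a password prompt:
># ssh datanode1 hostname
># ssh datanode2 hostname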

3. Install the JDK
># sh jdk-6*.bin
># sudo mkdir /opt/java ; sudo mv jdk1.6.0_16 /opt/java
># ssh datanode1 "sudo mkdir /opt/java && sudo chown ohf /opt/java"
># ssh datanode2 "sudo mkdir /opt/java && sudo chown ohf /opt/java"
># scp -r /opt/java/jdk1.6.0_16 ohf@datanode1:/opt/java
># scp -r /opt/java/jdk1.6.0_16 ohf@datanode2:/opt/java
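To confirm the copy, run the JVM on each datanode; it should report version 1.6.0_16:
># ssh datanode1 /opt/java/jdk1.6.0_16/bin/java -version
># ssh datanode2 /opt/java/jdk1.6.0_16/bin/java -version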

4. Edit hadoop-env.sh
># vi conf/hadoop-env.sh
export JAVA_HOME=/opt/java/jdk1.6.0_16

5. Edit site-specific configuration

In 0.20-style releases the old hadoop-site.xml is split into conf/core-site.xml, conf/hdfs-site.xml, and conf/mapred-site.xml; a sketch follows below.
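This section was left empty in the original post; a minimal sketch, reusing the hostnames above (the 9000/9001 ports are assumptions carried over from the 0.19.2 post; dfs.* properties such as dfs.replication would go in conf/hdfs-site.xml):

conf/core-site.xml
 <configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://namenode:9000</value>
    </property>
</configuration>

conf/mapred-site.xml
 <configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>namenode:9001</value>
    </property>
</configuration>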

6. Edit masters/slaves files
># cat masters
namenode
># cat slaves
datanode1
datanode2

7. Start/Stop Hadoop service
* for the first time only (format a new distributed filesystem)
># bin/hadoop namenode -format
># bin/start-dfs.sh
># bin/start-mapred.sh (start map-reduce)
># bin/stop-dfs.sh

># bin/stop-mapred.sh
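Once the daemons are up, jps (shipped with the JDK) is a quick sanity check on each node; the namenode should list NameNode and JobTracker, each datanode DataNode and TaskTracker:
># jps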

References
1. Cluster Setup, http://hadoop.apache.org/common/docs/r0.20.1/cluster_setup.html
