30 June 2011
This post shows how to set up a single-node Hadoop installation on Ubuntu.
A later post will cover setting up a multi-node cluster.
Since a multi-node cluster is built on top of working single-node installations,
we start with the single-node setup here.
How to run a MapReduce project will be covered in the next post.

My environment is as follows:
Ubuntu version: 10.10
Hadoop: 0.20.2



【1】The prerequisite for setting up Hadoop is installing Sun's Java JDK
     【1.1】First, add the Canonical Partner Repository to our repositories:
sudo add-apt-repository "deb http://archive.canonical.com/ maverick partner"
     【1.2】Update our sources list:
sudo apt-get update
     【1.3】Install the Sun Java JDK (you will be asked to accept the license):
sudo apt-get install sun-java6-jdk
     【1.4】Since OpenJDK is the default JDK on Linux, switch the system default to the Sun JDK:
sudo update-java-alternatives -s java-6-sun

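To double-check that the Sun JDK is now the active one, print the Java version:
java -version
The output should mention "Java(TM) SE Runtime Environment" rather than OpenJDK; the exact version number will vary with the installed update level.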
【2】Create a dedicated user for the Hadoop system
sudo addgroup hadoop
sudo adduser --ingroup hadoop hadoop
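As a quick sanity check, confirm that the new user and group were created:
id hadoop
This should print a uid for user hadoop whose group list includes hadoop.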


【3】Set up SSH, since Hadoop uses SSH to manage its nodes
     【3.1】First, generate an SSH key for the hadoop user:
su - hadoop
ssh-keygen -t rsa -P ""  
     【3.2】Enable the newly generated key for SSH access to the local machine:
su - hadoop
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
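To confirm that passwordless SSH works, log in to localhost as the hadoop user; the first connection also saves the host's fingerprint to known_hosts:
ssh localhost
exit
If you are still prompted for a password here, re-check the two steps above before continuing.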


【4】Disable IPv6
     【4.1】Open the configuration file with vim (root privileges are needed to edit it):
sudo vim /etc/sysctl.conf
     【4.2】Add the following settings to the file:
#disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
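The new settings take effect after a reboot; alternatively, reload them immediately and verify the result:
sudo sysctl -p
cat /proc/sys/net/ipv6/conf/all/disable_ipv6
A value of 1 means IPv6 has been disabled.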



The steps above are all preparation for installing Hadoop.
Next we actually download and install Hadoop.


【5】Download Hadoop and unpack it (root privileges are needed to write under /usr/local)
cd /usr/local
sudo wget http://apache.ntu.edu.tw/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
sudo tar zxvf hadoop-0.20.2.tar.gz
sudo mv hadoop-0.20.2 hadoop
sudo chown -R hadoop:hadoop hadoop
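You can verify that the files landed in the right place and belong to the hadoop user:
ls -ld /usr/local/hadoop
The listing should show hadoop hadoop as owner and group.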


【6】Update $HOME/.bashrc (note: run this as the hadoop user)
     【6.1】Open .bashrc with vim (no sudo is needed, since the file belongs to the hadoop user):
vim $HOME/.bashrc
     【6.2】Add the following settings to the file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
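Reload .bashrc so the change takes effect in the current shell, then confirm that the hadoop command is on the PATH:
source $HOME/.bashrc
hadoop version
This should report Hadoop 0.20.2 in this setup.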


【7】Hadoop configuration
      Next comes a series of Hadoop configuration files, all located under /usr/local/hadoop/conf/.
      Each one is opened for editing with vim first, so the vim command is omitted below.
     【7.1】hadoop-env.sh, edited as follows:
           Change the default JAVA_HOME location to point at the Sun JDK
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun 
     【7.2】core-site.xml, edited as follows:
           This records where Hadoop stores its data.
           First create a temporary directory for Hadoop and set its permissions and owner
           (-p also creates the missing parent directories):
sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop:hadoop /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp 
           Then open the configuration file and update the settings. Note: fill in your own IP address. Change it to the following:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://140.133.xxx.xxx:54310</value>
    <description>The name of the default file system. A URI whose
    scheme and authority determine the FileSystem implementation. The
    uri's scheme determines the config property (fs.SCHEME.impl) naming
    the FileSystem implementation class. The uri's authority is used to
    determine the host, port, etc. for a filesystem.</description>
  </property>
</configuration>

      【7.3】mapred-site.xml:
          Open the configuration file and update the settings. Note: fill in your own IP address. Change it to the following:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>140.133.xxx.xxx:54311</value>
    <description>The host and port that the MapReduce job tracker runs
    at. If "local", then jobs are run in-process as a single map
    and reduce task.</description>
  </property>
</configuration>

     【7.4】hdfs-site.xml:
           Open the configuration file and update the settings. This file sets how many copies of each block are kept. Change it to the following:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
    <description>Default block replication.
    The actual number of replications can be specified when the file is created.
    The default is used if replication is not specified in create time.</description>
  </property>
</configuration>
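A typo in any of these XML files will keep the daemons from starting, so it is worth verifying that each file is well-formed. One way, assuming the xmllint tool is available (package libxml2-utils on Ubuntu), is:
xmllint --noout /usr/local/hadoop/conf/core-site.xml
xmllint --noout /usr/local/hadoop/conf/mapred-site.xml
xmllint --noout /usr/local/hadoop/conf/hdfs-site.xml
No output means the files parsed cleanly.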





【8】Format our NameNode
/usr/local/hadoop/bin/hadoop namenode -format

The resulting output should look like this:
11/06/30 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = ubuntu/140.133.xxx.xxx
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
11/06/30 16:59:56 INFO namenode.FSNamesystem: fsOwner=hadoop,hadoop
11/06/30 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
11/06/30 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
11/06/30 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
11/06/30 16:59:57 INFO common.Storage: Storage directory .../hadoop-hadoop/dfs/name has been successfully formatted.
11/06/30 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at ubuntu/140.133.xxx.xxx
************************************************************/




【9】Start our single-node cluster
/usr/local/hadoop/bin/start-all.sh


This will start a NameNode, DataNode, JobTracker, and TaskTracker on the machine!
The resulting output should look like this:
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hadoop-tasktracker-ubuntu.out


To check whether startup succeeded, you can run:
jps

If everything worked, the output should look like this:
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode
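Hadoop also exposes built-in web interfaces for inspecting the cluster; with the default 0.20.2 settings the NameNode UI is at http://localhost:50070/, the JobTracker at http://localhost:50030/, and the TaskTracker at http://localhost:50060/.

When you are done, stop all the daemons with:
/usr/local/hadoop/bin/stop-all.sh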
