This article walks through the entire process of building a Hadoop pseudo-distributed environment, with pictures and text for reference. The details are as follows:
1. Modify hadoop-env.sh, mapred-env.sh, and yarn-env.sh
Method: open these three files (as the beifeng user) with notepad++
Add the line: export JAVA_HOME=/opt/modules/jdk1.7.0_67
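The edit above can also be scripted. The sketch below, assuming the env file names and JDK path from this article, appends the export line with a loop; it writes to copies in a temporary directory so no real config file is touched.

```shell
# Append the JAVA_HOME export to each env file (names and path from this article).
# Writes to copies under a temp directory so the sketch is safe to run anywhere.
JAVA_HOME_LINE='export JAVA_HOME=/opt/modules/jdk1.7.0_67'
DEST=$(mktemp -d)
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo "$JAVA_HOME_LINE" >> "$DEST/$f"
done
cat "$DEST/hadoop-env.sh"   # prints the appended export line
```

In a real setup you would append to the files under /opt/modules/hadoop-2.5.0/etc/hadoop instead of the temp copies.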
2. Modify the core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml configuration files
1) Modify core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/hadoop-2.5.0/data</value>
  </property>
</configuration>
2) Modify hdfs-site.xml

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.http-address</name>
    <value>:50070</value>
  </property>
</configuration>
3) Modify yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value></value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
4) Modify mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>
3. Start HDFS
1) Format the namenode: $ bin/hdfs namenode -format
2) Start the namenode: $ sbin/hadoop-daemon.sh start namenode
3) Start the datanode: $ sbin/hadoop-daemon.sh start datanode
4) HDFS monitoring web page: port 50070 on the namenode host
4. Start YARN
1) Start the resourcemanager: $ sbin/yarn-daemon.sh start resourcemanager
2) Start the nodemanager: $ sbin/yarn-daemon.sh start nodemanager
3) YARN monitoring web page: port 8088 on the resourcemanager host
5. Test the wordcount jar package
1) Working directory: /opt/modules/hadoop-2.5.0
2) Run the test: bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/ /output6/
Running process:
16/05/08 06:39:13 INFO : Connecting to ResourceManager at /192.168.241.130:8032
16/05/08 06:39:15 INFO : Total input paths to process : 1
16/05/08 06:39:15 INFO : number of splits:1
16/05/08 06:39:15 INFO : Submitting tokens for job: job_1462660542807_0001
16/05/08 06:39:16 INFO : Submitted application application_1462660542807_0001
16/05/08 06:39:16 INFO : The url to track the job: :8088/proxy/application_1462660542807_0001/
16/05/08 06:39:16 INFO : Running job: job_1462660542807_0001
16/05/08 06:39:36 INFO : Job job_1462660542807_0001 running in uber mode : false
16/05/08 06:39:36 INFO : map 0% reduce 0%
16/05/08 06:39:48 INFO : map 100% reduce 0%
16/05/08 06:40:04 INFO : map 100% reduce 100%
16/05/08 06:40:04 INFO : Job job_1462660542807_0001 completed successfully
16/05/08 06:40:04 INFO : Counters: 49
3) View the result: bin/hdfs dfs -text /output6/par*
Running results:
hadoop 2
jps 1
mapreduce 2
yarn 1
6. MapReduce History Server
1) Start: sbin/mr-jobhistory-daemon.sh start historyserver
2) Web UI: port 19888 on the history server host
7. Functions of HDFS, YARN, and MapReduce
1) HDFS: a distributed, highly fault-tolerant file system, suitable for deployment on cheap machines.
HDFS has a master-slave architecture, divided into the namenode and the datanodes: the namenode manages the namespace, while the datanodes provide the storage. Datanodes store files as data blocks; each block is 128 MB by default.
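As a quick worked example of the 128 MB block size mentioned above: the number of blocks a file occupies is its size divided by the block size, rounded up. The 300 MB file size below is a made-up example, sketched with shell arithmetic:

```shell
# 128 MB HDFS block size, in bytes
BLOCK_SIZE=$((128 * 1024 * 1024))
# hypothetical 300 MB file
FILE_SIZE=$((300 * 1024 * 1024))
# ceiling division: 300 MB fits in 3 blocks (128 + 128 + 44 MB)
BLOCKS=$(( (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE ))
echo "$BLOCKS"   # prints 3
```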
2) YARN: a general-purpose resource management system that provides unified resource management and scheduling for upper-layer applications.
YARN is divided into the resourcemanager and the nodemanagers: the resourcemanager is responsible for resource scheduling and allocation, and each nodemanager is responsible for the resources and data processing on its own node.
3) MapReduce: a computing model divided into a Map (mapping) phase and a Reduce (reduction) phase.
Map processes each row of data and emits key-value pairs, which are passed to reduce; reduce aggregates the data passed from map and produces the counts.
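The wordcount flow described above (map emits a pair per word, the framework sorts and groups the keys, reduce sums each group) can be mimicked locally with an ordinary shell pipeline; the input line is a made-up example:

```shell
# map: split the line into one word per line (each word is a key with implicit count 1)
# shuffle: sort brings identical keys next to each other
# reduce: uniq -c sums the occurrences of each key
printf 'hadoop yarn hadoop mapreduce\n' \
  | tr ' ' '\n' \
  | sort \
  | uniq -c
# hadoop appears twice; mapreduce and yarn once each
```

This is exactly the shape of the /output6 result shown in section 5, just computed without the cluster.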
That is all for this article; I hope it is helpful for your study.