MapReduce Flow Chart Sample Example

In this MapReduce tutorial we explain a sample word count example together with its flow chart, and show how MapReduce processes a job.

A SIMPLE EXAMPLE FOR WORD COUNT
The problems listed above can be solved in a few lines with a MapReduce program. The map and reduce functions below show how it works on a large dataset.

MAP FUNCTION
Map(Key1, Value1) --> List(Key2, Value2)

For each word in an input line the mapper emits the pair (word, 1). These pairs form the list that is the output of the mapper.

So who ensures that the file is distributed and that each line of the file is passed to a map function? The Hadoop framework takes care of this, so there is no need to worry about the distributed system.

REDUCE FUNCTION
Reduce(Key2, List(Value2)) --> List(Key3, Value3)

For the List(key, value) output from the mappers, shuffle and sort the data by key.
Group by key and create the list of values for each key.
So who ensures the shuffle, sort, group by and so on? Again, the Hadoop framework.
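The three stages described above (map, then shuffle/sort/group, then reduce) can be tried out without a cluster. The following is only a plain-Java sketch of the idea, not Hadoop code: the class and method names (WordCountSketch, mapLine, shuffle, reduce, wordCount) are made up for illustration, and a TreeMap stands in for the framework's sort and group step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java simulation of map -> shuffle/sort/group -> reduce (no Hadoop needed).
// All names in this class are hypothetical and exist only for this illustration.
final class WordCountSketch {

    // Map step: emit a (word, 1) pair for every token in one input line.
    static List<Map.Entry<String, Integer>> mapLine(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle/sort/group step: collect all values per key; TreeMap keeps keys sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the list of values that belongs to one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs the three steps in order over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(mapLine(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints {are=1, hi=1, how=2, is=1, job=1, you=1, your=1}
        System.out.println(wordCount(List.of("hi how are you", "how is your job")));
    }
}
```

In a real job the map calls run on many machines and the grouping happens across the network, but the data passed between the stages has the same shape as in this sketch.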

MAP FUNCTION FOR WORD COUNT
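// These members belong inside a class that extends
// org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable>.
// Required imports: java.io.IOException, java.util.StringTokenizer,
// org.apache.hadoop.io.IntWritable, org.apache.hadoop.io.LongWritable, org.apache.hadoop.io.Text.
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();

// Called once per input record; with the default TextInputFormat the value is one line of text.
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        // Emit (word, 1) for every whitespace-separated token.
        word.set(tokenizer.nextToken());
        context.write(word, one);
    }
}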

REDUCE FUNCTION FOR WORD COUNT
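// Belongs inside a class that extends
// org.apache.hadoop.mapreduce.Reducer<Text, IntWritable, Text, IntWritable>.
// Called once per word, with all the 1s that the mappers emitted for that word.
public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable val : values) {
        sum += val.get();
    }
    // Emit (word, total count).
    context.write(key, new IntWritable(sum));
}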

ANATOMY OF A MAPREDUCE PROGRAM

[Figure: map-reduce]

FLOW CHART OF A MAPREDUCE PROGRAM
Suppose we have a file of about 200 MB with the following content:

----------- file.txt (200 MB) -----------
hi how are you
how is your job                  (64 MB) Split 1
-----------------------------------------
how is your family
how is your brother              (64 MB) Split 2
-----------------------------------------
how is your sister
what is the time now             (64 MB) Split 3
-----------------------------------------
what is the strength of hadoop   (8 MB)  Split 4
-----------------------------------------

The file above is divided into 4 splits: three splits of 64 MB each and a fourth split of 8 MB.
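The arithmetic behind these split sizes can be sketched as follows. This is a simplified illustration that only cuts the file into fixed-size pieces; the framework's own split logic has additional rules that are ignored here, and the names SplitSketch and splitSizes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that cuts a file size into fixed-size splits.
// Simplified: the real framework applies further rules when computing splits.
final class SplitSketch {

    // Returns the size of each split in MB, in order; the last one may be smaller.
    static List<Long> splitSizes(long fileSizeMb, long splitSizeMb) {
        if (splitSizeMb <= 0) {
            throw new IllegalArgumentException("split size must be positive");
        }
        List<Long> sizes = new ArrayList<>();
        long remaining = fileSizeMb;
        while (remaining > 0) {
            long size = Math.min(splitSizeMb, remaining);
            sizes.add(size);
            remaining -= size;
        }
        return sizes;
    }

    public static void main(String[] args) {
        // 200 MB file, 64 MB per split: prints [64, 64, 64, 8]
        System.out.println(splitSizes(200, 64));
    }
}
```

Each split is handed to its own map task, so in this example four map tasks would run.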

Input File Formats:
----------------------------
1. TextInputFormat
2. KeyValueTextInputFormat
3. SequenceFileInputFormat
4. SequenceFileAsTextInputFormat
------------------------------
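The input format decides what key and value each map call receives. As a rough illustration of the first two formats in the list: TextInputFormat passes the byte offset of the line as the key and the line itself as the value, while KeyValueTextInputFormat splits each line at the first tab (its default separator) into key and value. The sketch below only imitates that record shape in plain Java and does not use the Hadoop classes; InputFormatSketch, textRecords and keyValueRecords are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Plain-Java imitation of the record shapes only; this is not Hadoop code.
final class InputFormatSketch {

    // TextInputFormat-like records: {byte offset of the line start, line text}.
    static List<String[]> textRecords(String content) {
        List<String[]> records = new ArrayList<>();
        long offset = 0;
        for (String line : content.split("\n")) {
            records.add(new String[] { Long.toString(offset), line });
            // Advance past the line's bytes plus the newline character.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    // KeyValueTextInputFormat-like records: {text before first tab, text after it}.
    static List<String[]> keyValueRecords(String content) {
        List<String[]> records = new ArrayList<>();
        for (String line : content.split("\n")) {
            int tab = line.indexOf('\t');
            if (tab < 0) {
                // No separator: the whole line is the key and the value is empty.
                records.add(new String[] { line, "" });
            } else {
                records.add(new String[] { line.substring(0, tab), line.substring(tab + 1) });
            }
        }
        return records;
    }

    public static void main(String[] args) {
        for (String[] record : textRecords("hi how are you\nhow is your job")) {
            System.out.println(record[0] + " -> " + record[1]);
        }
        for (String[] record : keyValueRecords("hadoop\twhat is the strength of hadoop")) {
            System.out.println(record[0] + " -> " + record[1]);
        }
    }
}
```

The word count mapper shown earlier takes a LongWritable key and a Text value, which matches the TextInputFormat record shape.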

[Figure: mapreduce-flow-chart]


Let's look at another figure to understand the MapReduce process.

[Figure: mapreduce-process]





