
Building the MapReduce framework

This project is a custom implementation of the MapReduce framework, a programming model that facilitates parallel and distributed processing of very large data sets. Hadoop is the most popular implementation of MapReduce.

If your task was to count the frequency of each unique word in a 5 GB file using a single computer, how much time would it require? The MapReduce framework lets us leverage multiple computing nodes to accomplish the same task much, MUCH faster.
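To make the word-count example concrete, here is a minimal, self-contained sketch of the map and reduce steps. The function names and signatures below are illustrative only; they are not the API exposed by this framework.

// Illustrative word-count map/reduce pair (not this framework's actual API).
// map_chunk() emits (word, 1) pairs for one input chunk on a Mapper Node;
// reduce_pairs() sums the counts per word on a Reducer Node.
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

std::vector<std::pair<std::string, int>> map_chunk(const std::string& chunk) {
    std::vector<std::pair<std::string, int>> pairs;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) {
        pairs.emplace_back(word, 1);   // emit (word, 1)
    }
    return pairs;
}

std::map<std::string, int> reduce_pairs(
        const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& [word, one] : pairs) {
        counts[word] += one;           // sum the 1s for each word
    }
    return counts;
}

int main() {
    auto pairs = map_chunk("the quick brown fox jumps over the lazy dog the");
    for (const auto& [word, count] : reduce_pairs(pairs)) {
        std::cout << word << ": " << count << "\n";
    }
}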

If you want a brief overview of the underlying concepts of this framework, you may go through this article.

Demo videos

Demo1: Running WordCount task on a simple file

Demo2: Demonstrating the Fault Tolerance feature

Architecture


Sequence of operations

(0) Client uploads input file(s) to FileServer
(1) Client initiates task request to Master
(2) Master fetches metadata for the input file(s) from the FileServer
(3) Master assigns chunks of input file(s) to Mapper Nodes
(4) Mapper Node(s) download only the part(s) of input file(s) assigned to it
(5) Mapper Nodes finish processing and upload their resulting files to FileServer
(6) Mapper Node(s) inform Master that their tasks are done and the resulting files have been uploaded to the FileServer
(7) Master assigns tasks to Reducer Node(s)
(8) Reducer Node(s) download files relevant to the assigned task
(9) Reducer Node(s) upload their resulting files to the FileServer
(10) Reducer Node(s) inform Master that their tasks are done
(At this point Master assigns the task of aggregating the resulting files of all Reducer Nodes to one Reducer Node)
(11) Master informs the client that processing is done and the output file is present on the FileServer
(12) Client downloads output file from the FileServer
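From the client's point of view, the sequence above collapses to steps 0, 1, 11, and 12; everything in between is handled by the framework. The sketch below walks through that client-side flow using hypothetical helper functions that merely log what each step would do; they do not correspond to anything in dummy_client.cpp, and the IP address is a placeholder.

// Hypothetical client-side walkthrough of steps 0, 1, 11-12 above.
// The helpers are stand-ins for the real FileServer/Master interactions.
#include <iostream>
#include <string>

bool upload_to_fileserver(const std::string& local_path) {           // step 0
    std::cout << "upload " << local_path << " to FileServer\n";
    return true;
}

std::string submit_task(const std::string& master_ip,
                        const std::string& task,
                        const std::string& input_file) {              // step 1
    std::cout << "ask Master at " << master_ip << " to run " << task
              << " on " << input_file << "\n";
    // Steps 2-10 happen inside the framework: chunk assignment, map phase,
    // reduce phase, and aggregation of the reducer outputs into one file.
    return "output_" + task + ".txt";                                 // step 11
}

bool download_from_fileserver(const std::string& remote,
                              const std::string& local) {            // step 12
    std::cout << "download " << remote << " to " << local << "\n";
    return true;
}

int main() {
    upload_to_fileserver("input.txt");
    std::string out = submit_task("10.0.0.1", "wordcount", "input.txt");
    download_from_fileserver(out, "result.txt");
}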

How to run

make all
./fs_server
./master_server <master_IP> <log_file>
./mapper_node <master_IP> <mapper_IP>
./reducer_node <master_IP> <reducer_IP>
./dummy_client <master_IP>

Spawn as many mappers/reducers as needed. Two sample tasks are implemented: word count and inverted document index. The framework can be tested on these two tasks; the syntax for invoking them can be found in the dummy_client.cpp file.
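For example, a run with one master, two mappers, and one reducer could be launched as follows, one command per terminal. The IP addresses and log file name are placeholders; substitute the addresses of your own machines.

# Terminal 1: file server
./fs_server
# Terminal 2: master, logging to master.log
./master_server 192.168.1.10 master.log
# Terminals 3-4: two mapper nodes
./mapper_node 192.168.1.10 192.168.1.11
./mapper_node 192.168.1.10 192.168.1.12
# Terminal 5: one reducer node
./reducer_node 192.168.1.10 192.168.1.13
# Terminal 6: client submits the task
./dummy_client 192.168.1.10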
