This project is a custom implementation of the MapReduce framework. MapReduce is a programming model that facilitates parallel and distributed processing of very large data sets. Hadoop is the most popular implementation of MapReduce.
If your task was to count the frequency of each unique word in a 5 GB file using a single computer, how much time would it require? The MapReduce framework allows us to leverage multiple computing nodes to accomplish this much, MUCH faster.
If you want to briefly understand the underlying concepts of this framework, you may go through this article.
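To make the model concrete, here is a minimal single-process sketch of the word-count task in C++. The function names `map_phase` and `reduce_phase` are illustrative only, not this framework's actual API; in the real system the two phases run on separate Mapper and Reducer Nodes.

```cpp
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Map phase: emit a (word, 1) pair for every word in the assigned chunk.
std::vector<std::pair<std::string, int>> map_phase(const std::string& chunk) {
    std::vector<std::pair<std::string, int>> pairs;
    std::istringstream in(chunk);
    std::string word;
    while (in >> word) pairs.emplace_back(word, 1);
    return pairs;
}

// Reduce phase: sum the counts emitted for each distinct word.
std::map<std::string, int> reduce_phase(
        const std::vector<std::pair<std::string, int>>& pairs) {
    std::map<std::string, int> counts;
    for (const auto& p : pairs) counts[p.first] += p.second;
    return counts;
}

int main() {
    auto pairs = map_phase("the quick fox the fox");
    for (const auto& kv : reduce_phase(pairs))
        std::cout << kv.first << ": " << kv.second << '\n';
    // prints: fox: 2, quick: 1, the: 2
}
```

The point of the split is that `map_phase` can run on many nodes at once, each over its own chunk, while `reduce_phase` only needs the intermediate pairs, not the original file.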
(0) Client uploads input file(s) to FileServer
(1) Client initiates task request to Master
(2) Master accesses meta data of the input file from the FileServer
(3) Master assigns chunks of input file(s) to Mapper Nodes (a sketch of this chunk split appears after this list)
(4) Mapper Node(s) download only the part(s) of the input file(s) assigned to them
(5) Mapper Nodes finish processing and upload their resulting files to FileServer
(6) Mapper Node(s) inform Master that their tasks are done and the resulting files are uploaded to the FileServer
(7) Master assigns tasks to Reducer Node(s)
(8) Reducer Node(s) download files relevant to the assigned task
(9) Reducer Node(s) upload their resulting files to FileServer
(10) Reducer Node(s) inform Master that their tasks are done
(At this point Master assigns one Reducer Node the task of aggregating the resulting files of all Reducer Nodes)
(11) Master informs the client that processing is done and the output file is present on the FileServer
(12) Client downloads output file from the FileServer
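To illustrate step (3), below is a rough sketch of how the Master might split an input file evenly across mappers by byte ranges. The `ChunkAssignment` struct and `split_input` function are assumptions for illustration, not the repository's actual types.

```cpp
#include <cstddef>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical description of one mapper's assignment (illustrative only;
// the repository's actual message format may differ).
struct ChunkAssignment {
    std::string file_name;  // input file on the FileServer
    std::size_t offset;     // byte offset where this mapper starts reading
    std::size_t length;     // number of bytes assigned to this mapper
};

// Split an input file of file_size bytes evenly across num_mappers,
// giving the remainder bytes to the last mapper, as in step (3).
std::vector<ChunkAssignment> split_input(const std::string& file_name,
                                         std::size_t file_size,
                                         std::size_t num_mappers) {
    std::vector<ChunkAssignment> chunks;
    std::size_t base = file_size / num_mappers;
    std::size_t offset = 0;
    for (std::size_t i = 0; i < num_mappers; ++i) {
        std::size_t len = (i + 1 == num_mappers) ? file_size - offset : base;
        chunks.push_back({file_name, offset, len});
        offset += len;
    }
    return chunks;
}

int main() {
    for (const auto& c : split_input("input.txt", 1000, 3))
        std::cout << c.file_name << " offset=" << c.offset
                  << " length=" << c.length << '\n';
    // input.txt offset=0 length=333
    // input.txt offset=333 length=333
    // input.txt offset=666 length=334
}
```

A production splitter would additionally align chunk boundaries to record boundaries (e.g. newlines) so that no word is cut in half between two mappers.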
make all                                  # build all binaries
./fs_server                               # start the FileServer
./master_server <master_IP> <log_file>    # start the Master
./mapper_node <master_IP> <mapper_IP>     # start a Mapper Node (repeat per mapper)
./reducer_node <master_IP> <reducer_IP>   # start a Reducer Node (repeat per reducer)
./dummy_client <master_IP>                # run the sample client
Spawn as many mappers/reducers as needed. Two sample tasks are implemented: word count and inverted document index. The framework implementation can be tested on these two tasks; the syntax for invoking them can be found in the dummy_client.cpp file.
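For reference, an inverted document index maps each term to the set of documents that contain it. The sketch below shows the reduce-side idea only; the `invert` helper and its types are illustrative, not this repository's implementation.

```cpp
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Reduce-side idea of an inverted index: collect, for each word,
// the set of documents the mappers saw it in.
std::map<std::string, std::set<std::string>> invert(
        const std::vector<std::pair<std::string, std::string>>& pairs) {
    std::map<std::string, std::set<std::string>> index;
    for (const auto& p : pairs) index[p.first].insert(p.second);
    return index;
}

int main() {
    auto index = invert({{"fox", "doc1"}, {"fox", "doc3"}, {"quick", "doc1"}});
    for (const auto& kv : index) {
        std::cout << kv.first << ":";
        for (const auto& doc : kv.second) std::cout << ' ' << doc;
        std::cout << '\n';
    }
    // fox: doc1 doc3
    // quick: doc1
}
```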