The hdfs dfs -getmerge command in Hadoop merges multiple files stored in HDFS (Hadoop Distributed File System) into a single output file and writes that file to the local file system. This is useful when:
- You have multiple small files in HDFS and want to combine them into one.
- You want to fetch processed output files from HDFS to your local system in a single file for further use.
Example: Suppose we have two files that we will store in HDFS:
- file1.txt
- file2.txt
We want to merge them into a single file named output.txt in our local file system.
Step 1: Check Content of Files
Before merging, let's look at the content of both files. At this point they are still on the local system; Steps 2 and 3 copy them into HDFS.
Content of file1.txt

Content of file2.txt

We will merge these two files into one.
Step 2: Create a Directory in HDFS
First, create a directory in HDFS (e.g., /Hadoop_File) where we will store our files:
hdfs dfs -mkdir /Hadoop_File
Step 3: Copy Files from Local to HDFS
Copy both file1.txt and file2.txt from the local system to HDFS:
hdfs dfs -copyFromLocal /home/dikshant/Documents/hadoop_file/file1.txt /Hadoop_File
hdfs dfs -copyFromLocal /home/dikshant/Documents/hadoop_file/file2.txt /Hadoop_File

Now both files are inside the /Hadoop_File directory in HDFS. You can verify this by listing the files:
hdfs dfs -ls /Hadoop_File
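
Steps 2 and 3 can also be wrapped in a small script that is safe to run more than once. The following is a minimal sketch, assuming the hdfs client is on the PATH; the function name upload_files is hypothetical. It uses -mkdir -p and -copyFromLocal -f so that a second run does not fail on a directory or file that already exists.

```shell
#!/bin/sh
# Hypothetical helper: create an HDFS directory and upload local files into it.
# Usage: upload_files <hdfs_dir> <local_file>...
upload_files() {
    hdfs_dir=$1
    shift
    # -p: do not fail if the directory already exists
    hdfs dfs -mkdir -p "$hdfs_dir" || return 1
    for f in "$@"; do
        # -f: overwrite the destination file if it is already present
        hdfs dfs -copyFromLocal -f "$f" "$hdfs_dir" || return 1
    done
    # List the directory to confirm the upload
    hdfs dfs -ls "$hdfs_dir"
}

# Only talk to a cluster when the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
    upload_files /Hadoop_File \
        /home/dikshant/Documents/hadoop_file/file1.txt \
        /home/dikshant/Documents/hadoop_file/file2.txt
fi
```

Note that -f silently replaces existing files in HDFS, so drop it if overwriting would be a problem.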

Step 4: Syntax of -getmerge Command
hdfs dfs -getmerge [-nl] <source_path1> <source_path2> ... <local_destination_file>
- -nl: Appends a newline character at the end of each file as it is merged.
- <source_path>: One or more files or directories in HDFS to merge.
- <local_destination_file>: Path in the local file system where the merged file will be created.
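
To see what -nl changes without touching a cluster, the effect can be imitated locally. The sketch below does not call Hadoop at all: it uses cat and printf to mimic the two behaviours, and the file contents (a1, a2, b1, b2) are placeholders chosen for illustration, not the contents of the files in this example.

```shell
#!/bin/sh
# Local imitation of getmerge with and without -nl (no Hadoop needed).
# File contents are placeholders for illustration only.
tmp=$(mktemp -d)
printf 'a1\na2' > "$tmp/file1.txt"   # note: no trailing newline
printf 'b1\nb2' > "$tmp/file2.txt"   # note: no trailing newline

# Without -nl: the bytes of each file are concatenated as-is.
cat "$tmp/file1.txt" "$tmp/file2.txt" > "$tmp/plain.txt"

# With -nl: a newline is appended after each file.
for f in "$tmp/file1.txt" "$tmp/file2.txt"; do
    cat "$f"
    printf '\n'
done > "$tmp/with_nl.txt"

echo "--- without -nl ---"
cat "$tmp/plain.txt"
echo
echo "--- with -nl ---"
cat "$tmp/with_nl.txt"
```

In the first result the last line of file1.txt and the first line of file2.txt are fused into "a2b1"; in the second they stay on separate lines.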
Step 5: Merge Files Using -getmerge
Now merge file1.txt and file2.txt from HDFS into a single file output.txt in the local system:
hdfs dfs -getmerge -nl /Hadoop_File/file1.txt /Hadoop_File/file2.txt /home/dikshant/Documents/hadoop_file/output.txt
Step 6: Verify the Output
Check whether the files have been merged successfully:
cd /home/dikshant/Documents/hadoop_file
ls
cat output.txt
You should now see the combined content of both files, with the content of file2.txt starting on its own line (because -nl appends a newline after each file).
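
Besides reading the output, the merge can be sanity-checked by size. The sketch below assumes the local originals of the source files are still available and that the merge was run with -nl, in which case the merged file should be the sum of the source sizes plus one newline byte per file. The function name check_merged_size is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: compare the merged file's size with what a -nl merge
# of the given source files should produce (sum of sizes + 1 byte per file).
# Usage: check_merged_size <merged_file> <source_file>...
check_merged_size() {
    merged=$1
    shift
    expected=0
    for f in "$@"; do
        # +1 accounts for the newline that -nl appends after each file
        expected=$((expected + $(wc -c < "$f") + 1))
    done
    actual=$(($(wc -c < "$merged")))
    if [ "$actual" -eq "$expected" ]; then
        echo "OK: $merged is $expected bytes"
    else
        echo "MISMATCH: expected $expected bytes, got $actual" >&2
        return 1
    fi
}

# Example call, run from /home/dikshant/Documents/hadoop_file:
#   check_merged_size output.txt file1.txt file2.txt
```

A size match does not prove the content is identical, but a mismatch is a quick signal that a file was missed or that -nl was omitted.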

Key Points to Remember
- If you omit -nl, the files are concatenated byte for byte. If a file does not end with a newline, its last line runs straight into the first line of the next file.
- You can merge an entire directory instead of individual files:
hdfs dfs -getmerge -nl /Hadoop_File /home/dikshant/Documents/hadoop_file/output.txt
- This will merge all files inside /Hadoop_File into output.txt.
- The merged file is always stored in the local file system, not back in HDFS.
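
Because the merged file only exists locally, putting it back into HDFS is a separate upload step. The following is a rough sketch of one way to chain the two, assuming the hdfs client is on the PATH; the function name merge_and_reupload and the destination directory /Hadoop_Merged are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: merge an HDFS directory into a local file,
# then upload the merged file back to a separate HDFS location.
# Usage: merge_and_reupload <hdfs_src_dir> <local_file> <hdfs_dest_file>
merge_and_reupload() {
    src_dir=$1
    local_out=$2
    hdfs_dest=$3
    # Merge every file in the source directory into one local file
    hdfs dfs -getmerge -nl "$src_dir" "$local_out" || return 1
    # Make sure the destination directory exists (-p: no error if it does)
    hdfs dfs -mkdir -p "$(dirname "$hdfs_dest")" || return 1
    # -f: overwrite a merged file left over from a previous run
    hdfs dfs -put -f "$local_out" "$hdfs_dest"
}

# Only talk to a cluster when the hdfs client is actually installed.
if command -v hdfs >/dev/null 2>&1; then
    merge_and_reupload /Hadoop_File \
        /home/dikshant/Documents/hadoop_file/output.txt \
        /Hadoop_Merged/output.txt
fi
```

The merged file is uploaded to a different directory than the sources on purpose: if it were written into /Hadoop_File, a later directory merge would pick it up as one more input file.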