Sqoop

Sqoop allows importing and exporting data between relational databases and Hadoop. It uses MapReduce to import data from a relational database into HDFS, HBase, or Hive. For import, Sqoop first connects to the database and retrieves metadata, then executes a MapReduce job to import the data. It supports importing a full table, selected columns/rows, and incremental imports. Sqoop export works similarly but in reverse, using MapReduce to export data from HDFS, HBase, or Hive to a relational database in bulk.

Sqoop Import

Imports traditional RDBMS data into HDFS, HBase, and Hive.


Prerequisites:-
A running RDBMS (e.g. MySQL)
Hadoop cluster up and running
HADOOP_HOME environment variable is set
Basic command

bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --target-dir /path/for/storing/db
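
As a concrete sketch, assuming a MySQL database named shop on host dbhost with a table customers (all names hypothetical):

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers --target-dir /user/hadoop/customers -m 4

The imported rows land under /user/hadoop/customers on HDFS, one part file per map task.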
How import works?

First, a connection is made to the database server to pull the metadata of the
input table being imported.
Sqoop then executes a MapReduce job on the Hadoop cluster, using that metadata
to perform the actual import.
Modify Delimiters

--fields-terminated-by ','     Field separator
--lines-terminated-by '\n'     Record (line) terminator
--escaped-by '\\'              Escape character
--enclosed-by '\"'             Field-enclosing character
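
As a sketch, the delimiter options attach to an ordinary import command (hypothetical connection, table and path):

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers --target-dir /user/hadoop/customers
--fields-terminated-by '\t' --lines-terminated-by '\n' --escaped-by '\\' --enclosed-by '\"'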
Different file formats

--as-sequencefile     Store data as SequenceFiles
--as-avrodatafile     Store data as Avro data files
--as-textfile         Store data as plain text files (the default)

--direct              Use the database's direct (non-JDBC) fast path where available
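
For example, a minimal sketch (hypothetical names) importing the same table as Avro data files instead of plain text:

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers --target-dir /user/hadoop/customers_avro --as-avrodatafile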


Different table access

--columns "field1,field2"          Import selected columns
--where "condition"                Import selected rows
--columns ... --where ...          Import selected rows of selected columns
--query "SQL query"                Import the result set of an arbitrary SQL query
import-all-tables                  Import all tables of the database
-m n                               Number of map tasks to run in parallel
--split-by column_name             Column used to divide the work among map tasks
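
As a sketch of a free-form query import (hypothetical names): when --query is used with more than one mapper, Sqoop requires the literal $CONDITIONS token in the WHERE clause plus a --split-by column, and --target-dir must be given explicitly:

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--query 'SELECT id, name, city FROM customers WHERE city = "Pune" AND $CONDITIONS'
--split-by id --target-dir /user/hadoop/pune_customers -m 4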
Incremental import

For importing only new or updated records.

For appending new records:
--incremental append --last-value value --check-column column_name

For appending and updating records:
--incremental lastmodified --last-value timestamp --check-column column_name
(The table must maintain a last-modified timestamp, i.e. an extra column.)
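
A sketch of an append-mode incremental import (hypothetical table with an auto-increment id column); only rows whose id is greater than the supplied --last-value are fetched:

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table orders --target-dir /user/hadoop/orders
--incremental append --check-column id --last-value 10000

At the end of the run Sqoop reports the new last value to pass to the next run; a saved Sqoop job records it automatically (see Job info).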
Job info

--create job_name     Save a Sqoop command as a named job
--delete job_name     Delete a saved job
--exec job_name       Execute a saved job
--show job_name       Show the parameters of a saved job
--list                List all saved jobs
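
These are options of the sqoop job tool. A sketch of saving and running the incremental import above as a reusable job (names hypothetical; note the bare -- separating the job options from the stored command):

bin/sqoop job --create orders_incr -- import --connect jdbc:mysql://dbhost/shop
--username sqoop_user --table orders --incremental append --check-column id --last-value 0

bin/sqoop job --list
bin/sqoop job --show orders_incr
bin/sqoop job --exec orders_incr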
Importing data into HBase

Prerequisites:-
HBase cluster up and running
HBASE_HOME environment variable is set

For importing a table with a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hbase-table hbase_name --column-family hbase_table_col1
--hbase-create-table

For importing a table without a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hbase-table hbase_name --column-family hbase_table_col1
--hbase-row-key col_name --hbase-create-table
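
A sketch with hypothetical names, loading a customers table into an HBase table customers_hb under column family cf, keyed by the id column:

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers --hbase-table customers_hb --column-family cf
--hbase-row-key id --hbase-create-table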
Importing a database into Hive

Prerequisites:-
Hive installed
HIVE_HOME environment variable is set

Importing a table with a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hive-table name --create-hive-table --hive-import --hive-home
path/to/hive/home

Importing a table without a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hive-table name --create-hive-table --hive-import --hive-home
path/to/hive/home --split-by col_name
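
A sketch with hypothetical names; --hive-import moves the data into Hive's warehouse and --create-hive-table generates a matching Hive schema from the RDBMS metadata:

bin/sqoop import --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers --hive-import --create-hive-table --hive-table shop_customers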
Getting HDFS data into Hive

hive> CREATE EXTERNAL TABLE student(id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/username/student';
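
Once the external table is defined over the imported directory it can be queried immediately; a minimal sketch from the shell (table name taken from the DDL above):

hive -e "SELECT COUNT(*) FROM student;"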
Sqoop export

Basic command:
bin/sqoop export --connect jdbc:location --table name --username name --password pwd
--export-dir /location
--input-fields-terminated-by ','
--input-lines-terminated-by '\n'
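
A sketch with hypothetical names, pushing comma-delimited HDFS files back into a MySQL table customers_copy (the target table must already exist in the database):

bin/sqoop export --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers_copy --export-dir /user/hadoop/customers
--input-fields-terminated-by ',' --input-lines-terminated-by '\n'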
How export works

Sqoop first validates the metadata of the target RDBMS table.

It then executes a MapReduce job to perform the actual transfer.

Use the --staging-table argument to stage the exported data and move it into the target table in a single transaction.
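
A sketch of a staged export (hypothetical names); rows are written to the empty staging table first and moved into the target table in one transaction, so a failed job never leaves the target partially exported:

bin/sqoop export --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table customers_copy --staging-table customers_copy_stg --clear-staging-table
--export-dir /user/hadoop/customers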


Export from Hive

Create an invoice table as:

CREATE TABLE invoice(
id INT NOT NULL PRIMARY KEY,
`from` VARCHAR(32), `to` VARCHAR(32));

Use command:-
bin/sqoop export --connect jdbc:location --table invoice --export-dir
location/invoice --username name --password pwd -m no.
--input-fields-terminated-by '\001' (octal of ^A, Hive's default field delimiter)
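
For example, if the invoice data sits in Hive's default warehouse path (hypothetical location and connection), the export could look like:

bin/sqoop export --connect jdbc:mysql://dbhost/shop --username sqoop_user --password secret
--table invoice --export-dir /user/hive/warehouse/invoice -m 1
--input-fields-terminated-by '\001'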
