Thursday 2 November 2017

Sqoop Import with Secure Password

Sqoop is a very popular tool to import data from RDMS e.g: Oracle, MySQL, DB2 to HDFS or you can export data from HDFS to RDMS.

Problem definition: In Sqoop import/export we need to provide the RDBMS credentials. How do we secure the password to restrict the access of unauthorized users?

Solution:

1. Using hadoop credential-provider API.

Here are the steps:

hadoop credential create oracle.john -provider jceks://hdfs/user/test/sqoop/pass.jceks

-- Enter the RDMS password

After that, you will get the success message as follows -
oracle.john -provider has been successfully created.
Provider jceks://hdfs/user/test/sqoop/pass.jceks has been updated.

Now in Sqoop import command, you need to pass the password as follows:

-Dhadoop.security.credential.provider.path=jceks://hdfs/user/test/sqoop/pass.jceks \
--connect jdbc:oracle:thin:@server:1521:xyz  \
--username use_name \
--password-alias oracle.john \
--table db_name.table_name \
--mapreduce-job-name table_name_LOAD \
--delete-target-dir \
--fields-terminated-by '\001' \
--null-string "" \
--null-non-string "" \
--hive-drop-import-delims \
--escaped-by "\\" \
--split-by "col_name" \
--num-mappers 1

2. Using Option file.

Before starting the sqoop process we can generate option file and after completion of the job we can delete the option file.

Thanks a lot for the read.

Regards,
Ratan Kumar Nath
Email:ratanKrNath@gmail.com