Questions tagged [sqoop]

Sqoop is an open source connectivity framework that facilitates the transfer of data between relational database management systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the imports and exports run in parallel.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
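
For example, a minimal import followed by an export might look like the sketch below; the JDBC URL, credentials, table names, and HDFS paths are placeholders:

  # Pull a table from MySQL into HDFS (runs as parallel map tasks)
  sqoop import \
    --connect jdbc:mysql://dbhost/shop \
    --username dbuser -P \
    --table orders \
    --target-dir /data/orders

  # Push processed results back into an existing MySQL table
  sqoop export \
    --connect jdbc:mysql://dbhost/shop \
    --username dbuser -P \
    --table orders_summary \
    --export-dir /data/orders_summary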

Available Sqoop commands:

  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import mainframe datasets to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information

Sqoop has been a Top-Level Apache project since March of 2012.

2610 questions
10
votes
4 answers

How to use sqoop to export the default hive delimited output?

I have a Hive query: insert overwrite directory /x select ... Then I try to export the data with Sqoop: sqoop export --connect jdbc:mysql://mysqlm/site --username site --password site --table x_data --export-dir /x --input-fields-terminated-by…
Julias
  • 5,752
  • 17
  • 59
  • 84
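
Hive's default field delimiter is the non-printing \001 (Ctrl-A) character, so a common approach is to declare it explicitly on the export; a sketch reusing the question's connection details:

  sqoop export \
    --connect jdbc:mysql://mysqlm/site \
    --username site --password site \
    --table x_data \
    --export-dir /x \
    --input-fields-terminated-by '\001' \
    --input-lines-terminated-by '\n'
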
10
votes
3 answers

Can Sqoop export create a new table?

It is possible to export data from HDFS to an RDBMS table using Sqoop, but it seems we need an existing table. Is there some parameter to tell Sqoop to do the 'CREATE TABLE' step and export data into the newly created table? If yes, is it going…
Bohdan
  • 16,531
  • 16
  • 74
  • 68
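
Sqoop export expects the target table to exist; one workaround is to issue the CREATE TABLE yourself with sqoop eval before exporting. A sketch with placeholder connection details and an illustrative schema:

  # Create the destination table first, then export into it
  sqoop eval \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --query "CREATE TABLE exported_data (id INT, name VARCHAR(64))"

  sqoop export \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table exported_data \
    --export-dir /data/exported_data
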
9
votes
1 answer

Sqoop: Importing from SQL Server throwing "The TCP/IP connection to the host x.x.x.x, port 1433 has failed" during map tasks

On HDP 2.3.2 with Sqoop 1.4.6, I'm trying to import tables from SQL Server 2008. I'm able to successfully connect to the SQL Server because I can list databases and tables etc. However, every single time during imports I run into the following…
Ton Torres
  • 1,509
  • 13
  • 24
9
votes
7 answers

sqoop import multiple tables

We are using Cloudera CDH 4 and are able to import tables from our Oracle databases into our HDFS warehouse as expected. The problem is we have tens of thousands of tables in our databases, and sqoop only supports importing one table at a…
Danny Westfall
  • 93
  • 1
  • 1
  • 3
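
For whole-database loads, import-all-tables avoids per-table invocations; a sketch with placeholder connection details (the --exclude-tables names are illustrative):

  sqoop import-all-tables \
    --connect jdbc:oracle:thin:@dbhost:1521/ORCL \
    --username dbuser -P \
    --warehouse-dir /warehouse \
    --exclude-tables TMP_STAGE,AUDIT_LOG
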
9
votes
1 answer

Where is the sqoop library directory?

To install the MySQL connector in Sqoop I need to put the jar file in the Sqoop directory but I cannot find it (it is not in /usr/lib/sqoop). I installed Sqoop with Cloudera on multiple machines. Where can I find the Sqoop directory on one of the…
Mad Echet
  • 3,723
  • 7
  • 28
  • 44
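
The location varies by install method: package-based Cloudera installs typically keep the connector jars under /usr/lib/sqoop/lib, while parcel-based installs use the CDH parcel directory. When in doubt, a search along these lines finds it:

  # Either check the common locations...
  ls /usr/lib/sqoop/lib /opt/cloudera/parcels/CDH/lib/sqoop/lib
  # ...or search the filesystem for the Sqoop jars
  find / -name 'sqoop*.jar' 2>/dev/null
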
8
votes
2 answers

overwrite hdfs directory Sqoop import

Is it possible to overwrite the HDFS directory automatically during a Sqoop import, instead of overwriting it manually every time? (Do we have any option like "--overwrite", as we have "--hive-overwrite" for Hive imports?)
Abhinav Singh
  • 109
  • 1
  • 2
  • 10
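
Later Sqoop 1.4.x releases include a --delete-target-dir option that removes the target directory before the import starts; a sketch with placeholder connection details:

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table events \
    --target-dir /data/events \
    --delete-target-dir
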
8
votes
4 answers

Delta/Incremental Load in Hive

I have the use case below: my application has a table holding multiple years of data in an RDBMS. We have used Sqoop to get the data into HDFS and have loaded it into a Hive table partitioned by year and month. Now the application updates and inserts new records…
jigarshah
  • 410
  • 2
  • 8
  • 20
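
For pulling only the changed rows, Sqoop's incremental import can be wrapped in a saved job so the last-seen value is remembered between runs; a sketch where the table, check column, and merge key are illustrative:

  # The saved job stores --last-value in the Sqoop metastore between runs
  sqoop job --create daily_orders -- import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table orders \
    --target-dir /staging/orders \
    --incremental lastmodified \
    --check-column updated_at \
    --merge-key order_id

  # Run it on each load cycle
  sqoop job --exec daily_orders
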
7
votes
3 answers

Is it possible to read MongoDB data, process it with Hadoop, and output it into an RDBMS (MySQL)?

Summary: Is it possible to import data into Hadoop with the "MongoDB Connector for Hadoop", process it with Hadoop MapReduce, and export it with Sqoop in a single transaction? I am building a web application with MongoDB. While MongoDB works well…
paganotti
  • 5,591
  • 8
  • 37
  • 49
7
votes
1 answer

Showing wrong count after importing table in Hive

I have imported about 10 tables into Hive from MS SQL Server. But when I cross-check the records in Hive for one of the tables, I find more records when I run the query (select count(*) from tblName;). Then I dropped that table and…
Bhavesh Shah
  • 3,299
  • 11
  • 49
  • 73
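
A frequent cause of inflated counts is free-text columns that contain Hive's field or line delimiters (especially embedded newlines), which split one source row into several Hive rows. Sqoop can strip those characters at import time; a sketch with placeholder connection details:

  sqoop import \
    --connect 'jdbc:sqlserver://dbhost:1433;database=mydb' \
    --username dbuser -P \
    --table Customers \
    --hive-import \
    --hive-drop-import-delims
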
7
votes
5 answers

Function min(uuid) does not exist in postgresql

I have imported tables from Postgres into HDFS using Sqoop. My table has a uuid field as its primary key, and my Sqoop command is as below: sqoop import --connect 'jdbc:postgresql://localhost:5432/mydb' --username postgreuser --password 123456abcA --driver…
hazzy
  • 167
  • 1
  • 2
  • 9
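
Sqoop computes split boundaries by running MIN/MAX over the split column, and PostgreSQL has no min(uuid) aggregate. Two common workarounds, sketched with the question's connection string and an illustrative column name:

  # Split on a column PostgreSQL can aggregate...
  sqoop import \
    --connect 'jdbc:postgresql://localhost:5432/mydb' \
    --username postgreuser -P \
    --table mytable \
    --split-by created_at

  # ...or skip splitting entirely with a single mapper
  sqoop import \
    --connect 'jdbc:postgresql://localhost:5432/mydb' \
    --username postgreuser -P \
    --table mytable \
    -m 1
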
7
votes
1 answer

Loading data from RDBMS to Hadoop with multiple destinations

We have implemented a solution using Sqoop to load data from an RDBMS into our Hadoop cluster; append-only data goes to Hive while dimension data goes to HBase. Now we are setting up two identical Hadoop clusters that serve as the backup cluster for each…
Shengjie
  • 12,336
  • 29
  • 98
  • 139
7
votes
5 answers

What is --direct mode in sqoop?

As per my understanding, Sqoop is used to import or export tables/data between a database and HDFS, Hive, or HBase, and we can directly import a single table or a list of tables. Internally a MapReduce program (I think only the map task) will run. My doubt is…
Raj
  • 537
  • 4
  • 9
  • 18
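
With --direct, Sqoop bypasses JDBC for the data path and shells out to the database's native bulk tools (mysqldump/mysqlimport for MySQL, for example), which is typically faster but supports fewer databases and column types. A sketch with placeholder connection details:

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table clicks \
    --direct
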
7
votes
1 answer

Sqoop - Binding to YARN queues

So with MapReduce v2 you can bind jobs to certain YARN queues to manage resources and prioritization. Basically, using "hadoop jar /xyz.jar -D mapreduce.job.queuename=QUEUE1 /input /output" works perfectly. How can I integrate YARN queue…
user2562618
  • 327
  • 6
  • 14
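
Sqoop accepts the same generic Hadoop -D properties, as long as they appear immediately after the tool name; a sketch with placeholder connection details:

  # Generic -D options must come before the Sqoop-specific arguments
  sqoop import \
    -Dmapreduce.job.queuename=QUEUE1 \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table events \
    --target-dir /data/events
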
7
votes
6 answers

Using Sqoop to import data from MySQL to Hive

I am using Sqoop (version 1.4.4) to import data from MySQL into Hive. The data will be a subset of one of the tables, i.e. a few columns from a table. Is it necessary to create the table in Hive beforehand, or will importing the data create the table with the name specified…
Nayan
  • 353
  • 3
  • 5
  • 16
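
With --hive-import, Sqoop creates the Hive table itself if it does not already exist, and --columns restricts the import to a subset; a sketch where the connection details and column list are illustrative:

  sqoop import \
    --connect jdbc:mysql://dbhost/mydb \
    --username dbuser -P \
    --table customers \
    --columns "id,name,email" \
    --hive-import \
    --hive-table customers_subset
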
7
votes
2 answers

Apache Sqoop/Pig Consistent Data Representation/Processing

In our organization, we have lately been trying to use Hadoop-ecosystem tools to implement ETLs. Although the ecosystem itself is quite big, we are using only a very limited set of tools at the moment. Our typical pipeline flow is as…
srikrishna
  • 238
  • 3
  • 11