Questions tagged [sqoop]

Sqoop is an open source connectivity framework that facilitates the transfer of bulk data between relational database management systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the imports and exports are performed in parallel.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.

Available Sqoop commands:

  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import mainframe datasets to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information
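
The import/transform/export workflow described above centers on the `import` and `export` tools. A minimal sketch only; the host, database, table names, and credentials below are placeholders, not values from any question on this page:

```shell
# Import a MySQL table into HDFS in parallel (4 map tasks by default).
# Host, database, table, and user are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/sales/orders

# Export the (possibly transformed) HDFS data back into an RDBMS table.
sqoop export \
  --connect jdbc:mysql://dbhost:3306/warehouse \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/sales/orders_summary
```

Both tools run as MapReduce jobs, which is what makes the transfers parallel.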

Sqoop has been a top-level Apache project since March 2012.

2610 questions
4 votes • 3 answers

"Got error creating database manager" - Error in sqoop import query

Scenario: I am trying to import from SQL Server into HDFS, but I am getting this error: hadoop@ubuntu:~/sqoop-1.1.0/bin$ ./sqoop import --connect 'jdbc:sqlserver://192.168.230.1;username=xxx;password=xxxxx;database=HadoopTest' --table…
Bhavesh Shah • 3,299
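
"Error creating database manager" usually means Sqoop could not pick a connection manager for the JDBC URL as written. A hedged sketch of one common workaround, not a confirmed fix for this exact setup: pass the credentials as separate flags and name the driver explicitly. All values are placeholders, and it assumes Microsoft's SQL Server JDBC driver jar has been copied into Sqoop's lib directory:

```shell
# Keep only host/port/database in the URL; supply credentials as flags.
# Host, database, table, and user here are hypothetical.
sqoop import \
  --connect 'jdbc:sqlserver://192.168.230.1:1433;databaseName=HadoopTest' \
  --username etl_user -P \
  --table MyTable \
  --driver com.microsoft.sqlserver.jdbc.SQLServerDriver
```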
4 votes • 1 answer

Sqoop import directly to S3 bucket from celery airflow worker

My big data infrastructure contains Airflow and EMR running in two separate clusters. Currently the data ETL steps are as follows: (1) Sqoop data onto an Airflow worker (Hadoop 2.7 is installed here in pseudo-distributed mode), (2) sync the data to S3, (3) access…
Rukshan Hassim • 505
4 votes • 0 answers

How to Use Oracle Wallet with Sqoop Developer API

When using Sqoop from the command line, I can use Oracle Wallet as described in the ASF blog, like: export HADOOP_OPTS="-Doracle.net.tns_admin=$PWD/wallet -Doracle.net.wallet_location=$PWD/wallet" sqoop import -D mapred.map.child.java.opts=…
4 votes • 2 answers

Why is Spark slower than Sqoop when it comes to JDBC?

It is understood that, while migrating/loading from an Oracle DB to HDFS/Parquet, it is preferred to use Sqoop rather than Spark with a JDBC driver. Spark is supposed to be 100x faster when processing, right? Then what is wrong with Spark? Why do people prefer…
BdEngineer • 2,929
4 votes • 2 answers

Sqoop Import to Hive hangs indefinitely at a point

I'm trying to import a MySQL table to Hive using Sqoop import; however, after the command execution the CLI stays silent, nothing happens, and it hangs indefinitely. Below are the command and issue details: [cloudera@quickstart bin]$ sqoop…
Ramakrishna • 53
4 votes • 4 answers

Why is the default maximum number of mappers 4 in Sqoop? Can we give more than 4 mappers in the -m parameter?

I am trying to understand the reason behind the default maximum number of mappers in a Sqoop job. Can we set more than four mappers in a Sqoop job to achieve higher parallelism?
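
The default of 4 map tasks can be raised with `-m`/`--num-mappers`, usually together with `--split-by` so Sqoop can partition the key range evenly across the extra tasks. A sketch with placeholder connection details:

```shell
# Raise parallelism from the default 4 map tasks to 8.
# --split-by names a roughly uniformly distributed column to partition on;
# host, database, table, user, and column are hypothetical.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --num-mappers 8 \
  --split-by order_id
```

Each mapper opens its own database connection, so the practical ceiling is set by what the source database can tolerate, not by Sqoop.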
4 votes • 3 answers

What does Sqoop 2 provide that Sqoop 1 does not?

According to sqoop.apache.org, Sqoop 2 is not feature complete and should not be used for production systems. Fair enough, some people may want to test out Sqoop 2's new features on their test environments. Cloudera has a feature comparison between…
Andrew C. • 410
4 votes • 2 answers

Sqoop job not running with parameters

I am trying to run a Sqoop job. I am using Sqoop 1.4.6-cdh5.8.0 and it is not working for this version; it works fine with Sqoop 1.4.5-cdh5.4.0. sqoop job --create E8 -- import --connect jdbc:mysql://localhost/test -- username…
coder25 • 2,363
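
One thing worth checking with `sqoop job` syntax: the bare `--` is a separator between job options and tool options and must stand alone, while each tool flag such as `--username` must be a single token with no space after the dashes. A sketch of the well-formed shape, with placeholder credentials:

```shell
# "--" alone separates job options from the import tool's options;
# "--username" is one token. Database, user, and table are hypothetical.
sqoop job --create E8 -- import \
  --connect jdbc:mysql://localhost/test \
  --username etl_user -P \
  --table t1

# Run the saved job later by name.
sqoop job --exec E8
```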
4 votes • 1 answer

How to escape single quote characters in bash command substitution

I want to generate a Sqoop command which is appended by some variable like CUSTOM_PARAMS. I have defined the variable in a file, say hi.cfg. The variable has some single quotes as well, like 'orc'. cat hi.cfg CUSTOM_PARAMS="--query select * from…
Abhi • 61
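
A way to see what the shell actually stores: source the config file and expand the variable inside double quotes, which keeps the embedded single quotes literal. A small self-contained sketch; the file path and variable value are made up to mirror the question:

```shell
# Write a stand-in for hi.cfg; the quoted heredoc delimiter ('EOF')
# prevents any expansion, so the file receives the line verbatim.
cat > /tmp/hi.cfg <<'EOF'
CUSTOM_PARAMS="--query select * from t where fmt = 'orc'"
EOF

# Source the file: the shell strips the outer double quotes but keeps
# the inner single quotes as literal characters in the value.
. /tmp/hi.cfg

# Always expand as "$CUSTOM_PARAMS"; an unquoted expansion would
# re-split the value on whitespace and let the * glob.
echo "$CUSTOM_PARAMS"
# → --query select * from t where fmt = 'orc'
```

When splicing such a variable into a larger command line, keeping the expansion double-quoted is what preserves the single quotes.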
4 votes • 2 answers

Sqoop does not import VARCHAR2 datatype

I have a table in an Oracle database and I want to import the data to HDFS. I am trying to do it with Sqoop, but the VARCHAR2 columns are not imported; that data isn't arriving in the HDFS file. My sqoop…
Mohamed Emad • 104
4 votes • 0 answers

Sqoop export of a hive table partitioned on an int column

I have a Hive table partitioned on an 'int' column. I want to export the Hive table to MySQL using the Sqoop export tool. sqoop export --connect jdbc:mysql://XXXX:3306/temp --username root --password root --table emp --hcatalog-database temp…
Munesh • 1,509
4 votes • 2 answers

Difference between --append and --incremental append in sqoop

Is there any difference between using --append and --incremental append for inserting new rows from RDBMS to an existing dataset in HDFS? I am using --append along with --where and --incremental append along with --last-value.
Midhun Mathew Sunny • 1,271
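
The distinction in this question can be sketched side by side: plain `--append` just adds files to an existing target directory, with the row filtering left to you, while `--incremental append` has Sqoop track the high-water mark itself via `--check-column`/`--last-value`. Connection details and column names below are placeholders:

```shell
# Plain --append: you decide which rows are new (--where) and Sqoop
# appends the resulting files to the existing directory.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --where "id > 1000" \
  --append --target-dir /data/orders

# --incremental append: Sqoop imports rows with check-column > last-value
# and reports the new last-value to use for the next run.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --table orders \
  --incremental append --check-column id --last-value 1000 \
  --target-dir /data/orders
```

Saving the incremental form as a `sqoop job` lets Sqoop persist the last-value between runs automatically.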
4 votes • 0 answers

Reading BLOB data which is stored as Binary datatype in Hive

We have Oracle BLOB and VARBINARY (SQL Server/Progress) data in Hive, stored as String or Binary datatypes. We brought the data in from the respective RDBMSs using Sqoop. Now that we have the data in HDFS, we would like to see the actual attachments, like pdf…
Despicable me • 548
4 votes • 1 answer

OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml

I'm new to Hadoop and now I'm testing a simple workflow with just a single Sqoop action. It works if I use plain values, not global properties. My objective, however, was to define some global properties in a file referenced in the job-xml tag in the global…
ArB • 43
4 votes • 0 answers

SQL Reserved Words when using sqoop export

One of my tables has a column named with a SQL reserved word (Desc), and when running an export from Sqoop it fails with "Incorrect syntax near the keyword 'Desc'." in the logs. Is there a fix for this other than changing the column's name?
Copich • 51
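
One workaround sketch, with all object names hypothetical: export into a view that renames the reserved column, created here via `sqoop eval` (which runs an arbitrary statement against the database). This assumes the JDBC user may create views, that the server permits DDL through `sqoop eval`, and that the single-table view is insertable; SQL Server's bracket quoting (`[Desc]`) sidesteps the keyword clash:

```shell
# Create an insertable view aliasing the reserved "Desc" column.
sqoop eval \
  --connect 'jdbc:sqlserver://dbhost;databaseName=HadoopTest' \
  --username etl_user -P \
  --query 'CREATE VIEW items_export AS SELECT id, [Desc] AS item_desc FROM items'

# Export into the view; inserts pass through to the base table.
sqoop export \
  --connect 'jdbc:sqlserver://dbhost;databaseName=HadoopTest' \
  --username etl_user -P \
  --table items_export \
  --export-dir /data/items
```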