Questions tagged [sqoop]

Sqoop is an open source connectivity framework that facilitates transfer between multiple Relational Database Management Systems (RDBMS) and HDFS. Sqoop uses MapReduce programs to import and export data; the imports and exports are performed in parallel.

You can use Sqoop to import data from a relational database management system (RDBMS) such as MySQL or Oracle into the Hadoop Distributed File System (HDFS), transform the data in Hadoop MapReduce, and then export the data back into an RDBMS.
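The round trip described above can be sketched with the commands below. This is only an illustrative sketch: the JDBC URL, credentials, table names, and HDFS paths are placeholders, not values from any question on this page.

```shell
# Import a MySQL table into HDFS; host "dbhost", database "sales",
# user "etl_user", and table "orders" are placeholder names.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders

# ... transform the data with MapReduce or Hive ...

# Export the transformed HDFS directory back into an RDBMS table.
sqoop export \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders_summary \
  --export-dir /data/orders_summary
```

Both commands launch a MapReduce job; `-m`/`--num-mappers` controls the number of parallel map tasks (the default is 4 for imports of tables with a primary key).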

Available Sqoop commands:

  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  import-mainframe   Import mainframe datasets to HDFS
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  version            Display version information
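Of the commands above, `list-databases` and `eval` are handy for verifying connectivity before attempting a full import. The connection details below are placeholders for illustration and assume a reachable MySQL server.

```shell
# List the databases visible to the given user (placeholder credentials).
sqoop list-databases \
  --connect jdbc:mysql://dbhost:3306/ \
  --username etl_user -P

# Run an ad-hoc SQL statement and print the result, e.g. a row count.
sqoop eval \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user -P \
  --query "SELECT COUNT(*) FROM orders"
```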

Sqoop has been a Top-Level Apache project since March 2012.

2610 questions
0
votes
1 answer

Sqoop append with renaming (-Dmapreduce.output.basename) not creating files in HDFS

I'm trying to import some records from a source using Sqoop. I want to add all the output (from multiple runs) to the same folder and also have a custom basename for each run (using -Dmapreduce.output.basename). Files get successfully created in the…
srikanth ramesh
  • 131
  • 1
  • 11
0
votes
1 answer

Sqoop import failed due to java.io.IOException: SQLException in nextKeyValue

I am joining three tables' data and importing the result from Oracle to Hive using the Sqoop import command. The table row counts are below. select count(*) from table1; -- 40446561 select count(*) from table2; -- 16886690 select count(*) from table3; --…
Ranga Reddy
  • 2,936
  • 4
  • 29
  • 41
0
votes
1 answer

Sqoop: Exception ERROR tool.ImportTool: Import failed: java.net.UnknownHostException: host: host: unknown error while importing

I am trying to import data from MySQL to Hadoop, but I am getting the exception below. Can someone please help me? Please find the stack trace below: Command: sqoop import --connect jdbc:mysql://localhost/sqoopdb --username 'root' -P --table 'company'…
Saurabh P
  • 3
  • 4
0
votes
1 answer

How to save data in multiple environments using sqoop command

I need to save the data to both HDFS and AWS S3 at the same time. I have used the command below, but only the first path works. sqoop import -D mapreduce.job.name=XXX -D mapred.job.queue.name=XX -Dhadoop.security.credential.provider.path=
LUZO
  • 1,019
  • 4
  • 19
  • 42
0
votes
1 answer

TCP/IP connection to the port failed: Sqoop JDBC connection error

When I try to run the Sqoop import command sqoop import --connect jdbc:sqlserver://localhost/db_name --username user --password user --table table_name, I get the following error. ERROR: The TCP/IP connection to the host 127.0.0.1, port 1433 has…
sruthi
  • 13
  • 1
  • 4
0
votes
1 answer

warehouse dir argument and re-attempt of map reduce task

I am using the warehouse-dir argument for a reason and not using target-dir in my Sqoop job. If the MapReduce task is re-attempted, it throws the error given below. How do I fix this? Since it is only a re-attempt, it makes no difference if I delete the directory…
systemdebt
  • 4,589
  • 10
  • 55
  • 116
0
votes
1 answer

Sqoop incremental count the difference

I want to import all new rows of my table into a Hive table using Sqoop; the problem is that I don't have a column to use for my incremental update. So I count all rows of my table and store the count in Hive with a timestamp column. Then I select the…
Zied Hermi
  • 229
  • 1
  • 2
  • 11
0
votes
0 answers

Issue with timestamp field while using SQOOP

An extra space is added before the milliseconds when the timestamp field is ingested; e.g. 05-OCT-17 03.39.02.689000000 AM is ingested as 2017-10-5 3:39:2. 689000000. Using Oracle as the source and Parquet as the format for storing the data…
Sumit Khurana
  • 159
  • 1
  • 10
0
votes
0 answers

Getting error while running Sqoop in EC2 instance

I installed Sqoop on my EC2 instance following http://kontext.tech/docs/DataAndBusinessIntelligence/p/configure-sqoop-in-a-edge-node-of-hadoop-cluster My Hadoop cluster is also working well. I got the error Error: Could not find…
0
votes
1 answer

SQL Server to S3 via Sqoop

I've seen some conflicting information on whether or not Sqoop can handle the following use case. I'm pretty sure I've seen it done in the past, but want to double-check. Here's the bash script: sqoop-import…
DataDog
  • 475
  • 1
  • 9
  • 23
0
votes
1 answer

How to handle XML file in hive

How do I handle this XML file in Hive? I want only USERNAME and PASSWORD in the output.
Kishore
  • 11
  • 4
0
votes
1 answer

Error while importing tables from Mysql using Sqoop

I am trying to import a table from a MySQL database using Sqoop. MySQL is installed on the same box where Sqoop, Hadoop, and Hive are installed, and I can access the database from the terminal. While trying to import, I get the error below. Please help to resolve…
S Das
  • 11
  • 1
0
votes
1 answer

How to pass a string into the split-by condition in Sqoop

I have a Sqoop query like this: sqoop import -Ddb2.jcc.sslConnection=true --connect jdbc:db2://192.1.1.2:6060/DB2M --username ${username} --password $password --query " SELECT ACCOUNT_DATE,DIV_VALUE,from ${qualifier}.DTL where year = '${year}' AND…
Teju Priya
  • 595
  • 3
  • 8
  • 18
0
votes
0 answers

Exporting data using sqoop

I have a table "base" imported from an RDBMS having columns: station_num int(11), zipcode varchar(255), city varchar(255), state varchar(255). Now I want to export just two columns from table "base", i.e. state and station_num, in the same order from this…
0
votes
1 answer

Fetch data from oracle and process using spark in emr cluster

I have an Oracle database with around 30 tables. I want to dump the data from these tables for a specific time period into an EMR cluster and run the Hive queries that I have on the data. I would like to use Spark and AWS EMR for this. This will be…
Punter Vicky
  • 15,954
  • 56
  • 188
  • 315