Hi guys,

I'm trying to extract only a subset of the data with Sqoop. For that I'm using a boundary query: I only want the rows whose department_id is between 3 and 6. I have the following command:

[cloudera@quickstart ~]$ sqoop import --connect jdbc:mysql://localhost:3306/retail_db --username retail_dba --password cloudera --table departments --target-dir=wareouse/departments_v1 --boundary-query "SELECT department_id, department_name FROM departments WHERE department_id BETWEEN 3 AND 6"

But I am getting the following error:

18/12/05 12:48:27 ERROR tool.ImportTool: Import failed: java.io.IOException: java.sql.SQLException: Invalid value for getLong() - 'Fitness'

Do you know what I'm doing wrong in my command?

The source data looks like this:

+---------------+-----------------+
| department_id | department_name |
+---------------+-----------------+
|             2 | Fitness         |
|             3 | Footwear        |
|             4 | Apparel         |
|             5 | Golf            |
|             6 | Outdoors        |
|             7 | Fan Shop        |
+---------------+-----------------+

Thanks!

Pedro Alves

1 Answer

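The boundary query needs a slight modification. Sqoop expects it to return two numeric values: the lowest and the highest value of the split column (by default the table's primary key, department_id here). It reads the first column as the lower bound and the second as the upper bound, both with getLong(). Your query returns department_id and department_name, so the name is read as the upper bound, which causes the "Invalid value for getLong()" error. By default, Sqoop uses the following query to find the boundaries for creating splits: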

SELECT MIN(department_id), MAX(department_id) FROM departments
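To see why the original query fails, here is a minimal sketch, assuming (as the error message suggests) that both columns of the boundary row are converted with getLong(). The helpers get_long and read_bounds are hypothetical stand-ins written for illustration, not Sqoop's or JDBC's actual code.

```python
# Hypothetical sketch: read the two boundary values the way the error message
# suggests Sqoop does, converting each column with a getLong()-style call.
# This is NOT Sqoop's source code.


def get_long(value):
    """Mimic JDBC ResultSet.getLong(): accept ints or numeric strings only."""
    try:
        return int(value)
    except (TypeError, ValueError):
        raise ValueError(f"Invalid value for getLong() - '{value}'") from None


def read_bounds(row):
    """Column 1 is the lower bound, column 2 the upper bound of the split column."""
    if len(row) < 2:
        raise ValueError("boundary query must return two columns")
    return get_long(row[0]), get_long(row[1])


# Default query: SELECT MIN(department_id), MAX(department_id) FROM departments
print(read_bounds((2, 7)))  # (2, 7)

# Boundary query from this answer: SELECT 3,6 FROM departments
print(read_bounds((3, 6)))  # (3, 6)

# The question's query returns (department_id, department_name) rows instead
try:
    read_bounds((3, "Footwear"))
except ValueError as err:
    print(err)  # Invalid value for getLong() - 'Footwear'
```

The second column of a (department_id, department_name) row is text, so the upper-bound conversion fails, which matches the shape of the error in the question.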

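To import a subset of the data, you can use a boundary query that supplies the lower and upper bounds yourself. Because Sqoop only generates splits between these two values, rows with department_id outside 3 to 6 are not imported: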

SELECT 3,6 FROM departments
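Why fixed bounds act as a filter can be sketched like this. It is a simplified approximation (an assumption, not Sqoop's actual splitter) of how an integer range [lower, upper] is divided among mappers, each getting a WHERE condition on the split column, with only the last range closed at the top. Four mappers is Sqoop's default (changeable with -m); split_ranges, where_clause and in_range are hypothetical helper names.

```python
# Simplified sketch (assumption, not Sqoop's source) of splitting an integer
# split column's [lower, upper] bounds into per-mapper ranges.


def split_ranges(lower, upper, num_mappers):
    """Return (start, end, end_inclusive) ranges that together cover [lower, upper]."""
    if upper < lower:
        raise ValueError("upper bound must be >= lower bound")
    size, remainder = divmod(upper - lower, num_mappers)
    points = [lower]
    current = lower
    for i in range(num_mappers):
        if current >= upper:
            break
        # Spread the remainder over the first steps so the last point equals upper.
        current += size + (1 if i < remainder else 0)
        points.append(current)
    if len(points) == 1:  # lower == upper: a single closed range
        points.append(upper)
    last = len(points) - 1
    return [(points[i - 1], points[i], i == last) for i in range(1, len(points))]


def where_clause(column, rng):
    """Render one range as the condition a mapper would add to its query."""
    start, end, inclusive = rng
    op = "<=" if inclusive else "<"
    return f"{column} >= {start} AND {column} {op} {end}"


def in_range(value, rng):
    start, end, inclusive = rng
    return start <= value <= end if inclusive else start <= value < end


rows = [(2, "Fitness"), (3, "Footwear"), (4, "Apparel"),
        (5, "Golf"), (6, "Outdoors"), (7, "Fan Shop")]

ranges = split_ranges(3, 6, 4)  # bounds from SELECT 3,6 and 4 mappers (assumed)
for rng in ranges:
    print(where_clause("department_id", rng))
# department_id >= 3 AND department_id < 4
# department_id >= 4 AND department_id < 5
# department_id >= 5 AND department_id <= 6

imported = [r for r in rows if any(in_range(r[0], rng) for rng in ranges)]
print(imported)
# [(3, 'Footwear'), (4, 'Apparel'), (5, 'Golf'), (6, 'Outdoors')]
```

Rows 2 and 7 fall outside every range, so no mapper ever selects them.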

The following walkthrough shows this end to end:

1) Create table and populate data

mysql> create database retail_db;
mysql> use retail_db;
mysql> create table departments (department_id int primary key, department_name varchar(255));
mysql> insert into departments values(2, 'Fitness');
mysql> insert into departments values(3, 'Footwear');
mysql> insert into departments values(4, 'Apparel');
mysql> insert into departments values(5, 'Golf');
mysql> insert into departments values(6, 'Outdoors');
mysql> insert into departments values(7, 'Fan Shop');

2) Check data

mysql> select * from departments;
+---------------+-----------------+
| department_id | department_name |
+---------------+-----------------+
|             2 | Fitness         |
|             3 | Footwear        |
|             4 | Apparel         |
|             5 | Golf            |
|             6 | Outdoors        |
|             7 | Fan Shop        |
+---------------+-----------------+
6 rows in set (0.00 sec)

3) Run Sqoop job

$ sqoop import --connect jdbc:mysql://localhost:3306/retail_db --username user --password password --table departments --target-dir /test/run --boundary-query 'SELECT 3,6 FROM departments'

4) Check result

$ hadoop fs -cat /test/run/part-*
3,Footwear
4,Apparel
5,Golf
6,Outdoors
Jagrut Sharma