
I am using the Cloudera QuickStart VM, CDH 5.7.

I ran the following command in a terminal window:

sqoop import \
  --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
  --username=retail_dba \
  --password=cloudera \
  --query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
  --target-dir /user/cloudera/order_join \
  --split-by order_id \
  --num-mappers 4

Q: What is the purpose of $CONDITIONS? Why is it used in this query? Can anybody explain it to me?

1 Answer


$CONDITIONS is a placeholder that Sqoop rewrites internally, both to fetch metadata and to split the import across mappers.

To fetch metadata, Sqoop replaces $CONDITIONS with 1 = 0, so the query returns no rows but still exposes the column names and types:

select * from table where 1 = 0

To fetch all data with a single mapper, Sqoop replaces $CONDITIONS with 1 = 1:

select * from table where 1 = 1
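
The substitution can be imitated in the shell. This is only a sketch of the text replacement, not Sqoop's actual code: the `substitute` helper is hypothetical, and the query is the one from the question.

```shell
# Single quotes keep $CONDITIONS literal. Inside double quotes it has to be
# written as \$CONDITIONS, otherwise the shell expands it before Sqoop sees it.
query='select * from orders join order_items on orders.order_id = order_items.order_item_order_id where $CONDITIONS'

# Hypothetical helper: swap the placeholder for the given condition.
substitute() {
  printf '%s\n' "$query" | sed "s/[\$]CONDITIONS/$1/"
}

substitute '(1 = 0)'   # metadata probe: no rows, only column information
substitute '(1 = 1)'   # single mapper: every row
```

This is also why the command in the question escapes the placeholder as \$CONDITIONS: the query there is wrapped in double quotes.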

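With multiple mappers, Sqoop replaces $CONDITIONS with a range condition on the --split-by column, so that each mapper fetches a different subset of the data from the RDBMS.

For example, suppose id ranges from 1 to 100 and we are using 4 mappers. The mappers then run queries along these lines: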

SELECT * FROM table WHERE id >= 1 AND id < 25
SELECT * FROM table WHERE id >= 25 AND id < 50
SELECT * FROM table WHERE id >= 50 AND id < 75
SELECT * FROM table WHERE id >= 75 AND id <= 100
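
The ranges above can be reproduced with a little shell arithmetic. This is a rough sketch, assuming an integer split column whose minimum and maximum are already known; the `split_ranges` function is hypothetical, and Sqoop's own splitter may pick slightly different boundaries.

```shell
# Hypothetical helper: print one range condition per mapper.
# Arguments: column name, minimum value, maximum value, number of mappers.
split_ranges() {
  col=$1; min=$2; max=$3; mappers=$4
  i=0
  while [ "$i" -lt "$mappers" ]; do
    # Integer division spreads the boundaries evenly across [min, max].
    lo=$(( min + i * (max - min) / mappers ))
    hi=$(( min + (i + 1) * (max - min) / mappers ))
    if [ "$i" -eq "$(( mappers - 1 ))" ]; then
      echo "$col >= $lo AND $col <= $hi"   # last range includes the maximum
    else
      echo "$col >= $lo AND $col < $hi"
    fi
    i=$(( i + 1 ))
  done
}

split_ranges id 1 100 4
```

Each printed line is the condition one mapper would see in place of $CONDITIONS.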
Dev