
What is the significance of the $CONDITIONS clause in a Sqoop import command?

select col1, col2 from test_table where \$CONDITIONS
pratiksadaphal
  • Refer: https://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html#_selecting_the_data_to_import – skr Jul 19 '17 at 09:39
  • Already answered: https://stackoverflow.com/a/42331952/3929393. Let me know if you have any follow-up questions. – Dev Jul 19 '17 at 14:06

1 Answer


Sqoop achieves efficient data transfers by inheriting Hadoop's parallelism: an import runs as a MapReduce job in which several map tasks each transfer one slice of the data.

  • To help Sqoop split a free-form query (--query) into multiple chunks that can be transferred in parallel, you need to include the $CONDITIONS placeholder in the WHERE clause of your query.

  • Sqoop will automatically substitute this placeholder with the generated conditions specifying which slice of data should be transferred by each individual task.

  • While you could skip $CONDITIONS by forcing Sqoop to run only one map task with the --num-mappers 1 parameter, such a limitation would have a severe performance impact, because the entire transfer would then run serially in a single task.
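The substitution step can be illustrated with a minimal Python sketch, assuming the per-task predicates are already known. `PLACEHOLDER` and `queries_for_tasks` are hypothetical names used only for illustration, not Sqoop's API; Sqoop performs this step inside its own Java input format, and the exact text it substitutes (spacing, parentheses) may differ.

```python
from typing import List

# Token that a free-form query must contain (assumption: matched literally).
PLACEHOLDER = "$CONDITIONS"


def queries_for_tasks(query: str, conditions: List[str]) -> List[str]:
    """Return one SQL string per map task by replacing the placeholder
    with that task's predicate (illustrative sketch, not Sqoop code)."""
    if PLACEHOLDER not in query:
        # Without the placeholder there is nowhere to inject a per-task
        # predicate, so every task would read the full result set.
        raise ValueError(f"query must contain {PLACEHOLDER} in its WHERE clause")
    # Wrap each predicate in parentheses so it combines safely with the query.
    return [query.replace(PLACEHOLDER, f"({cond})") for cond in conditions]


if __name__ == "__main__":
    per_task = queries_for_tasks(
        "select bla from foo WHERE $CONDITIONS",
        ["id >= 0 AND id < 10000", "id >= 10000 AND id < 20000"],
    )
    for q in per_task:
        print(q)
```

Each task receives the same query text with only its own slice predicate filled in, which is why the placeholder has to sit inside the WHERE clause.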

For example:

If you run a parallel import, the map tasks execute your query with different values substituted for $CONDITIONS. One mapper may execute "select bla from foo WHERE (id >= 0 AND id < 10000)", the next may execute "select bla from foo WHERE (id >= 10000 AND id < 20000)", and so on.
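The ranges in this example can be reproduced with a small Python sketch, assuming an integer split column `id` whose minimum and maximum values are 0 and 39999, and four mappers. `integer_split_conditions` is a hypothetical helper, not part of Sqoop. By default Sqoop obtains the bounds with a MIN/MAX boundary query on the split column, and its own splitter differs in details such as how it rounds split sizes and closes the final range.

```python
from typing import List


def integer_split_conditions(col: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """Split the closed integer range [lo, hi] into at most num_mappers
    contiguous, non-overlapping ranges, rendered as WHERE predicates
    (hypothetical helper for illustration only)."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must be >= lo")
    span = hi - lo + 1              # number of integer values covered
    size = -(-span // num_mappers)  # ceiling division: values per split
    conditions = []
    start = lo
    while start <= hi:
        end = min(start + size, hi + 1)  # exclusive upper bound of this split
        conditions.append(f"{col} >= {start} AND {col} < {end}")
        start = end
    return conditions


if __name__ == "__main__":
    for cond in integer_split_conditions("id", 0, 39999, 4):
        print(cond)
```

Each printed predicate is what would take the place of $CONDITIONS for one mapper; because the ranges are contiguous and non-overlapping, every row is transferred exactly once.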

Taha Naqvi