
My Sqoop import works only with 1 map task (-m 1), not more.

This is working:

sqoop import --connect jdbc:mysql://localhost/databaseY --username root --password PASSWORD --table tableX --target-dir /tmp/databaseY --as-textfile -m 1

This does not:

sqoop import --connect jdbc:mysql://localhost/databaseY --username root --password PASSWORD --table tableX --target-dir /tmp/databaseY --as-textfile -m 3

My cluster has 3 nodes on AWS.

Did I miss something during the configuration?

---- EDIT FOR THE SOLUTION ---- The problem was localhost: when the import tasks run on the other cluster nodes, localhost resolves to each node itself rather than to the MySQL host. I changed it to the IP address and it is working fine.


1 Answer

The Sqoop docs shed enough light on this:

When performing parallel imports, Sqoop needs a criterion by which it can split the workload. Sqoop uses a splitting column to split the workload. By default, Sqoop will identify the primary key column (if present) in a table and use it as the splitting column. The low and high values for the splitting column are retrieved from the database, and the map tasks operate on evenly-sized components of the total range. For example, if you had a table with a primary key column of id whose minimum value was 0 and maximum value was 1000, and Sqoop was directed to use 4 tasks, Sqoop would run four processes which each execute SQL statements of the form SELECT * FROM sometable WHERE id >= lo AND id < hi, with (lo, hi) set to (0, 250), (250, 500), (500, 750), and (750, 1001) in the different tasks.
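The range computation described above can be sketched in Python. This is a simplified illustration of the behaviour the docs describe, not Sqoop's actual code; the function name `split_ranges` is made up:

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the splitting column's [lo, hi] range into num_mappers
    query ranges. Each mapper runs a query of the form
    SELECT * FROM sometable WHERE id >= start AND id < end.
    The last range ends at hi + 1 so the strict upper bound still
    includes the maximum value."""
    size = (hi - lo) // num_mappers
    ranges = []
    start = lo
    for i in range(num_mappers):
        end = hi + 1 if i == num_mappers - 1 else start + size
        ranges.append((start, end))
        start = end
    return ranges

# Reproduces the docs' example: id from 0 to 1000, 4 tasks.
print(split_ranges(0, 1000, 4))
# [(0, 250), (250, 500), (500, 750), (750, 1001)]
```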


If a table does not have a primary key defined and the --split-by <col> is not provided, then import will fail unless the number of mappers is explicitly set to one with the --num-mappers 1 option.
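So if tableX has no primary key, you can either name a splitting column explicitly or fall back to one mapper. A sketch based on the command from the question (columnZ is a hypothetical column with reasonably uniform values; replace the placeholder host with your MySQL host's address):

```
# Parallel import with an explicit splitting column (columnZ is hypothetical)
sqoop import --connect jdbc:mysql://<mysql-host-ip>/databaseY --username root --password PASSWORD --table tableX --target-dir /tmp/databaseY --as-textfile -m 3 --split-by columnZ

# Or keep a single mapper, which needs no splitting column
sqoop import --connect jdbc:mysql://<mysql-host-ip>/databaseY --username root --password PASSWORD --table tableX --target-dir /tmp/databaseY --as-textfile --num-mappers 1
```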

(Emphasis is mine)

Edit: My previous answer on a related topic will also help you with this.
