I have a Sqoop statement with 10 mappers. The entire dataset lands in Hadoop as 10 parts, each exceeding 1 GB. I want to split the data into a larger number of smaller files, needless to say more than 10, something like 50 files of 200 MB each. However, due to a DB bottleneck issue, I can't use more than 10 mappers in Sqoop. Let me know if there's any easy solution.

Pavan Ebbadi
    When you say "However due to DB bottleneck issue, I cant create more than 10 mappers in a sqoop", does that mean your job will fail if you set more than 10 mappers? – dbustosp Mar 09 '17 at 21:59

1 Answer

0

There is a solution for this in direct mode.

You can use --direct-split-size (in bytes).

Example: --direct-split-size 200000000 will generate files of approx. 200 MB each.
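A minimal sketch of how this fits into a full import command. Note that --direct-split-size only takes effect in direct mode, so --direct must also be passed; direct mode is only supported by certain connectors (e.g. MySQL and PostgreSQL). The connection string, table, user, and target directory below are placeholders, not taken from the original question.

```shell
# Hypothetical example: all names below are placeholders.
# --direct enables the connector's native fast path (required for
# --direct-split-size to apply); 200000000 bytes ~= 200 MB per file.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --num-mappers 10 \
  --direct \
  --direct-split-size 200000000 \
  --target-dir /data/sales/orders
```

With 10 mappers the DB still sees only 10 parallel connections, but each mapper rolls over to a new output file every ~200 MB, giving the smaller parts asked for.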

See the Sqoop user guide for more details.

Dev