
I am trying to upload approximately 4,000 files of about 5 GB each to HDFS for processing. I am doing this from the command line:

Iterating over each file:

hadoop fs -copyFromLocal "LocalPath" "HDFSPath"

It is taking a lot of time. Is there a faster way to do this? Does the block size matter here?

Thanks in advance.

  • Duplicate of http://stackoverflow.com/questions/19570660/hadoop-put-performance-large-file-20gb – Kumar Oct 17 '16 at 04:24
  • Instead of running the commands sequentially, run them in parallel as a batch, so that at any given point in time there are n (let's say 5) of them running together. – Preeti Khurana Oct 17 '16 at 16:48

1 Answer


You can upload the files in parallel by running the copy commands in the background, dividing the 4K files into sets sized according to your cluster configuration.
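
A minimal bash sketch of this idea (the directory paths and the job limit are placeholders you would adjust for your own environment):

#!/usr/bin/env bash
# Hypothetical paths - change these to your actual source and target directories
LOCAL_DIR=/data/local/files
HDFS_DIR=/data/hdfs/files
MAX_JOBS=5   # number of concurrent uploads; tune to your cluster and network

for f in "$LOCAL_DIR"/*; do
    # Throttle: wait while the number of running background jobs is at the limit
    while [ "$(jobs -rp | wc -l)" -ge "$MAX_JOBS" ]; do
        sleep 1
    done
    # Launch each copy in the background so several uploads run at once
    hadoop fs -copyFromLocal "$f" "$HDFS_DIR/" &
done

# Wait for the remaining uploads to finish before exiting
wait

The throttling loop keeps roughly MAX_JOBS copies in flight at a time, which is the "sets" idea above; raising the limit helps only as long as your network and the DataNodes can keep up.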