
I need to move text files from HDFS to AWS S3. The files in HDFS are non-partitioned text files. After migration, the files in S3 should be in ORC format and partitioned on a specific column. Finally, a Hive table will be created on top of this data.

One way to achieve this is with Spark. However, I would like to know whether it is possible to use DistCp to copy the files as ORC.

I would also like to know whether any better option is available for this task.

Thanks in Advance.

John Rotenstein
nagendra

1 Answer


DistCp is just a copy command; it does not convert anything. What you are describing is a query that generates ORC-formatted output, so you will have to use a tool such as Hive, Spark, or Hadoop MapReduce to do it.
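For illustration, here is a minimal PySpark sketch of the Spark route, not a definitive implementation. It assumes the text files are delimited with a header row, that the cluster has the s3a connector configured, and that Hive support is enabled. The paths, bucket, table name, and column names (`hdfs:///data/input`, `my-bucket`, `events`, `event_date`, `id`, `payload`) are all hypothetical placeholders.

```python
def convert_text_to_orc(spark, src_path, dst_path, partition_col, delimiter=","):
    """Read delimited text files and rewrite them as partitioned ORC."""
    df = (
        spark.read
        .option("header", "true")        # assumption: first line holds column names
        .option("delimiter", delimiter)  # assumption: single-character delimiter
        .csv(src_path)                   # without a schema, every column is a string
    )
    (
        df.write
        .mode("overwrite")
        .partitionBy(partition_col)      # creates <partition_col>=<value>/ directories
        .orc(dst_path)
    )


def external_table_ddl(table, columns, partition_col, partition_type, location):
    """Build Hive DDL for an external ORC table over the converted files.

    `columns` lists (name, type) pairs and must not include the partition column.
    """
    cols = ", ".join(f"`{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"PARTITIONED BY (`{partition_col}` {partition_type}) "
        f"STORED AS ORC LOCATION '{location}'"
    )


def main():
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("text-to-orc")
        .enableHiveSupport()
        .getOrCreate()
    )
    # Hypothetical source and destination paths.
    convert_text_to_orc(spark, "hdfs:///data/input", "s3a://my-bucket/output", "event_date")
    spark.sql(
        external_table_ddl(
            "events",
            [("id", "STRING"), ("payload", "STRING")],
            "event_date",
            "STRING",
            "s3a://my-bucket/output",
        )
    )
    # Register the partition directories that Spark wrote with the metastore.
    spark.sql("MSCK REPAIR TABLE events")
```

Call `main()` from a script submitted with `spark-submit`. The same conversion could also be done in Hive alone, by defining an external text table over the HDFS files and running an `INSERT` into a partitioned ORC table located on S3.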

stevel