I have a requirement to move text files from HDFS to AWS S3. The files in HDFS are plain text and non-partitioned. After migration, the files in S3 should be in ORC format and partitioned on a specific column. Finally, a Hive table will be created on top of this data.
One way to achieve this is with Spark, but I would like to know whether it is possible to use DistCp to copy the files as ORC.
I would also like to know whether there is any better option for accomplishing this task.
Thanks in advance.