Hi, is there any way to move data from Hive views to S3? For tables I am using distcp, but since views don't have data residing in an HDFS location, I wasn't able to use distcp, and I don't have access to the tables used in creating the views. If I do a CTAS from the view, I will not get the latest data whenever the view is refreshed. Is there any other way apart from creating a table from the view and doing distcp? Please help, thanks in advance.
- Do you want to materialize the view's data in S3 whenever the view is refreshed? – leftjoin Sep 21 '19 at 13:18
- No, it's just a view, created using `create view viewname as select * from tablename`. There is no physical location. – Rajkumar Sep 22 '19 at 04:12
- The best option would be to write a Spark program which loads the data from your view/table using the Hive context and writes it back to S3 in the required format, like Parquet/ORC/CSV/JSON. – Harsh Bafna Sep 26 '19 at 08:14
2 Answers
The best option would be to write a Spark program which loads the data from your view/table using the Hive context and writes it back to S3 in the required format, like Parquet/ORC/CSV/JSON.
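A minimal PySpark sketch of this approach, assuming a Spark installation with Hive support enabled and S3 credentials already configured; the database, view, and bucket names are placeholders taken from elsewhere on this page:

```python
from pyspark.sql import SparkSession

# Build a session with Hive support so Hive views are visible to Spark.
spark = (
    SparkSession.builder
    .appName("export-hive-view-to-s3")
    .enableHiveSupport()
    .getOrCreate()
)

# Selecting from the view re-runs its underlying query, so each run of
# this job picks up the latest data without materializing a table first.
df = spark.sql("SELECT * FROM test_db.t1_view")

# Write to S3 as Parquet; mode("overwrite") replaces the previous export.
df.write.mode("overwrite").parquet("s3a://testbucket/upload/t1_view/")

spark.stop()
```

Because the job reads the view directly, re-running it (manually or on a schedule) always exports the current state of the view, which addresses the staleness problem with CTAS.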

Harsh Bafna
One approach you can use is to pipe Hive's standard output stream to S3 using the AWS command line interface (CLI).
Here is a sample command:
beeline -u jdbc:hive2://192.168.0.41:10000/test_db -n user1 -p password --outputformat=csv2 -e "select * from t1_view" | aws s3 cp - s3://testbucket/upload/test_view.txt
Note that you have to install the AWS CLI to use this method.
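Since the export has to be re-run whenever the view's underlying data changes, the one-liner above could be wrapped in a small script and scheduled, e.g. with cron. A sketch, using the same placeholder host, credentials, and bucket as the command above:

```shell
#!/bin/sh
# Re-export a Hive view to S3 via beeline and the AWS CLI.
# Placeholders: adjust the JDBC URL, user, password, view, and bucket.
set -e

beeline -u jdbc:hive2://192.168.0.41:10000/test_db \
        -n user1 -p password \
        --outputformat=csv2 \
        -e "select * from t1_view" \
  | aws s3 cp - s3://testbucket/upload/test_view.txt

# Example crontab entry to refresh the export hourly (path is hypothetical):
#   0 * * * * /opt/scripts/export_view.sh
```

The `-` argument to `aws s3 cp` tells it to read the object body from standard input, so the result set streams to S3 without an intermediate file on local disk.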

Vithal