Hi, is there any way to move data from Hive views to S3? For tables I am using distcp, but since views don't have data residing in an HDFS location, I wasn't able to use distcp, and I don't have access to the tables used in creating the views. If I do a CTAS from the view, I will not get the latest data whenever the view is refreshed. Is there any other way apart from creating a table from the view and doing distcp? Please help, thanks in advance.
- Do you want to materialize the view's data in S3 whenever the view is refreshed? – leftjoin Sep 21 '19 at 13:18
- No, it's just a view, created using `create view viewname as select * from tablename`. There is no physical location. – Rajkumar Sep 22 '19 at 04:12
- The best option would be to write a Spark program which loads the data from your view/table using the Hive context and writes it back to S3 in the required format, like Parquet/ORC/CSV/JSON. – Harsh Bafna Sep 26 '19 at 08:14
2 Answers
The best option would be to write a Spark program which loads the data from your view/table using the Hive context and writes it back to S3 in the required format, like Parquet/ORC/CSV/JSON.
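A minimal PySpark sketch of this approach, assuming a Spark installation with Hive support enabled and S3 credentials already configured; the database, view, and bucket names are placeholders taken from elsewhere on this page:

```python
from pyspark.sql import SparkSession

# Build a session with Hive support so Hive views are visible to Spark.
spark = (
    SparkSession.builder
    .appName("export-hive-view-to-s3")
    .enableHiveSupport()
    .getOrCreate()
)

# Selecting from the view re-runs its underlying query, so each run of
# this job picks up the latest data without materializing a table first.
df = spark.sql("SELECT * FROM test_db.t1_view")

# Write to S3 as Parquet; mode("overwrite") replaces the previous export.
df.write.mode("overwrite").parquet("s3a://testbucket/upload/t1_view/")

spark.stop()
```

Because the job reads the view directly, re-running it (manually or on a schedule) always exports the current state of the view, which addresses the staleness problem with CTAS.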

Harsh Bafna
One approach you can use is to pipe Hive's standard output stream to S3 using the AWS command line interface (CLI).
Here is a sample command:
beeline -u jdbc:hive2://192.168.0.41:10000/test_db -n user1 -p password --outputformat=csv2 -e "select * from t1_view" | aws s3 cp - s3://testbucket/upload/test_view.txt
Note that you have to install the AWS CLI to use this method.
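Since the export has to be re-run whenever the view's underlying data changes, the one-liner above could be wrapped in a small script and scheduled, e.g. with cron. A sketch, using the same placeholder host, credentials, and bucket as the command above:

```shell
#!/bin/sh
# Re-export a Hive view to S3 via beeline and the AWS CLI.
# Placeholders: adjust the JDBC URL, user, password, view, and bucket.
set -e

beeline -u jdbc:hive2://192.168.0.41:10000/test_db \
        -n user1 -p password \
        --outputformat=csv2 \
        -e "select * from t1_view" \
  | aws s3 cp - s3://testbucket/upload/test_view.txt

# Example crontab entry to refresh the export hourly (path is hypothetical):
#   0 * * * * /opt/scripts/export_view.sh
```

The `-` argument to `aws s3 cp` tells it to read the object body from standard input, so the result set streams to S3 without an intermediate file on local disk.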

Vithal