
I'm using Spark 2.2 with ephemeral clusters on EMR. I'd like to use Spark bucketing, and I don't care about Hive (Spark-only workloads).

Can I set spark.sql.warehouse.dir to an S3 bucket to save metastore information, so that it is not cluster-dependent?
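For context, here is a minimal sketch of the setup I have in mind (the bucket name and app name are placeholders, not my real values):

```scala
import org.apache.spark.sql.SparkSession

// Point the Spark SQL warehouse at S3 so that table data is written
// outside the ephemeral cluster. "my-bucket" is a placeholder.
// Note: this only controls where table *data* goes; whether the catalog
// metadata (the metastore) also survives is exactly my question.
val spark = SparkSession.builder()
  .appName("bucketing-on-emr")
  .config("spark.sql.warehouse.dir", "s3://my-bucket/spark-warehouse/")
  .getOrCreate()
```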

Do I also need a persistent location for storing metastore_db?

What happens behind the scenes? Where is the information displayed by this command stored: spark.catalog.listTables.show?
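For reference, this is the kind of bucketed write whose metadata I'd like to survive across clusters (the table name, column names, and bucket count below are hypothetical examples, assuming `df` is some existing DataFrame on the `spark` session):

```scala
// Write a bucketed table. saveAsTable records the bucketing spec
// (bucket count and columns) as catalog metadata, not in the Parquet
// files themselves, which is why the metastore's location matters
// for ephemeral clusters.
df.write
  .bucketBy(8, "user_id")   // hypothetical column and bucket count
  .sortBy("user_id")
  .format("parquet")
  .saveAsTable("events_bucketed")

// The metadata this displays is what I'm asking about:
spark.catalog.listTables.show()
```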

Jacek Laskowski
Yann Moisan