
I'm using Spark 2.2 with ephemeral clusters on EMR. I'd like to use Spark bucketing, and I don't care about Hive (Spark-only workloads).

Can I set spark.sql.warehouse.dir to an S3 bucket to save metastore information, so that it is not cluster-dependent?
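For context, here is a minimal sketch of the setup I have in mind (the bucket name and app name are placeholders, not my real values):

```scala
import org.apache.spark.sql.SparkSession

// Point the Spark SQL warehouse at S3 so that table data is written
// outside the ephemeral cluster. "my-bucket" is a placeholder.
// Note: this only controls where table *data* goes; whether the catalog
// metadata (the metastore) also survives is exactly my question.
val spark = SparkSession.builder()
  .appName("bucketing-on-emr")
  .config("spark.sql.warehouse.dir", "s3://my-bucket/spark-warehouse/")
  .getOrCreate()
```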

Do I also need a persistent location for storing metastore_db?

What happens behind the scenes? Where is the information displayed by this command stored: spark.catalog.listTables.show?
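For reference, this is the kind of bucketed write whose metadata I'd like to survive across clusters (the table name, column names, and bucket count below are hypothetical examples, assuming `df` is some existing DataFrame on the `spark` session):

```scala
// Write a bucketed table. saveAsTable records the bucketing spec
// (bucket count and columns) as catalog metadata, not in the Parquet
// files themselves, which is why the metastore's location matters
// for ephemeral clusters.
df.write
  .bucketBy(8, "user_id")   // hypothetical column and bucket count
  .sortBy("user_id")
  .format("parquet")
  .saveAsTable("events_bucketed")

// The metadata this displays is what I'm asking about:
spark.catalog.listTables.show()
```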

Jacek Laskowski
Yann Moisan