1

Is there a way to enable the AWS Glue Data Catalog for Presto/Spark when running on EMR? I could not find anything in the documentation.

Atif

2 Answers

2

Based on the link provided in the other answer, I was able to model the Terraform code as follows.

Create a configuration.json.tpl file with the following content:

[
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
    }
  }
]

Load that file in your Terraform code with a template_file data source:

data "template_file" "cluster_1_configuration" {
  template = "${file("${path.module}/templates/configuration.json.tpl")}"
}
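
Since the template above contains no variables, a possible alternative on Terraform 0.12 or later is to read the file directly instead of going through the template_file data source (the template provider it belongs to has been deprecated). This is only a sketch: the local name is hypothetical and the path is assumed to be the same as above.

```hcl
# Sketch for Terraform 0.12+; assumes the same file location as the data source above.
locals {
  # Hypothetical local name. file() returns the file content as a string.
  cluster_1_configuration = file("${path.module}/templates/configuration.json.tpl")
}
```

The value would then be referenced as local.cluster_1_configuration in place of data.template_file.cluster_1_configuration.rendered.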

Then set up the cluster as follows:

resource "aws_emr_cluster" "cluster_1" {
  name          = "${var.cluster_name}-1"
  release_label = "emr-5.21.0"
  applications  = ["Spark", "Zeppelin", "Hadoop", "Sqoop"]
  log_uri       = "s3n://${var.cluster_name}/logs/"
  configurations = "${data.template_file.cluster_1_configuration.rendered}"
  ...
}
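
The configuration above only covers Spark. Since the question also mentions Presto, here is a hedged sketch of how Hive and Presto might be added. It assumes Terraform 0.12+ and an AWS provider version that offers the configurations_json argument. The Presto property shown is the one I understand the EMR release guide to document for release 5.16.0 and later (5.10.0 to 5.15.0 reportedly use "hive.metastore.glue.datacatalog.enabled" = "true" instead), so check it against the Presto page for your release before relying on it. The local name is hypothetical.

```hcl
# Sketch only, not a complete resource: required arguments such as the
# service role and instance settings are left out, as in the answer above.
locals {
  # Hypothetical local name for the Glue client factory class used above.
  glue_factory = "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
}

resource "aws_emr_cluster" "cluster_1" {
  name          = "${var.cluster_name}-1"
  release_label = "emr-5.21.0"
  applications  = ["Spark", "Hive", "Presto"]

  # jsonencode() builds the JSON list inline, so no template file is needed.
  configurations_json = jsonencode([
    {
      # Spark SQL: same classification as in configuration.json.tpl
      Classification = "spark-hive-site"
      Properties = {
        "hive.metastore.client.factory.class" = local.glue_factory
      }
    },
    {
      # Hive itself
      Classification = "hive-site"
      Properties = {
        "hive.metastore.client.factory.class" = local.glue_factory
      }
    },
    {
      # Presto Hive connector (assumed property for EMR 5.16.0 and later)
      Classification = "presto-connector-hive"
      Properties = {
        "hive.metastore" = "glue"
      }
    }
  ])
}
```

Keeping the configuration inline avoids a separate template file, at the cost of a longer resource block.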

The Glue catalog should now be available from Spark. You can verify this by calling spark.catalog.listDatabases().show() from spark-shell.

Atif
0

The following AWS pages cover using Apache Spark and Hive on Amazon EMR with the AWS Glue Data Catalog, and using the AWS Glue Data Catalog as the default Hive metastore for Presto (Amazon EMR release version 5.10.0 and later). Is this what you are looking for?

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html

and

https://aws.amazon.com/about-aws/whats-new/2017/08/use-apache-spark-and-hive-on-amazon-emr-with-the-aws-glue-data-catalog/

Also, please check this SO question for some Glue Data Catalog configurations on EMR:

Issue with AWS Glue Data Catalog as Metastore for Spark SQL on EMR

Yuva
  • The question is about enabling the AWS Glue catalog while launching EMR via Terraform. The AWS links above only document EMR's support for the AWS Glue catalog, not how to use Terraform to launch EMR with Glue. – milind bhavsar Apr 25 '19 at 05:13
  • Please read the question properly before answering, @yuva – Abhinav Kumar May 03 '21 at 16:22