
My project code runs in a Kubernetes pod. All it does is upload some data to an S3 bucket and create Glue tables in Hive that point to that data. The S3 operations run through Spark in Scala, and spark.sql is supposed to create the tables. The upload into the S3 bucket works, but spark.sql does not, and we get an exception like this:

INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1 using Spark classes.
INFO BlockManagerMasterEndpoint: Registering block manager XXXXX
WARN HiveConf: HiveConf of name hive.server2.thrift.url does not exist
WARN HiveConf: HiveConf of name hive.metastore.glue.catalogid does not exist
WARN EC2MetadataUtils: Unable to retrieve the requested metadata.
INFO AWSGlueClientFactory: No region info found, using SDK default region: us-east-1
WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: com.amazonaws.SdkClientException: Unable to load AWS credentials from any provider in the chain   
at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabase

The code looks like this:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
   .appName(appName)       // application name
   .config(someConfigs)    // additional Spark/Hive settings supplied elsewhere
   .enableHiveSupport()    // required so spark.sql goes through the Hive/Glue metastore
   .getOrCreate()

spark.sql(s"sql statement")
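
For context, the statement handed to spark.sql is along these lines; the database, table name, columns, format, and S3 path are placeholders for illustration, not the project's real ones.

// Placeholder DDL: a Glue/Hive external table pointing at data already uploaded to S3.
spark.sql(
   """CREATE EXTERNAL TABLE IF NOT EXISTS my_db.my_table (id BIGINT, payload STRING)
     |STORED AS PARQUET
     |LOCATION 's3://my-bucket/path/to/data/'
     |""".stripMargin)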

The issue is that the region us-east-1 is wrong; the correct region is us-west-2. The S3 bucket operations upload the files successfully without any issue (we do not initialize an S3 client in the code, since that configuration work is already done by SRE), so I assumed Spark would pick up the right region. But when we run the spark.sql code, the region info is suddenly not picked up, as if we never set the correct region in the Hive configuration.

What confuses me is that we should not need to specify AWS credentials, because the AWS Kubernetes pod automatically has that kind of information. Is there some configuration work missing for Hive?

I tried adding the AWS region info to Spark when initializing the session, but then the S3 bucket operations failed and spark.sql still did not work either.
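
For reference, that attempt was roughly shaped like the sketch below. The property keys are assumptions, not something confirmed in this project: aws.region is the key the open-source Glue Data Catalog client is commonly reported to read from the Hadoop/Hive configuration, and fs.s3a.endpoint.region only matters if the S3A connector is in use.

import org.apache.spark.sql.SparkSession

// Hypothetical region-pinning attempt; the config keys below are assumptions.
val spark = SparkSession.builder()
   .appName("glue-table-job")                                   // placeholder app name
   .config("spark.hadoop.aws.region", "us-west-2")              // if the Glue client reads aws.region
   .config("spark.hadoop.fs.s3a.endpoint.region", "us-west-2")  // only relevant for s3a:// paths
   .enableHiveSupport()
   .getOrCreate()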

  • `Unable to load AWS credentials from any provider in the chain` -> aws permission problem to access the metastore, glue catalog. – Lamanus Mar 11 '23 at 10:31

0 Answers