I'm trying to connect to AWS S3 from the Spark Thrift Server. I'm using:
spark-defaults.conf
spark.sql.warehouse.dir s3://demo-metastore-001/
spark.hadoop.fs.s3.impl org.apache.hadoop.fs.s3a.S3AFileSystem
spark.hadoop.fs.s3.aws.credentials.provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.aws.credentials.provider org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3.access.key XXXXXXXXXXXXX
spark.hadoop.fs.s3.secret.key yyyyyyyyyyyyyyyyyyyy
spark.hadoop.fs.s3a.access.key XXXXXXXXXXXXX
spark.hadoop.fs.s3a.secret.key yyyyyyyyyyyyyyyyyyyy
hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>s3://demo-metastore-001/</value>
</property>
<property>
<name>fs.s3.awsAccessKeyId</name>
<value>XXXXXXXXXXXXX</value>
</property>
<property>
<name>fs.s3.awsSecretAccessKey</name>
<value>yyyyyyyyyyyyyyyyyyyy</value>
</property>
<property>
<name>fs.s3a.awsAccessKeyId</name>
<value>XXXXXXXXXXXXX</value>
</property>
<property>
<name>fs.s3a.awsSecretAccessKey</name>
<value>yyyyyyyyyyyyyyyyyyyy</value>
</property>
As you can see, I'm brute-forcing it by mixing s3 and s3a properties, since I'm not sure which parameters are the right ones.
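For reference, my understanding is that once fs.s3.impl is mapped to S3AFileSystem, the S3A connector only reads the s3a-prefixed properties, so a minimal spark-defaults.conf might look like the sketch below (bucket name and keys are placeholders, and I'm not certain this set is sufficient, which is why I duplicated everything above):

```properties
# Route s3:// URIs through the S3A connector from hadoop-aws
spark.hadoop.fs.s3.impl                       org.apache.hadoop.fs.s3a.S3AFileSystem
# S3A takes its credentials from fs.s3a.* properties
spark.hadoop.fs.s3a.aws.credentials.provider  org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider
spark.hadoop.fs.s3a.access.key                XXXXXXXXXXXXX
spark.hadoop.fs.s3a.secret.key                yyyyyyyyyyyyyyyyyyyy
# Default warehouse location for managed tables
spark.sql.warehouse.dir                       s3a://demo-metastore-001/
```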
I'm running:
start-thriftserver.sh --packages org.apache.spark:spark-hadoop-cloud_2.12:3.3.1 --conf spark.hadoop.fs.s3.impl=org.apache.hadoop.fs.s3a.S3AFileSystem --master local[*]
The log doesn't show any errors:
23/01/12 04:16:12 INFO MetricsSystemImpl: s3a-file-system metrics system started
23/01/12 04:16:13 INFO SharedState: Warehouse path is 's3a://demo-metastore-001/'.
23/01/12 04:16:14 INFO HiveUtils: Initializing HiveMetastoreConnection version 2.3.9 using Spark classes.
23/01/12 04:16:14 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is s3a://demo-metastore-001/
23/01/12 04:16:18 INFO HiveUtils: Initializing execution hive, version 2.3.9
23/01/12 04:16:18 INFO HiveClientImpl: Warehouse location for Hive client (version 2.3.9) is s3a://demo-metastore-001/
But the metastore is always created in the master server's local directory, not in S3. Any idea how I can connect the Spark Thrift Server to AWS S3?