I can read a table defined in the AWS Glue Data Catalog from a Glue job using the GlueContext. However, if I try to read the exact same table with a HiveContext, I get an error stating that the table cannot be found.
It seems to me that the HiveContext cannot access the Glue Data Catalog.
Do you know what to insert in the Glue job configuration (Edit job -> Job parameters -> "--conf xyz") so that the HiveContext can find and access tables in the Glue Data Catalog?
I'd like to execute the following code:
# imports
from pyspark.context import SparkContext
from pyspark.sql import HiveContext

# create the SparkContext and HiveContext
sc = SparkContext()
hc = HiveContext(sc)

# read the table from the Glue Data Catalog
df = hc.table('glue_db.glue_table').persist()
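For comparison, here is a minimal sketch of the read that does work for me, going through the GlueContext instead of the HiveContext. It assumes the awsglue library that is only available inside the Glue job environment, so it will not run locally:

```python
# Sketch: reading the same catalog table via the GlueContext.
# Only runs inside an AWS Glue job, where the awsglue library is available.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glue_context = GlueContext(sc)

# from_catalog resolves the table through the Glue Data Catalog
dyf = glue_context.create_dynamic_frame.from_catalog(
    database='glue_db',
    table_name='glue_table',
)
df = dyf.toDF()  # convert the DynamicFrame to a regular Spark DataFrame
```

So the catalog entry itself is fine; only the Hive metastore lookup fails.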
The code above returns the following error message:
pyspark.sql.utils.AnalysisException: u"Table or view not found: `glue_db`.`glue_table`;;\n'UnresolvedRelation `glue_db`.`glue_table`\n"
I have tried Spark versions 2.2 and 2.4.
Many thanks in advance!