I am trying to set up an AWS Glue environment on my Ubuntu VirtualBox machine by following the AWS documentation.

I have done the prerequisite steps: downloaded the aws-glue-libs and Spark packages and set SPARK_HOME as suggested. After that, I am not able to initialize the Glue context and get the error below.

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
# or
glueContext = GlueContext(sc)

Error:

TypeError          Traceback (most recent call last)
<ipython-input-15-0798793d4033> in <module>
----> 1 glueContext = GlueContext(SparkContext.getOrCreate())

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in __init__(self, sparkContext, **options)
     43         super(GlueContext, self).__init__(sparkContext)
     44         register(sparkContext)
---> 45         self._glue_scala_context = self._get_glue_scala_context(**options)
     46         self.create_dynamic_frame = DynamicFrameReader(self)
     47         self.write_dynamic_frame = DynamicFrameWriter(self)

~/aws-glue-libs-glue-1.0/PyGlue.zip/awsglue/context.py in _get_glue_scala_context(self, **options)
     64 
     65         if min_partitions is None:
---> 66             return self._jvm.GlueContext(self._jsc.sc())
     67         else:
     68             return self._jvm.GlueContext(self._jsc.sc(), min_partitions, target_partitions)

TypeError: 'JavaPackage' object is not callable

2 Answers


Copy the aws-glue-libs jar files into the Spark jars folder, i.e. copy the jar files from the \aws-glue-libs\jarsv1\ folder into the \spark-2.4.3-bin-spark-2.4.3-bin-hadoop2.8\jars folder.
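
If you want to script that copy, here is a minimal Python sketch. The two directory paths are assumptions based on the folders mentioned above, so adjust them to wherever you extracted aws-glue-libs and Spark; it also assumes SPARK_HOME is set in your environment:

import glob
import os
import shutil

# Assumed locations; change these to your actual extraction paths.
glue_jars_dir = os.path.expanduser("~/aws-glue-libs-glue-1.0/jarsv1")
spark_jars_dir = os.path.join(os.environ["SPARK_HOME"], "jars")

# Copy every Glue jar next to the Spark jars so the driver JVM
# can load the GlueContext classes.
for jar_path in glob.glob(os.path.join(glue_jars_dir, "*.jar")):
    shutil.copy(jar_path, spark_jars_dir)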


After implementing the instructions given at https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-libraries.html:

Check whether the spark.executor.extraClassPath and spark.driver.extraClassPath Spark configuration properties are set to {user_path}\\aws-glue-libs-glue-{1.0/master}\\jarsv1\\*.
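
If they are not set, one way to supply them from PySpark is to pass a SparkConf when the very first SparkContext of the Python session is created (this only works before any SparkContext/JVM has been started in that process). This is only a sketch, and the jar path below is an assumed example that you should replace with your own aws-glue-libs location:

from pyspark import SparkConf
from pyspark.context import SparkContext
from awsglue.context import GlueContext

# Assumed path; replace it with your actual {user_path} location.
glue_classpath = "/home/your_user/aws-glue-libs-glue-1.0/jarsv1/*"

conf = SparkConf() \
    .set("spark.driver.extraClassPath", glue_classpath) \
    .set("spark.executor.extraClassPath", glue_classpath)

# Must be the first SparkContext in this Python process; otherwise the
# driver JVM is already running and the classpath setting has no effect.
sc = SparkContext(conf=conf)
glueContext = GlueContext(sc)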

To verify the classpaths execute the below code:

from pyspark.context import SparkContext

sc = SparkContext()
sc.getConf().getAll()

The given error is mainly caused by a classpath issue: the classpath does not include the AWS Glue jar files.
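
To spot the relevant entries quickly in that output, you can filter the configuration (assuming the sc created above):

# Print only the classpath-related settings so the Glue jar path is easy to spot.
for key, value in sc.getConf().getAll():
    if "extraClassPath" in key:
        print(key, "=", value)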
