
I have an AWS Glue PySpark script, for example scriptA.py. In this script I have defined a few generic functions, such as readSourceData():

def readSourceData(parameter1, parameter2):
    # logic of function
    ...

Now I want to use this generic function in my second Glue PySpark script, scriptB.py.

I have many such generic functions. How can I import these functions and use them in other scripts?

Deepak Gupta
Beginner

1 Answer


You can create modules containing your generic functions and attach those external Python modules to your Glue jobs. You can read more on this here.

Extensive answer:

  1. Bundle your generic functions in a Python module.
  2. Zip the module (.zip) and upload it to S3.
  3. Enter the S3 path of the zipped module in the Python library path field of your Glue job.

Make sure that your Job Role has access to the location in S3.
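
Steps 1 and 2 can be sketched locally as follows. This is only a minimal sketch: the names glue_utils.py, glue_libs.zip and zip_modules are hypothetical, and the upload to S3 is left out. It zips every .py file in a source folder so that each module sits at the root of the archive.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_modules(source_dir, zip_path):
    """Zip every .py file under source_dir, keeping paths relative to it."""
    source_dir = Path(source_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for py_file in sorted(source_dir.rglob("*.py")):
            # The archive name is what Python later uses as the module path.
            archive.write(py_file, py_file.relative_to(source_dir).as_posix())
    return zip_path


# Demo in a temporary folder (hypothetical file and function names).
workdir = Path(tempfile.mkdtemp())
src = workdir / "src"
src.mkdir()
(src / "glue_utils.py").write_text(
    "def read_source_data(glue_context, database, table_name):\n"
    "    ...\n"
)

zip_path = zip_modules(src, workdir / "glue_libs.zip")
with zipfile.ZipFile(zip_path) as archive:
    archive_names = archive.namelist()
print(archive_names)
```

The resulting .zip is the file you upload to S3 and reference in the Python library path field. As far as I know, that field corresponds to the --extra-py-files job parameter, but treat that mapping as an assumption and check it for your Glue version.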

Robert Kossendey
  • Hi Robert, Thank you so much for the link. :) But, I could not get much info from link. If possible, could you please explain the solution in detail here? – Beginner Apr 26 '21 at 08:03
  • I added to my answer. Hope it is clear now :) If it helped you, an upvote and an accept would be appreciated – Robert Kossendey Apr 26 '21 at 08:49
  • Hi Robert, Thanks! I followed the steps above, but how do I now call these functions in the Glue job? – Beginner Apr 26 '21 at 11:56
  • You just import them as you would in Python: from module_name import function_name – Robert Kossendey Apr 26 '21 at 12:16
  • In my readSourceData function, I have one glueContext.create_dynamic_frame call that reads from Athena. When I run the Glue job, it shows an error saying that 'glueContext' is not defined in the readSourceData function. – Beginner Apr 26 '21 at 13:41
  • Of course you would need to define the glue context in the function or inject it. – Robert Kossendey Apr 26 '21 at 13:44
  • I did pass glueContext into the function, readSourceData(glueContext, tablename,..), but it is still showing the same error. – Beginner Apr 26 '21 at 13:57
  • This is a problem with your code though, and not the topic of this question. Please raise another question for it. An upvote or an accept on this question would be appreciated. – Robert Kossendey Apr 26 '21 at 14:00
  • Accepting would be great as well :) – Robert Kossendey Apr 26 '21 at 14:14
  • What is the module name in this case? There are three components to an S3 URL. Is the module name bucket.prefix.object? – Jacob Bayer Sep 07 '22 at 19:21
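
On the import step and the module name raised in the comments: as far as I understand, the module name comes from the file (or package) name inside the .zip, not from the bucket, prefix, or object key. The sketch below imitates this locally by putting a .zip on sys.path and importing from it. All names (glue_utils, glue_libs.zip, read_source_data, sales_db, orders) are hypothetical, and a small stand-in object replaces the real GlueContext so the snippet runs outside Glue. It also shows the context being passed in as an argument, so the shared module does not rely on a global glueContext.

```python
import importlib
import sys
import tempfile
import zipfile
from pathlib import Path
from types import SimpleNamespace

# Hypothetical shared module: the Glue context is a parameter, not a global.
MODULE_SOURCE = '''
def read_source_data(glue_context, database, table_name):
    return glue_context.create_dynamic_frame.from_catalog(
        database=database, table_name=table_name
    )
'''

# Build the archive; the entry name "glue_utils.py" defines the module name.
zip_path = Path(tempfile.mkdtemp()) / "glue_libs.zip"
with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("glue_utils.py", MODULE_SOURCE)

# Putting the .zip on sys.path makes its top-level modules importable.
# This step is done here only to imitate an attached library locally.
sys.path.insert(0, str(zip_path))
importlib.invalidate_caches()

from glue_utils import read_source_data  # file name inside the zip

# Stand-in for a real GlueContext (assumption, for local illustration only).
fake_context = SimpleNamespace(
    create_dynamic_frame=SimpleNamespace(
        from_catalog=lambda database, table_name: f"{database}.{table_name}"
    )
)

result = read_source_data(fake_context, "sales_db", "orders")
print(result)
```

In a real job you would create the GlueContext in the job script and pass it to read_source_data in the same way. If the .zip contains a folder with an __init__.py instead of a single file, the import becomes from folder_name.module_name import function_name.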