
I have set up Glue interactive sessions locally by following https://docs.aws.amazon.com/glue/latest/dg/interactive-sessions.html. However, I am not able to add any additional packages such as Hudi to the interactive session.

There are a few magic commands available, but I am not sure which one is appropriate or how to use it:

%additional_python_modules
%extra_jars
%extra_py_files
NarenS

3 Answers


I am not able to comment on the question, so I am adding a link to a similar question that has received an answer.

Regarding the magic commands, you will find their descriptions once you start the Glue interactive notebook. I am also adding them here, with an example cell after the list.

  • %additional_python_modules (List): Comma-separated list of pip packages, S3 paths, or private pip arguments.
    • %additional_python_modules ['path_to_pip_package_1', 'path_to_pip_package_2']
  • %extra_jars (List): Comma-separated list of additional JARs to include in the cluster.
  • %extra_py_files (List): Comma-separated list of additional Python files from S3.
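As an example, a session setup cell combining these magics might look like the following. The bucket, file, and package names are placeholders, not taken from the question:

%additional_python_modules pandas==1.5.3,s3://my-bucket/wheels/my_lib-0.1.0-py3-none-any.whl
%extra_jars s3://my-bucket/jars/hudi-spark3.1-bundle_2.12-0.11.0.jar
%extra_py_files s3://my-bucket/python/helpers.py

Note that these configuration magics only take effect if they are run before the session starts; if the session is already running, stop it and start a new one for the changes to apply.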

In my case, I have a few Python helper functions in *.py and *.zip files (the zip files also contain *.py files, just zipped). This works:

%extra_py_files 's3://bucket/a.py,s3://bucket/b.py,s3://bucket/c.zip'

%additional_python_modules didn't work for me, so I assume this magic is for .whl files only ¯\_(ツ)_/¯
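For what it's worth, %additional_python_modules appears to be intended for pip-installable modules (PyPI package names or .whl files on S3) rather than loose .py/.zip files, so a sketch like the following should work; the package pin and the S3 wheel path are hypothetical:

%additional_python_modules pyarrow==10.0.1,s3://bucket/wheels/my_helpers-0.1.0-py3-none-any.whl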

Zach

You can try using the %%configure magic in the notebook. For me, in a local Docker Glue notebook environment, it worked as intended and initialized a session with the Spark parameters passed. More examples can be found here.

Example of adding Hudi 0.11.0 for Spark 3.1.1:

%%configure -f
{
    "conf": {
        "spark.jars.packages": "org.apache.hudi:hudi-spark3.1-bundle_2.12:0.11.0,org.apache.spark:spark-avro_2.12:3.1.1",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
    }
}

See the Spark docs on available configs (just choose the proper Spark version).
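To verify that the Hudi bundle was actually picked up, a quick write/read round trip in the same session should succeed once it starts. This is only a sketch: the table name, S3 path, and sample data are made up for illustration, and spark is the SparkSession that the Glue session creates for you.

# Small PySpark + Hudi round trip to confirm the packages loaded (illustrative only).
df = spark.createDataFrame(
    [("1", "foo", 100), ("2", "bar", 200)],
    ["record_id", "name", "value"],
)

hudi_options = {
    "hoodie.table.name": "demo_table",                        # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "record_id",
    "hoodie.datasource.write.partitionpath.field": "name",
    "hoodie.datasource.write.precombine.field": "value",
}

target_path = "s3://my-bucket/hudi/demo_table"                # placeholder S3 path

# Write the DataFrame as a Hudi table, then read it back.
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(target_path)
spark.read.format("hudi").load(target_path).show()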