
I'm trying to copy some files over to the tmp folder using boto3 in a Glue job. Here's my code:

import os
import pandas as pd
import numpy as np
import boto3

bucketname = "<bucket_name>"
s3 = boto3.resource('s3')
my_bucket = s3.Bucket(bucketname)
print('line 9')
source = "stuff/20210223/"
#target = temp directory job is running in
target = os.path.dirname(os.path.realpath(__file__))

for obj in my_bucket.objects.filter(Prefix=source):
    print('line 15')
    source_filename = (obj.key).split('/')[-1]
    copy_source = {
        'Bucket': bucketname,
        'Key': obj.key
    }
    print(obj.key)
    print('line 21')
    target_filename = "/{}/{}".format(target, source_filename)
    print('target_filename')
    print(target_filename)
    s3.meta.client.copy(copy_source, bucketname, target_filename)
    print('line 27')


print('curr dir')
curr_dir = os.path.dirname(os.path.realpath(__file__))
print('\n----------------\n')
dir_path = os.path.dirname(os.path.realpath(__file__))
files = [f for f in os.listdir('.') if os.path.isfile(f)]
print(files)

This yields the following error:

  File "/tmp/runscript.py", line 123, in <module>
    runpy.run_path(temp_file_path, run_name='__main__')
  File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/tmp/glue-python-scripts-hcnfmxbn/CDF parser.py", line 26, in <module>
  File "/usr/local/lib/python3.6/site-packages/boto3/s3/inject.py", line 379, in copy
    return future.result()
  File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 106, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.6/site-packages/s3transfer/futures.py", line 265, in result
    raise self._exception
  File "/usr/local/lib/python3.6/site-packages/s3transfer/tasks.py", line 255, in _main
    self._submit(transfer_future=transfer_future, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/s3transfer/copies.py", line 110, in _submit
    **head_object_request)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.6/site-packages/botocore/client.py", line 661, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tmp/runscript.py", line 142, in <module>
    raise e_type(e_value).with_traceback(new_stack)
TypeError: __init__() missing 1 required positional argument: 'operation_name'

I have followed just about every SO thread about this error message, but the solutions all seem to come down to region mismatches, and I have already checked that my Glue job is in the same region as my bucket. The error occurs on this line, when I actually try to copy the file over to the tmp folder:

s3.meta.client.copy(copy_source, bucketname, target_filename)

I don't understand how this could be a permissions issue with my Glue service IAM role's write access to the folder, because I can save CSVs to the folder using pandas to_csv.
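
From the traceback, the 403 is raised on the HeadObject call that boto3's copy() issues against the source object before the actual copy (the s3transfer/copies.py frame above). Note too that copy() performs a server-side S3-to-S3 copy rather than writing to the local filesystem. For comparison, here is a minimal sketch of the two call shapes, using the same placeholder bucket and an illustrative key:

import os
import boto3

bucketname = "<bucket_name>"                # same placeholder as above
source_key = "stuff/20210223/somefile.csv"  # illustrative key under the prefix

s3 = boto3.resource('s3')

# Server-side copy: HeadObject on the source, then the copy itself --
# the destination is another S3 object, nothing lands on local disk.
s3.meta.client.copy(
    {'Bucket': bucketname, 'Key': source_key},   # CopySource
    bucketname,                                  # destination bucket
    'copies/' + os.path.basename(source_key)     # destination key
)

# Download to the job's local filesystem (e.g. /tmp) instead.
local_path = os.path.join('/tmp', os.path.basename(source_key))
s3.meta.client.download_file(bucketname, source_key, local_path)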

Ravmcgav

1 Answer


Solved by updating the IAM role policy I was using for the Glue job to allow write access to the bucket the job is working with.
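
For reference, a sketch of the kind of policy change involved -- the role name and policy name below are placeholders, and the same edit can be made directly in the IAM console instead:

import json
import boto3

bucket = "<bucket_name>"            # same placeholder bucket as the question
role_name = "MyGlueServiceRole"     # placeholder: the Glue job's service role

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": ["arn:aws:s3:::{}".format(bucket)]
        },
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": ["arn:aws:s3:::{}/*".format(bucket)]
        }
    ]
}

# Attach as an inline policy on the Glue job's service role.
iam = boto3.client('iam')
iam.put_role_policy(
    RoleName=role_name,
    PolicyName='glue-job-bucket-access',   # placeholder policy name
    PolicyDocument=json.dumps(policy)
)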

Ravmcgav