
I've been working on this for a while. I'm still new to unit tests, so there's a good chance I'm missing something fundamental in my code.

The problem is that in my unit test, when I call my function, which uses AWS Glue's write_dynamic_frame.from_options, I get this error:

py4j.protocol.Py4JJavaError: An error occurred while calling o62.pyWriteDynamicFrame. E : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 1 times, most recent failure: Lost task 0.0 in stage 4.0 (TID 12) (c95fce22e8c7 executor driver): java.nio.file.AccessDeniedException: s3://testbucketdestination/export-area/somesubfolder/year=2023/month=4/day=2/run-1678464947022-part-r-00000: getFileStatus on s3://testbucketdestination/export-area/somesubfolder/year=2023/month=4/day=2/run-1678464947022-part-r-00000: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: xxxxxxxx; S3 Extended Request ID: wCCNz4PZD/YYj8I6kqDYU6Eb+Wb/mtYxKj+bUwhXJ1cL0ZUsnXdh.................=; Proxy: null), S3 Extended Request ID: wCCNz4PZD/YYj8I6kqDYU6Eb+Wb/mtYxKj+bUwhXJ1cL0ZUsnXdh...............=:403 Forbidden

The process I'm using to run my unit tests is based on this post: https://noise.getoto.net/2022/04/14/develop-and-test-aws-glue-version-3-0-jobs-locally-using-a-docker-container/

These are the steps:

  1. Pull the aws-glue-libs Docker image:
docker pull amazon/aws-glue-libs:glue_libs_3.0.0_image_01
  2. Set environment variables:
WORKSPACE_LOCATION=/home/myusername/gitkraken/example_root_dir/gluejob
SCRIPT_FILE_NAME=cfp_staging.py
UNIT_TEST_FILE_NAME=test_cfp_staging.py
PROFILE_NAME=pytestdocker
  3. Run a container in interactive mode (based on the aws-glue-libs image) with a couple of commands to install moto and run pytest against the test file:
docker run -it -v ~/.aws:/home/glue_user/.aws -v $WORKSPACE_LOCATION:/home/glue_user/workspace/gluejob -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm -p 4040:4040 -p 18080:18080 --name glue_pytest amazon/aws-glue-libs:glue_libs_3.0.0_image_01 -c "pip install moto && python3 -m pytest --capture=no gluejob/tests/unit/test_cfp_staging.py"
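
As a sanity check that the test file is being discovered at all, the same container command can be run with pytest's --collect-only flag before running the full suite (a sketch reusing the variables above):

docker run -it -v ~/.aws:/home/glue_user/.aws -v $WORKSPACE_LOCATION:/home/glue_user/workspace/gluejob -e AWS_PROFILE=$PROFILE_NAME -e DISABLE_SSL=true --rm --name glue_pytest_collect amazon/aws-glue-libs:glue_libs_3.0.0_image_01 -c "pip install moto && python3 -m pytest --collect-only gluejob/tests/unit/test_cfp_staging.py"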

This is an excerpt from my glue script, containing the function I want to test.

# ...
# ...imports, setup, logging, various other functions
# ...

def write_data(dynamicframe, gluecont, output_bucket):

    # Write the DynamicFrame to S3 as CSV, partitioned by year/month/day
    WriteData = gluecont.write_dynamic_frame.from_options(
        frame=dynamicframe,
        connection_type="s3",
        format="csv",
        connection_options={
            "path": "s3://" + output_bucket + "/export-area/somesubfolder/",
            "partitionKeys": ["year", "month", "day"]
        }
    )
    return WriteData

This is conftest.py with the required fixtures:


"""
Contains pytest setup and configuration.

Any fixtures that are shared between tests should be added here.
"""
import os
import sys

import boto3
import moto
import pytest
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import SQLContext


@pytest.fixture(scope='function')
def aws_credentials():
    """Mocked AWS Credentials for moto."""
    os.environ['AWS_ACCESS_KEY_ID'] = 'testing'
    os.environ['AWS_SECRET_ACCESS_KEY'] = 'testing'
    os.environ['AWS_SECURITY_TOKEN'] = 'testing'
    os.environ['AWS_SESSION_TOKEN'] = 'testing'
    os.environ['AWS_DEFAULT_REGION'] = 'us-east-1'

@pytest.fixture
def s3_boto(aws_credentials):
    """Create an S3 boto3 client and return the client object"""
    
    with moto.mock_s3():
        yield boto3.client("s3")

@pytest.fixture
def s3_boto_resource(aws_credentials):
    """Create an S3 boto3 resource (the high level class)"""
    with moto.mock_s3():
        yield boto3.resource('s3')


@pytest.fixture(scope="module", autouse=True)
def glue_context():
    context = GlueContext(SparkContext.getOrCreate())
    job = Job(context)
    yield context
    job.commit()


@pytest.fixture(scope="session")
def sqlContext():
    """
    Function to setup test environment for PySpark and Glue
    """
    spark_context = SparkContext.getOrCreate()
    sqlContext = SQLContext(spark_context)
    yield sqlContext
    

This is my test script, test_cfp_staging.py:

import pytest
import boto3
from moto import mock_s3
from gluejob.src.cfp_staging import write_data
from awsglue.dynamicframe import DynamicFrame


@mock_s3
def test_getdata(): #Just here to ensure tests are being collected correctly by pytest
   assert 1 == 1

@mock_s3
def test_write_data(sqlContext, glue_context, s3_boto, s3_boto_resource):
   """Test the write_data function asserting that the df is written to s3"""
   output_bucket = "testbucketdestination"

   #Create some test data
   body_dict = [{'token_account': '12345600000012345678', 'batch_date': '2023-04-02', 'year': 2023, 'month': 4, 'day': 2}]

   #Create a bucket into which the write_data function will write the file
   s3_boto_resource.create_bucket(Bucket=output_bucket)

   #Create a dataframe from the body_dict data
   dataframe_body_dict = sqlContext.createDataFrame(body_dict)
   # dataframe_body_dict.printSchema()
   # dataframe_body_dict.show(truncate=False)

   #Convert the dataframe to a dynamic frame
   dynamicframe_body_dictfromDF = DynamicFrame.fromDF(dataframe_body_dict, glue_context, "dynamicframe_body_dictfromDF") #(dataframe, glue_ctx, name)
   dynamicframe_body_dictfromDF.show()
   dynamicframe_body_dictfromDF = dynamicframe_body_dictfromDF.repartition(1)

   #check that the bucket was created
   print("getting all buckets")
   response = s3_boto.list_buckets()
   print(response)

   #call the write_data function
   write_data(dynamicframe_body_dictfromDF, glue_context, output_bucket)

   # write an assertion - TBC; one possible sketch follows below
   assert 1 == 1  # just a placeholder for now
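   # A possible assertion (a sketch, not verified against this setup): list
   # the objects moto recorded under the prefix that write_data builds and
   # check that at least one file was written. The prefix is an assumption
   # based on the path in write_data.
   written = s3_boto.list_objects_v2(
       Bucket=output_bucket,
       Prefix="export-area/somesubfolder/"
   )
   assert written.get("KeyCount", 0) > 0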
   

What I tried:

I can see that the call to write_data is being passed the expected arguments, and that it attempts to write to the mocked bucket.

I have tried passing arguments to the container run command, then reading them in my conftest with pytest_addoption, in case the mocked credentials weren't making their way through.

I also tried passing those command-line args to the test file itself.

The result was the same when the command-line arguments matched the mocked credentials, so I ruled that out as an issue.

What I expect to happen:

When the write_data function is called, I expect the call to succeed and a file to be written to the moto-mocked bucket.

Where I suspect the problem is

I don't think the problem is a lack of permissions, as I have read that, by default, pytest should not run into permission issues with moto-mocked resources.

I suspect it's something to do with the functions in glue_context not picking up the mock_s3 decorator in its call to write to S3.

I would appreciate any advice or guidance at all.

Thanks very much

  • The mock is only active for Python code. From the stack trace, it looks like the Docker container executes Java code in a separate process. This Java code wouldn't know anything about what happens in Python – Bert Blommers Mar 11 '23 at 13:14
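
If that diagnosis is right, a commonly suggested workaround is to run moto in its standalone server mode, so the JVM-side S3A connector can reach the mock over HTTP, and to point the Spark/Hadoop S3A settings at that endpoint. Below is a minimal sketch of what the glue_context and s3_boto fixtures could look like under that approach; the port, endpoint URL, and exact Hadoop settings are assumptions to illustrate the idea, not a verified fix:

# conftest.py - sketch: back the fixtures with moto's server mode so the
# JVM writer and the Python test code both talk to the same mock endpoint
import boto3
import pytest
from moto.server import ThreadedMotoServer
from awsglue.context import GlueContext
from pyspark.context import SparkContext

MOTO_ENDPOINT = "http://127.0.0.1:5000"  # assumed local moto endpoint

@pytest.fixture(scope="module")
def glue_context():
    # Run moto as a real HTTP server so non-Python code can reach it
    server = ThreadedMotoServer(port=5000)
    server.start()

    sc = SparkContext.getOrCreate()
    hadoop_conf = sc._jsc.hadoopConfiguration()
    # Redirect the S3A connector from AWS to the local moto endpoint
    hadoop_conf.set("fs.s3a.endpoint", MOTO_ENDPOINT)
    hadoop_conf.set("fs.s3a.access.key", "testing")
    hadoop_conf.set("fs.s3a.secret.key", "testing")
    hadoop_conf.set("fs.s3a.path.style.access", "true")
    hadoop_conf.set("fs.s3a.connection.ssl.enabled", "false")

    yield GlueContext(sc)
    server.stop()

@pytest.fixture
def s3_boto(glue_context):
    # Create buckets through the same endpoint so the JVM writer sees them
    yield boto3.client("s3", endpoint_url=MOTO_ENDPOINT, region_name="us-east-1")

With this approach the @mock_s3 decorators and the in-process client fixtures would be dropped, since everything goes through the server endpoint instead. Depending on how the Glue image maps the s3:// scheme, equivalent fs.s3.* settings may also be needed.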
