
We are performing client-side encryption on certain text content and storing the results in individual files in S3. We want to read these files and process the content in AWS Glue. We are able to read the contents, but during decryption we get a pickling error.

import sys
import json
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import base64

session = boto3.session.Session()
kms = session.client('kms')

    
def s3Decrypt(decoded):
    decoded = base64.b64decode(decoded)
    meta = kms.decrypt(CiphertextBlob=decoded)
    plaintext = meta[u'Plaintext']
    return plaintext.decode()

def map_function(v):
    value = v[1]
    return s3Decrypt(value)
    
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)


spark_rdd = sc.wholeTextFiles("s3://xxxxx/yyyy/zzzzz/*").map(lambda x: map_function(x))

print(spark_rdd.collect())

job.commit()

This is the error we get:

TypeError: can't pickle SSLContext objects
Traceback (most recent call last):
  File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 590, in dumps
    return cloudpickle.dumps(obj, 2)

Is there any way to accomplish this?

justlikethat

3 Answers


This is complicated to reproduce, but the lambda function is likely the cause of your problem.

As described at https://docs.python.org/3.5/library/pickle.html#what-can-be-pickled-and-unpickled, only functions defined at the top level of a module (using def, not lambda) can be pickled.

fernolimits

Create the session and the kms client inside the s3Decrypt function definition instead of at module level.
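For example (a hypothetical rewrite of the question's functions; creating a client per record is inefficient, so in practice you would want to cache it per executor):

```python
import base64

def s3Decrypt(encoded):
    # Build the session and client here, not at module level, so Spark
    # only pickles the function itself, never the client's SSLContext.
    import boto3
    kms = boto3.session.Session().client('kms')
    decoded = base64.b64decode(encoded)
    meta = kms.decrypt(CiphertextBlob=decoded)
    return meta['Plaintext'].decode()

def map_function(v):
    # wholeTextFiles yields (path, content) pairs; decrypt the content
    return s3Decrypt(v[1])
```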

johnhill2424

The problem is that you need to create the kms client from within the function, rather than at module level, so that the client itself is never pickled and shipped to the executors.
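You can reproduce the exact error with the stdlib alone: the boto3 client holds an ssl.SSLContext internally, and that type is what the pickler rejects (a minimal sketch):

```python
import pickle
import ssl

# Stands in for the module-level kms client captured by the map closure.
ctx = ssl.create_default_context()

try:
    pickle.dumps(ctx)
except TypeError as e:
    # Same failure the Glue job hits when the module-level client is
    # captured, e.g. "cannot pickle 'SSLContext' object".
    print(e)
```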