We are performing client-side encryption on certain text content and storing the results in individual files in S3. We want to read these files and process their contents in an AWS Glue job. We are able to read the contents, but during decryption we get a pickling error.
import sys
import json
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
import base64
session = boto3.session.Session()
kms = session.client('kms')
def s3Decrypt(decoded):
    decoded = base64.b64decode(decoded)
    meta = kms.decrypt(CiphertextBlob=decoded)
    plaintext = meta[u'Plaintext']
    return plaintext.decode()
def map_function(v):
    value = v[1]
    return s3Decrypt(value)
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
spark_rdd = sc.wholeTextFiles("s3://xxxxx/yyyy/zzzzz/*").map(map_function)
print(spark_rdd.collect())
job.commit()
This is the error we get:
TypeError: can't pickle SSLContext objects
Traceback (most recent call last):
File "/opt/amazon/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 590, in dumps
    return cloudpickle.dumps(obj, 2)
Is there any way to accomplish this?
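For context on what seems to be going on: Spark pickles the map function (and everything it captures) to ship it to the executors. Because `s3Decrypt` references the module-level `kms` client, that client is captured in the closure, and its underlying HTTPS machinery holds an `SSLContext`, which cannot be pickled. A common workaround is to construct the client inside the function (or once per partition with `mapPartitions`) so it is created on the executor instead of serialized from the driver. Below is a minimal, runnable sketch of the pattern, using the standard library's `ssl.SSLContext` as a stand-in for the boto3 client's connection state; `EagerClient` and `LazyClient` are hypothetical names for illustration only:

```python
import pickle
import ssl

class EagerClient:
    """Creates an SSLContext at construction time -- analogous to a
    module-level boto3 client captured by a Spark closure."""
    def __init__(self):
        self.ctx = ssl.create_default_context()

class LazyClient:
    """Defers SSLContext creation until first use and drops it before
    pickling, so instances can be shipped to executors safely."""
    def __init__(self):
        self._ctx = None

    @property
    def ctx(self):
        # Build the context on first access (i.e. on the worker).
        if self._ctx is None:
            self._ctx = ssl.create_default_context()
        return self._ctx

    def __getstate__(self):
        # Exclude the unpicklable context; it is rebuilt after unpickling.
        return {"_ctx": None}

# Pickling the eager client fails, just like the Glue closure does.
try:
    pickle.dumps(EagerClient())
    eager_picklable = True
except (TypeError, pickle.PicklingError):
    eager_picklable = False

# The lazy client round-trips through pickle and still works afterwards.
restored = pickle.loads(pickle.dumps(LazyClient()))
lazy_picklable = isinstance(restored, LazyClient)

print(eager_picklable, lazy_picklable)
```

Applied to the Glue job, the same idea would mean calling `boto3.session.Session().client('kms')` inside `s3Decrypt` (or once per partition) rather than at module level, so nothing holding an `SSLContext` ever needs to be pickled.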