I have a COPY statement that loads Parquet files from S3, such as:
COPY schema.table
FROM 's3://bucket/folder/'
IAM_ROLE 'MyRole'
FORMAT AS PARQUET;
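The statement is executed from Python via SQLAlchemy/psycopg2 (which is where the traceback below comes from), roughly like this. The connection URL, schema/table and role ARN here are placeholders, not the real values; IAM_ROLE takes the full role ARN, which I have shortened to 'MyRole' above:

# Rough sketch of how the COPY is issued from Python.
from sqlalchemy import create_engine, text

engine = create_engine("postgresql+psycopg2://user:password@redshift-host:5439/dbname")

copy_sql = text("""
    COPY schema.table
    FROM 's3://bucket/folder/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRole'
    FORMAT AS PARQUET;
""")

with engine.begin() as conn:
    conn.execute(copy_sql)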
The policy attached to MyRole is:
resource "aws_iam_policy" "PolicyMyRole" {
  name   = "MyRole"
  path   = "/"
  policy = <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::other/*",
        "arn:aws:s3:::folder"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "arn:aws:s3:::folder/*",
        "arn:aws:s3:::folder"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:*"
      ],
      "Resource": "*"
    }
  ]
}
EOF
}
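One thing I am not sure about is that the COPY reads from s3://bucket/folder/ while the Resource ARNs above reference arn:aws:s3:::folder, i.e. a bucket literally named folder. To check whether the role itself can read that prefix, I can assume it and probe S3 directly, roughly like this (the role ARN, bucket and key names are placeholders, and it assumes my own credentials are allowed to assume the role):

import boto3

# Assume the same role the COPY uses (placeholder ARN), then hit the same
# prefix Redshift reads from. COPY needs ListBucket on the prefix and
# GetObject on the files; a 403 on either of these would line up with an
# AccessDenied from the COPY.
sts = boto3.client("sts")
creds = sts.assume_role(
    RoleArn="arn:aws:iam::123456789012:role/MyRole",
    RoleSessionName="copy-debug",
)["Credentials"]

s3 = boto3.client(
    "s3",
    aws_access_key_id=creds["AccessKeyId"],
    aws_secret_access_key=creds["SecretAccessKey"],
    aws_session_token=creds["SessionToken"],
)

print(s3.list_objects_v2(Bucket="bucket", Prefix="folder/"))
print(s3.head_object(Bucket="bucket", Key="folder/somefile.parquet"))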
The COPY returns an error such as:
sqlalchemy.exc.InternalError: (psycopg2.InternalError) Spectrum Scan Error
DETAIL:
-----------------------------------------------
error: Spectrum Scan Error
code: 15001
context:
Error: HTTP response error code: 403 Message: AccessDenied Access Denied
x-amz-request-id: 9A5F3F8BB1C6AD5C
x-amz-id-2: 1JwcGdQFUJMec7s97plTFEvaw0EldAsDnYrg56bTpz/QVzbclIiVf/bK4ynGF/T7VNJIcf01PbQ=
query: 20027980
location: dory_util.cpp:929
process: fetchtask_thread [pid=527]
-----------------------------------------------
The Parquet file is created from a pandas DataFrame:
df.to_parquet(path, compression="gzip", engine="fastparquet", index=False)
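To rule out the file itself, I can read it back locally with the same engine as a sanity check (path is the same local path as above):

import pandas as pd

# Read the file back with the same engine to confirm it is valid Parquet
# before suspecting the COPY/permissions side.
check = pd.read_parquet(path, engine="fastparquet")
print(check.dtypes)
print(check.head())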
And the file is successfully uploaded to S3 using:
import os
import boto3

os.environ['AWS_PROFILE'] = profile
s3 = boto3.client('s3')
response = s3.upload_file(path, 'bucket', 'folder/' + path)
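Since the key passed to upload_file is 'folder/' + path, and path is a local file path, I also want to confirm the objects really end up under the prefix the COPY reads from, e.g.:

import boto3

# List whatever is under the prefix the COPY points at; bucket and prefix
# are the same placeholder names as above.
s3 = boto3.client('s3')
resp = s3.list_objects_v2(Bucket='bucket', Prefix='folder/')
for obj in resp.get('Contents', []):
    print(obj['Key'], obj['Size'])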