
I have a COPY statement loading a Parquet file from S3, such as:

COPY schema.table
FROM 's3://bucket/folder/'
IAM_ROLE 'MyRole'
FORMAT AS PARQUET ;

The MyRole policy is:

resource "aws_iam_policy" "PolicyMyRole" {
  name = "MyRole"
  path = "/"
  policy = <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::other/*",
                "arn:aws:s3:::folder"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:*"
            ],
            "Resource": [
                "arn:aws:s3:::folder/*",
                "arn:aws:s3:::folder"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:*"
            ],
            "Resource": "*"
        }
    ]
}
EOF
}
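
One way to sanity-check whether this policy actually lets the role read the objects the COPY fetches is the IAM policy simulator. This is only a diagnostic sketch; the role ARN, account id, and object key below are placeholders, not values from the question:

import boto3

iam = boto3.client('iam')

# Simulate what the Redshift role may do against one uploaded object.
# Role ARN and object key are placeholders.
response = iam.simulate_principal_policy(
    PolicySourceArn='arn:aws:iam::123456789012:role/MyRole',
    ActionNames=['s3:GetObject'],
    ResourceArns=['arn:aws:s3:::bucket/folder/file.parquet'],
)
for result in response['EvaluationResults']:
    print(result['EvalActionName'], result['EvalDecision'])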

The COPY returns an error such as:

sqlalchemy.exc.InternalError: (psycopg2.InternalError) Spectrum Scan Error
DETAIL:  
  -----------------------------------------------
  error:  Spectrum Scan Error
  code:      15001
  context:   
Error: HTTP response error code: 403 Message: AccessDenied Access Denied
x-amz-request-id: 9A5F3F8BB1C6AD5C
x-amz-id-2: 1JwcGdQFUJMec7s97plTFEvaw0EldAsDnYrg56bTpz/QVzbclIiVf/bK4ynGF/T7VNJIcf01PbQ=

  query:     20027980
  location:  dory_util.cpp:929
  process:   fetchtask_thread [pid=527]
  -----------------------------------------------
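
When a COPY of Parquet data fails with a Spectrum Scan Error like this, Redshift usually records the underlying S3 request in svl_s3log (see also the last comment below). A minimal sketch of pulling that detail over the same SQLAlchemy connection the COPY ran through; the connection URL is a placeholder:

from sqlalchemy import create_engine, text

# Placeholder DSN; use the connection the failing COPY was issued on.
engine = create_engine('postgresql+psycopg2://user:password@redshift-host:5439/dbname')

with engine.connect() as conn:
    # 20027980 is the query id reported in the error above.
    rows = conn.execute(text('SELECT * FROM svl_s3log WHERE query = 20027980'))
    for row in rows:
        print(row)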

The Parquet file is created from a pandas DataFrame:

df.to_parquet(fname=path, compression="gzip", engine="fastparquet", index=False)

And the file is successfully uploaded to S3 using:

os.environ['AWS_PROFILE'] = profile
s3 = boto3.client('s3')
response = s3.upload_file(path, 'bucket', "folder/"+path)
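
If the bucket (or the credentials writing the data) belongs to a different AWS account than the Redshift cluster, the 403 can come from the object ACL rather than the IAM policy. A small sketch of inspecting the ACL of the uploaded object, using the same bucket and key as the upload above:

import boto3

s3 = boto3.client('s3')

# Show who owns the object and which grants it carries.
acl = s3.get_object_acl(Bucket='bucket', Key='folder/' + path)
print(acl['Owner'])
for grant in acl['Grants']:
    print(grant['Grantee'], grant['Permission'])
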
  • Are the bucket and the Redshift cluster in the same AWS account? Is the writer of the data (the pandas script) using credentials from the same AWS account as the Redshift cluster? – botchniaque Jun 04 '20 at 09:15
  • @botchniaque finally solved it. It's an ACL issue – mrc Jun 04 '20 at 09:31
  • @Maik, how did you solve the ACL issue? I am having similar issues – pablosjv Oct 20 '20 at 14:22
  • @pablosjv try changing the folder's default permissions. If the ACL does not work, then change the default grants – mrc Oct 20 '20 at 15:24
  • I was getting the same error, but while trying to execute "SELECT schemaname, tablename FROM SVV_EXTERNAL_TABLES;". I found the issue by running the query against svl_s3log (https://stackoverflow.com/a/50951575/7345709). My issue was that I had to add AmazonS3ReadOnlyAccess to my RedshiftSpectrumRole – Roshin Jay Apr 05 '21 at 20:02
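
mrc's comments point at the object ACL / default grants rather than the IAM policy. One common way this kind of cross-account issue is addressed is granting the bucket owner full control at upload time; whether this matches the exact fix applied here isn't stated, so treat it as a sketch with assumed bucket and key names:

import boto3

s3 = boto3.client('s3')

# Re-upload the file granting the bucket owner full control of the object,
# so the account that owns the bucket (and the Redshift role) can read it.
s3.upload_file(
    path, 'bucket', 'folder/' + path,
    ExtraArgs={'ACL': 'bucket-owner-full-control'},
)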

0 Answers