Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute quickly against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

291 questions
5
votes
1 answer

Unload multiple files from Redshift to S3

Hi, I am trying to unload multiple tables from Redshift to a particular S3 bucket and am getting the error below: psycopg2.InternalError: Specified unload destination on S3 is not empty. Consider using a different bucket / prefix, manually removing the target…
5
votes
2 answers

Skipping header rows in AWS Redshift External Tables

I have a file in S3 with the following data: name,age,gender jill,30,f jack,32,m And a redshift external table to query that data using spectrum: create external table spectrum.customers ( "name" varchar(50), "age" int, "gender" varchar(1)) row…
fez
4
votes
2 answers

Getting Spectrum Scan Error code 15007 on select query on redshift external table

I have created an external table in Redshift Spectrum. Upon running select * from table_name, I am getting the following error: SQL Error [XX000]: ERROR: Spectrum Scan Error Detail: ----------------------------------------------- error: …
nat
4
votes
1 answer

Grant only access to View in Redshift Spectrum

I created a simple view over an external table on Redshift Spectrum: CREATE VIEW test_view AS ( SELECT * FROM my_external_schema.my_table WHERE my_field='x' ) WITH NO SCHEMA BINDING; Reading the documentation, I see that it is not possible to give…
4
votes
2 answers

[XX000][500310] [Amazon](500310) Invalid operation: Parsed manifest is not a valid JSON object

I'm running a crawler over a folder containing several files with different schemas, so I expect to find a table for each file. What happens is that in the Glue Catalogue I can actually see a table for each file, with its own schema. But when I try…
4
votes
1 answer

Error trying to access Amazon Redshift external table

I have Avro files in S3 which I want to be able to query via Redshift. I have used external tables with success in the past, but only in Parquet/JSON format, so I am wondering whether I'm missing something with the data being in Avro format. I set up…
4
votes
1 answer

Redshift Spectrum: how to import only certain files

When using Redshift Spectrum, it seems you can only import data by providing a location down to the folder level, and it imports all the files inside the folder. Is there a way to import only one file from inside a folder with many files? When providing…
4
votes
0 answers

AWS Glue skipping folder

I have a process that stores data to S3, transforms the data and converts the data to Parquet, to be queried through Redshift Spectrum. I have a Glue crawler that crawls my dataset, and I use three partitions: year, month, day. All my files are…
4
votes
2 answers

How to generate 12 digit unique number in redshift?

I have 3 columns in a table, i.e. email_id, rid, final_id. Rules for rid and final_id: If the email_id has a corresponding rid, use rid as the final_id. If the email_id does not have a corresponding rid (i.e. rid is null), generate a unique 12 digit…
user8147906
4
votes
2 answers

Remove double quotes " while loading data to Amazon Redshift Spectrum

I want to load data into an Amazon Redshift external table. The data is in CSV format and has quotes. Is there something like REMOVEQUOTES, which the COPY command has, for Redshift external tables? Also, what are the different options to load fixed length…
4
votes
1 answer

AWS Redshift Spectrum - how to get the s3 filenames in the external table

I have external tables created in AWS Spectrum to query the S3 data; however, I am not able to identify the file names which the records belong to (I have thousands of files under a bucket). In AWS Athena we have a pseudo column "$PATH" which will…
3
votes
1 answer

AWS Redshift Spectrum when accessing files in S3 Glacier deep archive

We have set up an AWS Redshift external table accessing S3 using Spectrum. Due to the large volume of data, we decided to change the S3 storage class for files older than 30 days to S3 Glacier Deep Archive using a Lifecycle policy. I couldn't find…
Edgars T.
3
votes
1 answer

Redshift-Postgres RDS federated query: Authentication method 10 not supported

VPC is configured, secret is in Secrets Manager with correct policy attached to Redshift cluster. Created external schema using CREATE EXTERNAL SCHEMA schema_ext FROM POSTGRES DATABASE 'db' SCHEMA 'schema' URI…
3
votes
2 answers

How to query an array field (AWS Glue)?

I have a table in AWS Glue, and the crawler has defined one field as an array. The content is in S3 files that are in JSON format. The table is TableA, and the field is members. There are a lot of other fields such as strings, booleans, doubles, and…
mrc
3
votes
2 answers

"Spectrum nested query error" Redshift error

When I run this query in Redshift: select sd.device_id from devices.s_devices sd left join devices.c_devices cd on sd.device_id = cd.device_id I get an error like this: ERROR: Spectrum nested query error DETAIL: …
del