Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

291 questions
3
votes
2 answers

redshift Unload operation causing redundant data

We use UNLOAD commands to run some transformation on s3-based external tables and publish data into a different s3 bucket in PARQUET format. I use ALLOWOVERWRITE option in the unload operation to replace the files if they already exist. This works…
Abhi
3
votes
2 answers

AWS Redshift: FATAL: connection limit "500" exceeded for non-bootstrap users

Hope you're all okay. We hit this limit quite often. We know there is no way to up the 500 limit of concurrent user connections in Redshift. We also know certain views (pg_user_info) provide info as to the user's actual limit. We are looking for…
geekjimbo
3
votes
1 answer

Query Hive view with Redshift Spectrum

I'm trying to query a Hive view with Redshift Spectrum but it gives me this error: SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: Assert Details: ----------------------------------------------- error: Assert code: 1000 …
Pierre
3
votes
2 answers

Redshift spectrum shows NULL values for all rows

When I run this query in Athena query editor, it works as expected. SELECT * FROM "sampledb"."elb_logs" limit 10; elb_logs table has been generated based on the official tutorial. When I try to use spectrum in redshift, I can see all "NULL" values…
shantanuo
3
votes
5 answers

Redshift Spectrum: Query Anonymous JSON array structure

I have a JSON array of structures in S3, that is successfully Crawled & Cataloged by Glue. [{"key":"value"}, {"key":"value"}] I'm using the custom Classifier: $[*] When trying to query from Spectrum, however, it returns: Top level Ion/JSON…
3
votes
2 answers

How to show Redshift Spectrum (external schema) GRANTS?

This post is useful to show Redshift GRANTS but doesn't show GRANTS over external tables / schema. How to show external schema (and relative tables) privileges?
3
votes
3 answers

data appears as null on redshift external table while working right on athena

So I'm trying to run the following simple query on Redshift Spectrum: select * from company.vehicles where vehicle_id is not null and it returns 0 rows (all of the rows in the table are null). However, when I run the same query on Athena it works fine…
3
votes
1 answer

Cost control in Redshift Spectrum when scanning external tables (S3 data)

Athena has some default service limits that can help ~ cap the cost from accidental "runaway" queries on a large data lake in S3. They are not great (based on ~ time, not volume of data scanned), but it's still helpful. What about Redshift…
Amelio Vazquez-Reina
3
votes
1 answer

ERROR while querying data on redshift - Error fetching stripe data

I'm trying to run the following query over an external table in redshift: select * from schema.table limit 10; and I get an error: [2018-06-20 12:03:14] [XX000][500310] Amazon Invalid operation: S3 Query Exception (Fetch) Details: error: S3…
Gal Itzhak
3
votes
3 answers

AWS Redshift - Failed to incorporate external table into local catalog

Having a problem with one of our external tables in Redshift. We have over 300 tables in AWS Glue which have been added to our Redshift cluster as an external schema called events. Most of the tables in events can be queried fine. But when querying…
Kevin Johnson
3
votes
1 answer

AWS Redshift Script export

How can I save all Redshift database DDL scripts to my local drive so I can manage them in a Bitbucket repository? I want to export every DDL script from the Redshift database and save it locally in the same folder structure, like databasename =>…
3
votes
2 answers

Inserts into Redshift using spark-redshift

I am trying to insert data from S3 (parquet files) into Redshift. Doing it through SQLWorkbench takes 46 seconds for 6 million rows, but doing it through the spark-redshift connector takes about 7 minutes. I am trying it with more nodes and…
3
votes
4 answers

Performance issues with Redshift Spectrum

I am using Redshift Spectrum. I created an external table and uploaded a CSV data file to S3 with around 5.5 million records. If I fire a query on this external table, it takes ~15 seconds, whereas if I run the same query on Amazon Redshift, I was…
2
votes
0 answers

Apache Iceberg on Redshift Spectrum, is it possible?

I have seen here https://aws.amazon.com/about-aws/whats-new/2020/09/amazon-redshift-spectrum-adds-support-for-querying-open-source-apache-hudi-and-delta-lake/ that Redshift Spectrum has support for Hudi and Delta. We're using Iceberg right now as a…
2
votes
0 answers

Error when creating external table in Redshift Spectrum with dbt: cross-database reference not supported

I want to create an external table in Redshift Spectrum from CSV files. When I try doing so with dbt, I get a strange error. But when I manually remove some double quotes from the SQL generated by dbt and run it directly, I get no such error. First…
ardaar