Questions tagged [amazon-redshift-spectrum]

Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.

291 questions
0
votes
0 answers

Unloading & reloading data between S3 and Redshift with schema changes

I'm interested in setting up some automated jobs that will periodically export data from our Redshift instance and store it on S3, where ideally it will then be bubbled back up into Redshift via an external table running in Redshift Spectrum. One…
0
votes
1 answer

Redshift Spectrum Query - Request ran out of memory in the S3 query layer

I am trying to execute a query with grouping on 26 columns. The data is stored in S3 in Parquet format, partitioned by day. The Redshift Spectrum query returns the error below. I am not able to find any relevant documentation from AWS regarding this. Request…
0
votes
1 answer

How to identify a person or id if it contains more than one row for a different column in SQL

I have a table in which a person has the same value multiple times in another column. For example:

person  product  portal  count  indicator
-----------------------------------------
1       10       5       2      y
…
Shivam Tyagi
0
votes
1 answer

AWS Glue: How to ETL non-scalar JSON with varying schemas

Objective: I have an S3 folder full of JSON files with varying schemas, including arrays (a DynamoDB backup, as it happens). However, while the schemas vary, all files contain some common elements, such as 'id' or 'name', as well as nested arrays of…
0
votes
1 answer

Spectrum Same External Table Shows in Multiple Schemas (svv_external_tables)

It's a really simple test, actually. I create a couple of external schemas and create an external table in one of them, and then querying svv_external_tables shows that the table exists in ALL schemas! What am I missing? create external schema…
0
votes
1 answer

how to view data catalog table in S3 using redshift spectrum

I created an external schema for my database in AWS Glue. I can see the list of tables, but I cannot look into the JSON data. Redshift throws this error: [Amazon](500310) Invalid operation: S3 Query Exception (Fetch) Details: …
beni
0
votes
0 answers

Column names containing dots in Spectrum

I created a customers table with columns named account_id.cust_id, account_id.ord_id, and so on. My create external table query was as follows: CREATE EXTERNAL TABLE spectrum.customers ( "account_id.cust_id" numeric, "account_id.ord_id" numeric ) row…
0
votes
2 answers

Presto equivalent for Redshift's PERCENTILE_DISC

Given a query below in Redshift: select distinct cast(joinstart_ev_timestamp as date) as session_date, PERCENTILE_DISC(0.02) WITHIN GROUP (ORDER BY join_time) over(partition by trunc(joinstart_ev_timestamp))/1000 as mini, median(join_time)…
Bhuvi007
0
votes
1 answer

Can I convert CSV files sitting on Amazon S3 to Parquet format using Athena and without using Amazon EMR

I would like to convert the CSV data files currently sitting on Amazon S3 into Parquet format using Amazon Athena and push them back to Amazon S3, without any help from Amazon EMR. Is it possible to do this? Has anyone experienced…
0
votes
1 answer

How to create an external table for nested Parquet type in redshift spectrum

I know Redshift and Redshift Spectrum don't support nested types, but I want to know whether there is any trick to bypass that limitation and query our nested data in S3 with Redshift Spectrum. In this post the author shows how we can do it for JSON…
Am1rr3zA
0
votes
1 answer

How to load CDC into Redshift database?

Can anyone tell me about CDC/incremental load methods in Redshift using SQL? I know one method, upsert, but are there other methods, such as insert followed by delete?
0
votes
1 answer

Cannot connect to aws redshift

I created a Redshift cluster in the AWS console. Then I went to the cluster I had created and used the information shown in the console in SQL Workbench/J. To set up SQL Workbench/J I used the…
0
votes
2 answers

Spectrum in us-west-1 and Glue in us-west-2 is it possible?

My Redshift cluster is in us-west-1 (NCAL), the S3 file location is in us-west-1 (NCAL), and the Glue Data Catalog is in us-west-2 (Oregon). When I try to query the table with select count(*) from spectrum_schema.table_name; I get the error below. [Code:…
0
votes
0 answers

How to specify row delimiter for Redshift Spectrum

I'm trying to mount CSV files that use CRLF as the row terminator in Redshift Spectrum. However, it seems I can only specify a single character as the row terminator. Does anyone know how to get around this?
0
votes
0 answers

data distribution in redshift for star schema model?

I have a big fact table with 2 billion rows and 19 dimensions (the product dimension is big, at 450 million rows; another two dimensions are 100 million rows each; the rest are small dimension tables). Can someone help me with data distribution for this scenario?