Using Amazon Redshift Spectrum, you can query and retrieve structured and semistructured data from files in Amazon S3 without having to load the data into Amazon Redshift tables. Redshift Spectrum queries employ massive parallelism to execute very fast against large datasets. Multiple clusters can concurrently query the same dataset in Amazon S3 without the need to make copies of the data for each cluster.
Questions tagged [amazon-redshift-spectrum]
291 questions
0
votes
2 answers
Join the table with incremental data of the same table
I am trying to implement logic in Redshift Spectrum where my original table looks like this:
Records in the student table:
1 || student1 || Boston || 2019-01-01
2 || student2 || New York || 2019-02-01
3 || student3 || Chicago ||…

hadooper
- 726
- 1
- 6
- 18
0
votes
1 answer
Redshift External Table not handling Linefeed character within a field
I have an external table that uses the Glue catalog and reads a CSV file. The fields are enclosed in double quotes if they contain a comma or an LF (LineFeed). I am able to read a field properly as a single value if there is a delimiter within that field but the…

SwapSays
- 407
- 7
- 18
0
votes
1 answer
Redshift spectrum incorrectly parsing Pyarrow datetime64[ns]
I have an external table in Redshift Spectrum whose DDL has a datetime column roughly like this:
collector_tstamp TIMESTAMP WITHOUT TIME ZONE
Objective: I am trying to parquet a certain set of data and then add the partition into Spectrum to see if…

Gagan
- 1,775
- 5
- 31
- 59
0
votes
0 answers
to_char(to_date(string,),) not working properly
I have a table in which dates are stored in string format, and I want to change their format from yyyyMMdd to yyyy-MM-dd and store them back in string format only.
I am using Aginity workbench.
Referred link - convert MM/DD/YYYY to YYYYMMDD in redshift ,…

user036
- 13
- 4
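For the date-format question above, the intended round trip (parse a yyyyMMdd string as a date, then format it back to a yyyy-MM-dd string) can be sketched outside the database. This is a minimal Python illustration, not the asker's SQL; the helper name `reformat_date` is hypothetical and the sample value is made up for the example.

```python
from datetime import datetime


def reformat_date(value: str) -> str:
    """Convert a 'yyyyMMdd' string to a 'yyyy-MM-dd' string.

    Hypothetical helper for illustration: parsing first means an
    invalid date such as '20191340' raises ValueError instead of
    being silently reformatted.
    """
    parsed = datetime.strptime(value, "%Y%m%d")  # string -> date
    return parsed.strftime("%Y-%m-%d")           # date -> string


print(reformat_date("20190101"))  # prints 2019-01-01
```

Parsing before formatting, rather than slicing the string, is what makes malformed values surface as errors.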
0
votes
1 answer
Redshift External Table showing NULL for actual timestamp values
I have created an external table using the Glue catalog and am trying to read a CSV file from S3. However, the three timestamp fields in my CSV file all show as NULL, while the other values are shown correctly.
I checked the serialization…

SwapSays
- 407
- 7
- 18
0
votes
1 answer
External Table and Database using AWS Glue catalog
Can I view the Glue catalog that is created/used for the External table I created using the "FROM DATA CATALOG" keyword while creating the External Schema?
I went to AWS Glue console and there is nothing under "Databases" or "Tables" option.
I…

SwapSays
- 407
- 7
- 18
0
votes
1 answer
Translate Spark Schema to Redshift Spectrum Nested Schema
Using Apache Spark on an EMR cluster, I have read in XML data, inferred the schema, and stored it on S3 in Parquet format. It is now, essentially, a nested table.
Using Spark, I have the schema. I now want to be able to create an external table…

Eric
- 145
- 1
- 1
- 9
0
votes
0 answers
Redshift spectrum add partition having specific prefix in keys
I have keys in the following format:
s3://bucket/source/2019/01/01/xyz_20190101.csv
s3://bucket/source/2019/01/01/mno_20190101.csv
s3://bucket/source/2019/01/02/xyz_20190102.csv
s3://bucket/source/2019/01/02/mno_20190102.csv
But when I add…

Gagan
- 1,775
- 5
- 31
- 59
0
votes
1 answer
Getting a "Disk Full" error from Redshift Spectrum
I frequently get a Disk Full error on Redshift Spectrum and, as a result, have to repeatedly scale up the cluster. It seems that the cache is deleted when I do.
Ideally, I would like scaling up to keep the cache, and to find a way…

Minh Triet
- 1,190
- 1
- 15
- 35
0
votes
1 answer
Redshift Spectrum and Hive Metastore - Ambiguous Error
From Redshift, I created an external schema using the Hive Metastore. I can see the Redshift metadata about the tables (such as using: select * from SVV_EXTERNAL_TABLES), however when querying one of these tables, I get an ambiguous error "error: …

Eli Reiman
- 164
- 2
- 11
0
votes
0 answers
How to store one-to-many entities data in S3 for Amazon Redshift Spectrum
My requirement is to store data in S3 and query it using Amazon Redshift Spectrum. My data is modeled with one-to-many and many-to-many relationships. For example, consider the following SQL schema:
user (id, name)
user_phoes (id, phone_type,…

Achaius
- 5,904
- 21
- 65
- 122
0
votes
1 answer
How to create an external table in Redshift Spectrum when the file location changes every day?
We are planning to source data from another AWS account's S3 bucket using Redshift Spectrum. However, the source team informed us that the bucket key will change every day and that the latest data will be available at the bucket key location with the latest timestamp.
Can anyone…

Rajib Kar
- 21
- 3
0
votes
1 answer
How to specify the S3 config for the spectrify Python package?
How do I specify the s3_config object for the Python spectrify package?
from spectrify.export import RedshiftDataExporter
RedshiftDataExporter(sa_engine, s3_config).export_to_csv('my_table')

muon
- 12,821
- 11
- 69
- 88
0
votes
0 answers
Redshift spectrum timestamp column issues
I have a few files in S3. I used the Glue Data Catalog to get the table definition. I have a field called log_time, and I manually set its data type to timestamp in the Glue catalog. Now when I query that table from Athena, I can see the timestamp values correctly.…

Venkat.V.S
- 349
- 3
- 7
0
votes
1 answer
Amazon Spectrum incremental load directly from string
I have taken a field such as 'filename Pro_180913_171842' from Spectrum.
I tried a function in SQL like this:
`select
fields
from spectrum.ex
where cast(SPLIT_PART('filename Pro_180913_171842','Pro_',2)as
…

Ganesh Pitchai
- 55
- 8