
I have granular data stored in Redshift. I want an aggregate table created regularly. I'm seeking to use AWS Data Pipeline to do this.

Let's say for conversation that I have a table of all flights. I want to generate a table of airports and the number of flights originating there. But because this table is large (and maybe I want to join in stuff from other tables), rather than write out this aggregation in every spot, I decide to build a derived table from it.
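
For concreteness, the aggregation I want to maintain is roughly the following (a sketch only: data.flights and origin_airport_id are placeholder names for the granular table and its origin column):

-- Sketch only: data.flights and origin_airport_id are placeholder names
INSERT INTO data.airports (airport_id, flights)
SELECT origin_airport_id, count(*) AS flights
FROM data.flights
GROUP BY origin_airport_id;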

I set up my Data Pipeline. It looks like this:

[screenshot of the Data Pipeline configuration]

It runs this SQL in the "Create Table Sql" field:

CREATE TABLE IF NOT EXISTS data.airports (
 airport_id int not null
 ,flights int null);

I can save it without errors, but after I activate the pipeline, the table never shows up. I've checked the (few) parameters involved, but nothing stands out as obviously wrong. Even so, I tweaked a few of them, but still no table.

Where should I start looking?

ScottieB

1 Answer


The first place to look is within the Data Pipeline itself, to make sure no errors are reported there. (I think you have already done that step.)

Next, check the SQL and load commands that have actually been run on Redshift, to confirm that the expected statements are there and that they succeeded.

One simple way to do this is to open your cluster in the AWS console. From there you can click the "Queries" tab to see the SQL that was run, and the "Loads" tab to see the COPY commands that were run.

You can also see the same details without the console by querying Redshift's system tables directly. For queries:

select query, trim(querytxt) as sqlquery
from stl_query
order by query desc limit 50;
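
One caveat worth noting: as far as I know, STL_QUERY only records queries, not DDL, so a CREATE TABLE statement will not show up there. DDL is logged in STL_DDLTEXT instead, which you can check with something like:

-- DDL statements (such as CREATE TABLE) are logged in STL_DDLTEXT
select xid, starttime, sequence, substring(text, 1, 80) as ddl_text
from stl_ddltext
order by starttime desc
limit 50;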

For loads, to see the errors for the most recent load:

select d.query, substring(d.filename,14,20), 
d.line_number as line, 
substring(d.value,1,16) as value,
substring(le.err_reason,1,48) as err_reason
from stl_loaderror_detail d, stl_load_errors le
where d.query = le.query
and d.query = pg_last_copy_id(); 

Jon Scott