
I have a requirement where I need to join a Redshift information schema (catalog) table with a user-defined table. I have tried the queries below and am getting the errors shown.

select *
from pg_table_def a join user_defined_table b
on 1 = 1 -- condition just to give an example
where tablename = 'table1'; 

SQL Error [500310] [0A000]: [Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;

select *
from information_schema.columns  a join user_defined_table b
on 1 =1 -- condition just to give an example
where a.table_name = 'table1';

SQL Error [500310] [0A000]: [Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;

Maybe I am missing some basics. Please suggest.

Tajinder

1 Answer


As you are finding, catalog tables that live on the leader node are quite different from normal tables. The approach for using data from these tables in conjunction with normal tables is to save the catalog data into a normal table first; then the data can be combined the way you are trying.

There is no direct path from these tables to a normal table other than selecting from the catalog (as far as I know). So you can either select this data out of Redshift and load it back in as a normal table with a script or program, or you can do the near equivalent by selecting into a cursor and reading the cursor into a normal table. Reading from a cursor like this can be done with a stored procedure. Let's look at this path first.

The code below creates a stored procedure that builds a normal table, "fred", and copies pg_table_def data into it; it then calls the stored procedure and finally selects what is stored in fred.

CREATE OR REPLACE procedure rewrite_data()
AS
$$
DECLARE 
  row record;
BEGIN
  -- Rebuild the target table from scratch on every call
  drop table if exists fred;
  create table fred (schemaname varchar(256), tablename varchar(256), "column" varchar(256), "type" varchar(256));
  -- Loop over the leader-node catalog data and insert it into fred row by row
  for row in select "schemaname"::text, "tablename"::text, "column"::text, "type"::text from pg_table_def where "schemaname" <> 'pg_catalog' LOOP
    INSERT INTO fred(schemaname, tablename, "column", "type") VALUES (row.schemaname, row.tablename, row."column", row."type");
  END LOOP;
END;
$$ LANGUAGE plpgsql;
call rewrite_data();
select * from fred;
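
Once fred exists as a normal table you can join it to your user-defined table, which is what the original question was after. A minimal sketch, reusing the placeholder join condition from the question:

select b.*, f."column", f."type"
from fred f
join user_defined_table b
  on 1 = 1 -- condition just to give an example, as in the question
where f.tablename = 'table1';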

This procedure works fine, but looping through the cursor one row at a time and inserting each row into fred as we go is slow. That is fine for small things, but some of the catalog tables are fairly big. What is nice about this approach is that updating fred can be done with a SQL call whenever it is needed, so it works very well for small updates that need to be initiated by other SQL.

The downside of the cursor/stored-procedure method is the speed of looping over lots of rows: tens of seconds for a 1,000-row cursor. So if you need large amounts of leader-node-only catalog data copied to normal tables, you will want something that looks more like an ETL process: read the catalog table, write the result to S3, and then COPY it into your tables. This is easy to do and fairly fast, but it cannot be initiated by a SQL command, so the tables will be somewhat out of date when they are used (they will be correct as of the last update time).
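
For the COPY step a minimal sketch is below; the S3 path and IAM role are placeholders, and the extract itself (running the pg_table_def SELECT from a client script and writing the result to S3 as a CSV file) happens outside of SQL:

-- Hypothetical S3 location and IAM role; the CSV is assumed to have been
-- written by an external script that queried pg_table_def.
COPY fred (schemaname, tablename, "column", "type")
FROM 's3://my-bucket/catalog-extract/pg_table_def.csv'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-copy-role'
CSV;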

So there is no perfect solution but a couple of workable ones depending on your needs.

Bill Weiner