3

Having a problem with one of our external tables in Redshift.

We have over 300 tables in AWS Glue which have been added to our Redshift cluster as an external schema called events. Most of the tables in events can be queried fine, but when querying one of the tables, called item_loaded, we get the following error:

select * from events.item_loaded limit 1;
ERROR:  XX000: Failed to incorporate external table "events"."item_loaded" into local catalog.
LOCATION:  localize_external_table, /home/ec2-user/padb/src/external_catalog/external_catalog_api.cpp:358

What's weird is that the table is in the catalog:

select *
from SVV_EXTERNAL_TABLES
where tablename = 'item_loaded';

-[ RECORD 1 ]-----+------------------------------------------
schemaname        | events
tablename         | item_loaded
location          | s3://my_bucket/item_loaded
input_format      | org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
output_format     | org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
serialization_lib | org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe 
serde_parameters  | {"serialization.format":"1"}
compressed        | 0
parameters        | {"EXTERNAL":"TRUE","parquet.compress":"SNAPPY","transient_lastDdlTime":"1504792238"}

AFAICT, this table is configured exactly the same way as the other tables in the same schema, which are working fine. I've tried recreating a new external schema pointing to the same AWS Glue database, but the same issue occurs.
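
For reference, the recreation attempt was along these lines (the new schema name, Glue database name, and IAM role ARN below are placeholders, not our actual values):

create external schema events_test
from data catalog
database 'events'
iam_role 'arn:aws:iam::123456789012:role/MySpectrumRole';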

What else could I potentially check? Is there anything that could occur which would cause a table to be removed from the catalog?

Kevin Johnson
  • You can check on the Redshift side the following views: SVV_EXTERNAL_SCHEMAS, SVV_EXTERNAL_TABLES, SVV_EXTERNAL_PARTITIONS, SVV_EXTERNAL_COLUMNS. And maybe from this as well: SVL_S3QUERY_SUMMARY – LauriK Apr 27 '18 at 16:11
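
For anyone following that suggestion, a quick check of how many partitions the table has registered might look like this (schema and table names taken from the question):

select count(*) as partition_count
from svv_external_partitions
where schemaname = 'events'
  and tablename = 'item_loaded';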

3 Answers

1

As per the forum post about the same issue:

The external table has a number of columns that exceeds the Redshift limits:

  • 1,600 columns per table for a local Redshift table
  • 1,598 columns for a Redshift Spectrum external table

You can verify the number of columns of the external table by querying svv_external_columns.
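
For example, a count of the columns registered for the table in the question would be:

select count(*) as column_count
from svv_external_columns
where schemaname = 'events'
  and tablename = 'item_loaded';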

Joe Harris
  • I have a limited number of columns (50), but I'm still facing this issue. What could be the problem? – DJo Jun 07 '18 at 10:18
  • Does the table have nested data columns? You can send me a private message with your cluster details via the Redshift forum (https://forums.aws.amazon.com/profile.jspa?userID=419029) and I will investigate further. – Joe Harris Jun 07 '18 at 16:08
  • There are no nested columns, it's a direct select from the Spectrum table – DJo Jun 08 '18 at 05:24
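
As a side note, one way to check for nested columns is to look at the column types registered in svv_external_columns; a sketch using the schema and table names from the question:

select columnname, external_type
from svv_external_columns
where schemaname = 'events'
  and tablename = 'item_loaded'
  and (external_type like 'struct%'
       or external_type like 'array%'
       or external_type like 'map%');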
1

I very recently faced this problem.

In addition to the solution above, there are a few more relevant threads:

  1. https://forums.aws.amazon.com/message.jspa?messageID=845538&tstart=0 (Solution by Joe)
  2. https://forums.aws.amazon.com/thread.jspa?messageID=780552 (Says the fix is incorporated)
  3. I was facing this issue with an IAM role that had AWS Glue Full Access. I deliberately added AthenaFullAccess as well and restarted the Redshift cluster, which resolved the issue. I'm not sure what caused the issue or how it got resolved in this case.
Mukund
0

It can also happen if there are typos in the config. For example, the following fails:

SECRET_ARN ' arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds'

and the following works:

SECRET_ARN 'arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds'

Note the additional space at the beginning of the ARN.
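
For context, SECRET_ARN is a clause of CREATE EXTERNAL SCHEMA when setting up a federated query schema; a minimal sketch of where it fits (the schema name, database, endpoint, and IAM role below are placeholders) would be:

create external schema rds_events
from postgres
database 'mydb' schema 'public'
uri 'my-rds-instance.abc123.us-east-1.rds.amazonaws.com' port 5432
iam_role 'arn:aws:iam::123456789012:role/MyFederatedRole'
secret_arn 'arn:aws:secretsmanager:us-east-1:123:secret:stage/data/redshift-rds';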

Gaurav Shah