2

Say that I have an external table in Hive and the csv file in the external table's S3 location looks like below.

+----+------+
| ID | Name |
+----+------+
|  1 | A    |
|  2 | B    |
+----+------+

If I change the data in the file like below, I am able to see the changed value when I query the external table in Hive.

+----+------+
| ID | Name |
+----+------+
|  1 | A    |
|  2 | C    |
+----+------+

The same scenario when I tested in Snowflake, I am not able to see the new data rather I can still see the old data, though I have added the auto_refresh = true while creating external table in Snowflake.

create or replace external table schema.table_name 
(
    ID INT as  (value:c1::int), 
    Name varchar(20) as ( value:c2::varchar)
)
with location = @ext_stage_test
file_format = pipeformat
auto_refresh = true

Is this the behavior of Snowflake or am I missing anything?

Any help is highly appreciated.

Sarath Subramanian
  • 20,027
  • 11
  • 82
  • 86
  • Have you verified the data in the files in your external stage location was updated with the values? Snowflake should always show the correct values based on the files in the external stage. – Mike Walton Oct 30 '20 at 03:32
  • Yes, in the S3 location the file have the updated value. But while I queries the table, it is showing the old data – Sarath Subramanian Oct 30 '20 at 03:36
  • What do you see when you run a select directly against the stage from within Snowflake? `SELECT $1, $2 FROM @ext_stage_test ( FILE_FORMAT => 'pipeformat' );` – Mike Walton Oct 30 '20 at 03:39
  • Also, did you setup the auto-refresh using AWS SNS per the following documentation: https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#simple-external-table-auto-refresh-using-amazon-sns – Mike Walton Oct 30 '20 at 03:50
  • When I run the select query on stage, I am not seeing refreshed data. But when I create/replace the external table again, data is getting refresh. That shouldn't be the behavior, right? – Sarath Subramanian Oct 30 '20 at 05:58
  • Hi - have a read of the documentation that @MikeWalton linked to, especially the note after the AUTO_REFRESH = TRUE | FALSE parameter description. Setting AUTO_REFRESH = TRUE does not mean the table will be automatically refreshed when data in the external location is changed; it means the data will be refreshed when the appropriate trigger is received. Regarding your last comment, the parameter REFRESH_ON_CREATE = TRUE | FALSE controls whether the data is refreshed when you create the external table – NickW Oct 30 '20 at 14:36

1 Answers1

2

I'll place this as an answer, but it doesn't sound like you have setup the SNS notifications on AWS S3. The AUTO_REFRESH allows Snowflake to refresh an External Table when a notification is sent to a queue that a file has either been placed in your S3 bucket or removed. This is the only truly auto-refresh property available. Otherwise, there is a manual refresh of the external table that needs to be run to refresh the external table:

alter external table exttable_part refresh;

If you are unable to set the SNS configuration noted in the documentation (https://docs.snowflake.com/en/sql-reference/sql/create-external-table.html#simple-external-table-auto-refresh-using-amazon-sns), then I would recommend adding the command above to a task that executes on a frequency that makes sense for you.

Mike Walton
  • 6,595
  • 2
  • 11
  • 22