I'm interested in setting up some automated jobs that will periodically export data from our Redshift cluster and store it on S3, where ideally it would then be surfaced back in Redshift via an external table queried through Redshift Spectrum. One thing I'm not sure how to handle is that some of the tables I'm working with change schema over time.
I'm able to UNLOAD data from Redshift to S3 without a problem, and I'm also able to set up an external table within Redshift so that the S3 data is available for querying. However, I'm not sure how best to handle tables whose columns change over time. For example, for certain event data we capture through Segment, a newly added trait produces a new column on the Redshift table that won't have existed in earlier UNLOADs. Within Redshift itself, rows that arrived before the column existed simply show NULL for that column.
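For reference, here's roughly what the current setup looks like. This is just a sketch: the table, bucket, role, and schema names are placeholders, and I'm assuming the external schema has already been created with CREATE EXTERNAL SCHEMA.

```sql
-- Periodic export: dump the events table to S3 as Parquet
UNLOAD ('SELECT * FROM analytics.segment_events')
TO 's3://my-bucket/segment_events/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftUnloadRole'
FORMAT AS PARQUET;

-- External table over the unloaded files, queried via Redshift Spectrum
CREATE EXTERNAL TABLE spectrum.segment_events (
    event_id    VARCHAR(64),
    user_id     VARCHAR(64),
    event_name  VARCHAR(256),
    received_at TIMESTAMP
)
STORED AS PARQUET
LOCATION 's3://my-bucket/segment_events/';
```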
What's the best way to deal with this gradual change in data structure over time? If I just add the new fields to our external table definition, will Redshift be able to deal with the fact that these fields don't necessarily exist in the older UNLOADs, or do I need to go some other route?
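Concretely, when Segment adds a new trait I was imagining just adding the column to the external table definition, something like the following (the column name is hypothetical, and I'm assuming ALTER TABLE ... ADD COLUMN is allowed on external tables; if not, I'd drop and recreate the table definition instead):

```sql
-- Hypothetical trait column that only exists in the newer UNLOADs
ALTER TABLE spectrum.segment_events
ADD COLUMN plan_tier VARCHAR(64);
```

The hope is that Spectrum would just return NULL for that column when scanning the older files that don't contain it, but I don't know whether that's actually how it behaves.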