I have a "private" Hive database containing 24 external tables backed by Parquet files stored outside the warehouse, populated as part of a Spark data pipeline.
I also have a "public" Hive database, intended for downstream consumers, containing 24 views that select from the tables in that "private" database.
Each new batch run of the pipeline creates an entirely new "private" Hive database alongside the previous one.
The 24 views in the "public" database are then altered to point at the corresponding tables in the new database.
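To make the repointing step concrete, here is a minimal sketch of how the per-view statements might be generated. It assumes (hypothetically) that each public view has the same name as its underlying table and that each run's database is named something like `private_run_42`; only `ALTER VIEW ... AS SELECT ...` itself is standard HiveQL/Spark SQL.

```python
def build_repoint_statements(table_names, new_private_db, public_db="public"):
    """Return one ALTER VIEW statement per table.

    Assumption (hypothetical): view name == table name, and every table
    in the new private database replaces the same-named table in the old one.
    """
    return [
        # Each statement repoints a single public view at the new run's table.
        f"ALTER VIEW {public_db}.{name} AS SELECT * FROM {new_private_db}.{name}"
        for name in table_names
    ]


# Each statement would be executed separately, e.g. via spark.sql(stmt),
# so the 24 views are switched one after another rather than atomically.
for stmt in build_repoint_statements(["orders", "customers"], "private_run_42"):
    print(stmt)
```

Because every view is its own DDL statement, there is no single operation that switches all 24 views at once.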
While the 24 views are being updated, a race condition exists: a downstream process might start and read from two views in the "public" database, where one view has already been repointed to the newest "private" database while the other still points at the older one.
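The size of that window can be illustrated with a toy, in-memory simulation (not Hive itself): a dictionary stands in for the metastore's view definitions, and the views are repointed one at a time. The database names `old` and `new` are placeholders.

```python
def mixed_catalog_states(view_names, old_db, new_db):
    """Simulate repointing views one by one and report every intermediate
    state in which the views do not all reference the same database.

    Returns the number of views already updated at each mixed state.
    """
    catalog = {view: old_db for view in view_names}  # all views start on the old run
    mixed = []
    for updated, view in enumerate(view_names, start=1):
        catalog[view] = new_db  # one ALTER VIEW completes
        if len(set(catalog.values())) > 1:
            # A reader touching an updated view and a not-yet-updated view
            # at this moment would see data from two different runs.
            mixed.append(updated)
    return mixed


views = [f"table_{i}" for i in range(24)]
print(len(mixed_catalog_states(views, "old", "new")))
```

With 24 views updated sequentially, every state between the first and the last update (23 of them) is mixed; only the states before the first update and after the last one are consistent.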
Since ACID transactions are not supported for external Hive tables or views, what is the BEST way to reduce the surface area of this race condition so that downstream readers cannot end up reading two views backed by two different "private" databases?