I think another approach would be more elegant: instead of creating a bunch of tables, save all the data in one table with an additional date column.
I assume you already have a dataset that represents the data for the current day (e.g. input_data). The following transformation adds a date column and appends each day's snapshot to an ever-growing history table, so you can always access the data for any date.
from transforms.api import transform, Output, Input, incremental
from pyspark.sql import functions as F

# snapshot_inputs makes the incremental transform read the full input
# snapshot on every run, while the output is still written incrementally
# (appended) rather than replaced.
@incremental(snapshot_inputs=['input_data'])
@transform(
    input_data=Input("/path/to/snapshot/input"),
    history=Output("/path/to/historical/dataset"),
)
def my_compute_function(input_data, history):
    input_df = input_data.dataframe()
    # Stamp every row with the date of the build that processed it.
    input_df = input_df.withColumn('date', F.current_date())
    # In incremental mode this appends today's rows to the existing output.
    history.write_dataframe(input_df)
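Once the history dataset has accumulated a few builds, reading any single day back out is just a filter on the date column. Here is a minimal sketch of a downstream transform; the output path, the transform name, and the example date are placeholders I made up, not anything from your setup:

from transforms.api import transform_df, Input, Output
from pyspark.sql import functions as F

@transform_df(
    Output("/path/to/one/day/output"),  # hypothetical path
    history=Input("/path/to/historical/dataset"),
)
def pick_one_day(history):
    # Keep only the rows stamped with the day we care about.
    return history.filter(F.col('date') == '2024-01-15')

Spark casts the string literal to a date for the comparison, so an explicit to_date call isn't required.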
I took most of this code from the Foundry documentation; try searching for "Create a historical dataset from snapshots" in your instance's docs.