I have two Spark streams. The first one carries product data: the supplier price, the currency, the description, and the supplier id. These records are enriched with a category (guessed from the description) and with the price converted to dollars, and then saved to a parquet dataset.
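
For context, this is roughly what the first job does. It is only a simplified sketch: the source, the column names, and the enrichment functions (`guessCategory`, `toDollars`) are placeholders, not my actual code.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("products-stream").getOrCreate()
import spark.implicits._

// Placeholder enrichment: category guessed from the description,
// supplier price converted to dollars from the original currency.
val guessCategory = udf((description: String) => "some-category")
val toDollars     = udf((price: Double, currency: String) => price)

val productSchema = new StructType()
  .add("productId", StringType)
  .add("supplierId", StringType)
  .add("price", DoubleType)
  .add("currency", StringType)
  .add("description", StringType)

// Stand-in source: my real stream comes from elsewhere.
val products = spark.readStream
  .schema(productSchema)
  .json("/incoming/products")
  .withColumn("category", guessCategory($"description"))
  .withColumn("priceUsd", toDollars($"price", $"currency"))

// The enriched history is appended to a parquet dataset.
products.writeStream
  .format("parquet")
  .option("path", "/data/products")
  .option("checkpointLocation", "/checkpoints/products")
  .start()
```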
The second stream contains data about the auctions of these products: the price each one was sold at and the sale date.
Since a product can arrive in the first stream today and be sold a year later, how can I join the second stream with the entire history accumulated in the first stream's parquet dataset?
To be clear, the result should be the average daily earnings per price range ...
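
What I have in mind is something like the stream-static join sketched below, but I don't know if this is the right approach for joining against the whole growing history. The column names, the price buckets, and taking "earning" as the sale price minus the dollar cost are just assumptions to make the sketch concrete.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val spark = SparkSession.builder.appName("sales-stream").getOrCreate()
import spark.implicits._

// Static side: the product history written so far by the first job.
val products = spark.read.parquet("/data/products")

val salesSchema = new StructType()
  .add("productId", StringType)
  .add("salePrice", DoubleType)
  .add("saleDate", DateType)

// Streaming side: the auction/sale events (stand-in source).
val sales = spark.readStream
  .schema(salesSchema)
  .json("/incoming/sales")

// Stream-static join on the product id, then bucket by the enriched dollar
// price and compute the average earning per sale date and price range.
val joined = sales.join(products, Seq("productId"))

val result = joined
  .withColumn("priceRange",
    when($"priceUsd" < 10, "0-10")
      .when($"priceUsd" < 100, "10-100")
      .otherwise("100+"))
  .withColumn("earning", $"salePrice" - $"priceUsd")
  .groupBy($"saleDate", $"priceRange")
  .agg(avg($"earning").as("avgDailyEarning"))

// Memory sink in complete mode, just to inspect results while prototyping.
result.writeStream
  .outputMode("complete")
  .format("memory")
  .queryName("avg_daily_earnings")
  .start()
```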