
I am implementing a feature engineering & feature store solution with Feast on GCP, using BigQuery for offline storage. I have a question: say I have a feature on a user entity that does not change frequently (for example, address). I of course intend to use Feast to build a training dataset with the point-in-time join functionality. In that case I seem to have two options:

  • Saving the address for all my users in the BQ table at a given frequency (let's say every hour), even if the feature value has not changed since the previously stored one, producing a lot of duplicates
  • Saving only changes in the feature values, with potentially large gaps and sparsity in the storage.
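To make the storage trade-off between the two options concrete, here is a small back-of-the-envelope sketch; the user count and change rate are made-up assumptions for illustration, not figures from the question:

```python
# Rough row-count comparison of the two storage strategies for a
# slowly changing feature (e.g. address). All numbers below are
# illustrative assumptions, not measurements.

N_USERS = 100_000          # assumed number of user entities
HOURS_PER_YEAR = 24 * 365
CHANGES_PER_USER_YEAR = 2  # assumed address changes per user per year

# Option 1: hourly snapshot of every user, changed or not
dense_rows = N_USERS * HOURS_PER_YEAR

# Option 2: write a row only when the value actually changes
sparse_rows = N_USERS * CHANGES_PER_USER_YEAR

print(f"dense:  {dense_rows:,} rows/year")   # 876,000,000
print(f"sparse: {sparse_rows:,} rows/year")  # 200,000
print(f"ratio:  {dense_rows // sparse_rows}x")  # 4380x
```

Even with generous assumptions, change-only storage is several orders of magnitude smaller, which is what makes the second option attractive despite the ttl question below.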

The second option seems more adequate, since we would not store many duplicate data points. However, I know there is a ttl argument on Feast's FeatureView object which, in my understanding, sets the number of days Feast will look back for feature values when using get_historical_features. Thus for data with large sparsity, such as a user's address, I may need to set a very high ttl value, which may have performance and cost impacts according to the Feast documentation. What is the right way to approach this problem, please?
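The interaction between sparse rows and ttl can be illustrated without Feast itself. Below is a minimal stdlib-only simulation of a point-in-time lookup bounded by a ttl window; the function name and the data are hypothetical, and the real get_historical_features performs this join in SQL against BigQuery rather than in Python:

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

def point_in_time_lookup(rows: List[Tuple[datetime, str]],
                         entity_ts: datetime,
                         ttl: timedelta) -> Optional[str]:
    """Return the latest feature value with event timestamp <= entity_ts
    and within the ttl window, mimicking how a ttl bounds the backward
    search of a point-in-time join."""
    candidates = [
        (ts, value) for ts, value in rows
        if ts <= entity_ts and entity_ts - ts <= ttl
    ]
    if not candidates:
        # No row inside the window -> the feature comes back null
        return None
    return max(candidates)[1]

# Sparse storage: a row was written only when the address changed.
address_rows = [
    (datetime(2022, 1, 10), "12 Oak St"),
    (datetime(2022, 6, 1), "34 Elm St"),
]

label_ts = datetime(2022, 9, 1)  # entity timestamp from the label table

# With a short ttl, the last change (92 days earlier) is missed:
print(point_in_time_lookup(address_rows, label_ts, timedelta(days=30)))   # None
# A large ttl recovers it, at the cost of a wider backward scan:
print(point_in_time_lookup(address_rows, label_ts, timedelta(days=365)))  # 34 Elm St
```

This is exactly the tension in the question: with change-only storage, any ttl shorter than the longest gap between writes produces nulls in the training set, so the ttl has to be sized to the feature's sparsity.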

Martin Becuwe

0 Answers