In your case, a feature store is a good fit for managing the evolving features of your ML model. A feature store is a centralized repository that serves as the single source of truth for all the features used in machine learning models. It handles feature management, versioning, and retrieval for both training and inference.
Instead of altering the schema of your existing BigQuery table every time a new feature is added, consider separating the feature extraction step from the input table. You can create a separate pipeline or process that extracts the features from your input data and stores them in a dedicated feature store.
Next, set up the feature store itself. This can be a dedicated database or a specialized feature store tool such as Feast or Vertex AI Feature Store. Whichever you choose, it should let you version and organize features, and retrieve them by timestamp or other relevant criteria.
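As a rough illustration of the "dedicated database" option, here is a minimal sketch that uses an in-memory SQLite table in long format (one row per entity, feature, version, and timestamp). The table name, column names, and helper functions are all hypothetical, and SQLite only stands in for whatever database you actually use; the point is that a new feature becomes new rows rather than a schema change.

```python
import sqlite3

# Hypothetical long-format layout: one row per (entity, feature, version, timestamp).
# Adding a new feature means inserting rows, not altering the table schema.
SCHEMA = """
CREATE TABLE feature_values (
    entity_id    TEXT NOT NULL,
    feature_name TEXT NOT NULL,
    version      INTEGER NOT NULL,
    event_ts     TEXT NOT NULL,   -- ISO 8601, so string comparison orders correctly
    value        REAL,
    PRIMARY KEY (entity_id, feature_name, version, event_ts)
)
"""


def open_store():
    """Create an in-memory store (a stand-in for a real database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def write_feature(conn, entity_id, feature_name, version, event_ts, value):
    """Insert or overwrite one feature value."""
    conn.execute(
        "INSERT OR REPLACE INTO feature_values VALUES (?, ?, ?, ?, ?)",
        (entity_id, feature_name, version, event_ts, value),
    )


def read_as_of(conn, entity_id, feature_name, version, as_of_ts):
    """Return the latest value at or before as_of_ts, or None if there is none."""
    row = conn.execute(
        """
        SELECT value FROM feature_values
        WHERE entity_id = ? AND feature_name = ? AND version = ? AND event_ts <= ?
        ORDER BY event_ts DESC LIMIT 1
        """,
        (entity_id, feature_name, version, as_of_ts),
    ).fetchone()
    return row[0] if row else None


store = open_store()
write_feature(store, "user_1", "avg_order_value", 1, "2024-01-01T00:00:00", 20.0)
write_feature(store, "user_1", "avg_order_value", 1, "2024-02-01T00:00:00", 25.0)
# A brand-new feature is just more rows; no ALTER TABLE is needed.
write_feature(store, "user_1", "days_since_signup", 1, "2024-02-01T00:00:00", 40.0)

january_value = read_as_of(store, "user_1", "avg_order_value", 1, "2024-01-15T00:00:00")
```

The timestamp lookup in read_as_of is what lets you rebuild the features exactly as they looked at training time, instead of leaking later values into an older training set.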
Create a feature engineering pipeline that takes the raw input data, applies the necessary transformations, and extracts the features. This pipeline should populate the feature store with the latest features and ensure that they are properly versioned.
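One way to picture such a pipeline is a small registry of feature definitions applied to each raw row. This is only a sketch: the FeatureDef class, the example features, and the raw field names (order_id, price, quantity) are made up for illustration, and the output dictionary stands in for writes to your real feature store.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class FeatureDef:
    """A named, versioned transformation from one raw row to one value."""
    name: str
    version: int
    compute: Callable[[dict], float]


# Hypothetical feature definitions; adding a feature means appending here,
# not changing the raw input table.
FEATURES: List[FeatureDef] = [
    FeatureDef("order_total", 1, lambda r: r["price"] * r["quantity"]),
    FeatureDef("is_bulk_order", 1, lambda r: float(r["quantity"] >= 10)),
]


def run_pipeline(raw_rows: List[dict], features: List[FeatureDef]) -> Dict[Tuple[str, str, int], float]:
    """Return {(entity_id, feature_name, version): value} for every row and feature."""
    out = {}
    for row in raw_rows:
        for feature in features:
            out[(row["order_id"], feature.name, feature.version)] = feature.compute(row)
    return out


raw = [
    {"order_id": "o1", "price": 2.5, "quantity": 4},
    {"order_id": "o2", "price": 1.0, "quantity": 12},
]
feature_rows = run_pipeline(raw, FEATURES)
```

If the logic behind a feature changes, you would add a new FeatureDef with the same name and a higher version rather than editing the old one, so earlier training runs stay reproducible.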
When training your ML model, you can fetch the required features from the feature store based on the desired version. This allows you to consistently use the same set of features for model training, regardless of the changes made to the input data schema.
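To make the idea of fetching by version concrete, here is a small sketch in which each model pins an explicit list of (feature, version) pairs. The store contents, feature names, and model names are invented for the example, and a plain dictionary stands in for the feature store.

```python
# Stand-in for the feature store: {(entity_id, feature_name, version): value}.
STORE = {
    ("u1", "spend_30d", 1): 100.0,
    ("u1", "spend_30d", 2): 95.0,   # v2: hypothetical redefinition, e.g. refunds excluded
    ("u1", "sessions_7d", 1): 4.0,
    ("u2", "spend_30d", 1): 10.0,
    ("u2", "spend_30d", 2): 10.0,
    ("u2", "sessions_7d", 1): 1.0,
}

# Each model pins the exact feature versions it was designed for (names are assumptions).
FEATURE_SETS = {
    "churn_model_v1": [("spend_30d", 1), ("sessions_7d", 1)],
    "churn_model_v2": [("spend_30d", 2), ("sessions_7d", 1)],
}


def build_training_matrix(store, feature_set, entity_ids):
    """Return (column_names, rows) with columns in the pinned order."""
    columns = [f"{name}_v{version}" for name, version in feature_set]
    rows = [
        [store[(entity, name, version)] for name, version in feature_set]
        for entity in entity_ids
    ]
    return columns, rows


columns_v1, rows_v1 = build_training_matrix(STORE, FEATURE_SETS["churn_model_v1"], ["u1", "u2"])
columns_v2, rows_v2 = build_training_matrix(STORE, FEATURE_SETS["churn_model_v2"], ["u1", "u2"])
```

Because the pinned list, not the table schema, decides which columns the model sees, new features can be added to the store without affecting models that do not ask for them.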
Instead of recreating the entire table every time, design your feature engineering pipeline to handle incremental updates. This means that for each new batch of data, the pipeline should update only the necessary features in the feature store, rather than rebuilding the entire feature set. This can significantly reduce the processing time and allow for more frequent model predictions.
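A minimal sketch of the incremental idea, assuming a simple aggregate feature (average order value per customer) and a timestamp watermark to skip rows that were already processed. The class, the field names, and the watermark rule are illustrative assumptions; a real pipeline would also need a policy for late-arriving rows, which this sketch simply drops.

```python
from collections import defaultdict


class IncrementalOrderStats:
    """Keeps running counts and sums so each batch only touches affected customers."""

    def __init__(self):
        self.count = defaultdict(int)
        self.total = defaultdict(float)
        self.watermark = ""  # latest event timestamp processed (ISO 8601 string)

    def apply_batch(self, batch):
        """Fold new rows into the running state; return the customers whose features changed."""
        touched = set()
        for row in batch:
            if row["ts"] <= self.watermark:
                continue  # at or before the watermark: treated as already processed
            self.count[row["customer"]] += 1
            self.total[row["customer"]] += row["amount"]
            touched.add(row["customer"])
        if batch:
            self.watermark = max(self.watermark, max(r["ts"] for r in batch))
        return touched

    def avg_order_value(self, customer):
        """Return the current feature value, or None if the customer has no orders."""
        n = self.count.get(customer, 0)
        return self.total[customer] / n if n else None


stats = IncrementalOrderStats()
batch_1 = [
    {"customer": "a", "ts": "2024-01-01T00:00:00", "amount": 10.0},
    {"customer": "b", "ts": "2024-01-01T01:00:00", "amount": 30.0},
]
batch_2 = [{"customer": "a", "ts": "2024-01-02T00:00:00", "amount": 20.0}]

touched_1 = stats.apply_batch(batch_1)
touched_2 = stats.apply_batch(batch_2)   # only customer "a" needs rewriting
touched_replay = stats.apply_batch(batch_2)  # replaying a batch changes nothing
```

Only the customers returned by apply_batch need to be written back to the feature store, which is where the savings over a full rebuild come from.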
Finally, implement a mechanism to track feature versions and manage deprecated features. This lets you see which features were used for each training run and inference, and remove unused or deprecated features to keep the feature store clean and efficient.
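The tracking mechanism could be as simple as a small registry that records the status of each feature version and which versions each run used. The class and method names below are hypothetical, and the rule that a deprecated feature is removable only when no recorded run depends on it is just one possible policy.

```python
class FeatureRegistry:
    """Tracks feature status and which feature versions each run used."""

    def __init__(self):
        self.status = {}  # (name, version) -> "active" or "deprecated"
        self.runs = {}    # run_id -> list of (name, version)

    def register(self, name, version):
        self.status[(name, version)] = "active"

    def deprecate(self, name, version):
        self.status[(name, version)] = "deprecated"

    def record_run(self, run_id, features):
        """Record the exact feature versions used by a training or inference run."""
        unknown = [f for f in features if f not in self.status]
        if unknown:
            raise KeyError(f"unregistered features: {unknown}")
        self.runs[run_id] = list(features)

    def removable(self):
        """Deprecated features that no recorded run depends on."""
        used = {f for features in self.runs.values() for f in features}
        return sorted(
            f for f, state in self.status.items()
            if state == "deprecated" and f not in used
        )


registry = FeatureRegistry()
registry.register("spend_30d", 1)
registry.register("spend_30d", 2)
registry.register("legacy_score", 1)

registry.record_run("train-001", [("spend_30d", 1)])

registry.deprecate("spend_30d", 1)     # still referenced by train-001
registry.deprecate("legacy_score", 1)  # never used by any run

safe_to_delete = registry.removable()
```

Here spend_30d version 1 is deprecated but kept, because train-001 still depends on it, while legacy_score version 1 can be dropped from the store.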