My TFRecord file contains a datetime feature, a cluster number, and coordinates. How can I write a function inside the TFX Transform component that counts the records for each cluster number in 30-minute intervals?
Alternatively, is there a built-in TensorFlow function for aggregating or counting records based on the datetime and cluster-number features of a TFRecord?
Something similar to this pandas function:
import numpy as np
import pandas as pd

def aggregate_split(df):
    # Coordinates (columns 1 and 2) and the years covered by the data
    geo_loc = df.iloc[:, 1:3].values
    year = df['tpep_pickup_datetime'].dt.year.unique()
    n, m = geo_loc.shape
    # One count per record
    df['counts'] = np.ones(n)
    # Aggregate demand by clustered zone in 30-minute windows
    agg = df.groupby(['CLUSTER_kmeans40', pd.Grouper(key='tpep_pickup_datetime', freq='30min')]).counts.sum()
    return agg
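
To make the result I am after concrete, here is a small sketch on made-up data. The timestamps and cluster values are invented for illustration; only the column names come from my data. It uses `size()` rather than summing a column of ones, which should give the same counts:

```python
import pandas as pd

# Toy data with made-up values; only the column names match my real data.
df = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime([
        "2016-01-01 00:05:00",
        "2016-01-01 00:20:00",
        "2016-01-01 00:40:00",
        "2016-01-01 00:10:00",
    ]),
    "CLUSTER_kmeans40": [0, 0, 0, 1],
})

# Count the records per cluster within each 30-minute window.
agg = (
    df.groupby(["CLUSTER_kmeans40",
                pd.Grouper(key="tpep_pickup_datetime", freq="30min")])
      .size()
      .rename("counts")
)
print(agg)
```

Here cluster 0 has two records in the 00:00 window and one in the 00:30 window, and cluster 1 has one record in the 00:00 window. This per-cluster, per-window count is what I would like to compute from the TFRecord features.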