
I have a datetime feature, a cluster number, and coordinates in my TFRecord file. How can I write a function inside the TFX Transform component that counts the records for each cluster number in 30-minute time intervals?

Alternatively, is there a built-in TensorFlow function for aggregating or counting records based on the datetime and cluster number features of a TFRecord?

Something similar to this pandas function:

    import numpy as np
    import pandas as pd

    def aggregate_split(df):
        geo_loc = df.iloc[:, 1:3].values  # coordinate columns
        year = df['tpep_pickup_datetime'].dt.year.unique()
        n, m = geo_loc.shape
        df['counts'] = np.ones(n)  # one count per record

        # Aggregate demand per cluster zone in 30-minute pickup windows
        agg = df.groupby(['CLUSTER_kmeans40', pd.Grouper(key='tpep_pickup_datetime', freq='30Min')]).counts.sum()
        return agg
  • Please provide enough code so others can better understand or reproduce the problem. – Community Oct 20 '21 at 09:01
  • Can you take a look at this [tft.count_per_key](https://www.tensorflow.org/tfx/transform/api_docs/python/tft/count_per_key) which is for a similar functionality? Thanks! –  Mar 21 '22 at 05:30
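
A minimal sketch of the counting logic the question describes, in plain Python rather than TFX: floor each timestamp to the start of its 30-minute window, pair it with the cluster number, and count records per pair. The names `BUCKET_SECONDS`, `bucket_start`, and `count_per_cluster_and_bucket` are hypothetical, and the sample records are made up for illustration. It assumes timezone-aware timestamps, so windows align to :00 and :30 in UTC. Inside a Transform `preprocessing_fn`, the same composite key (cluster number plus window start) could presumably be built as a tensor and passed to the `tft.count_per_key` analyzer linked above, but that wiring is not shown here.

```python
from collections import Counter
from datetime import datetime, timezone

BUCKET_SECONDS = 30 * 60  # 30-minute window (hypothetical constant)


def bucket_start(ts: datetime) -> datetime:
    """Floor a timezone-aware timestamp to the start of its 30-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def count_per_cluster_and_bucket(records):
    """Count records per (cluster number, window start) pair.

    `records` is an iterable of (cluster_id, timestamp) tuples.
    """
    counts = Counter()
    for cluster_id, ts in records:
        counts[(cluster_id, bucket_start(ts))] += 1
    return counts


# Made-up sample data: three pickups in cluster 3, one in cluster 7.
records = [
    (3, datetime(2021, 10, 20, 9, 1, tzinfo=timezone.utc)),
    (3, datetime(2021, 10, 20, 9, 29, tzinfo=timezone.utc)),
    (3, datetime(2021, 10, 20, 9, 30, tzinfo=timezone.utc)),
    (7, datetime(2021, 10, 20, 9, 5, tzinfo=timezone.utc)),
]
counts = count_per_cluster_and_bucket(records)
```

Flooring on epoch seconds keeps the key a single integer-derived value, which is easier to turn into a string or integer key than a full datetime when the same idea is moved into a tensor pipeline.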

0 Answers