
Can someone show me an example of tf.data.experimental.group_by_reducer? I find the documentation tricky and couldn't fully understand it.

How can I use it for calculating average?

user3285099

3 Answers


Say we are given a dataset with columns ['ids', 'features'], and we want to group the data by summing the 'features' that correspond to the same 'ids'. We can use group_by_reducer(key_func, reducer) (tf.contrib.data.group_by_reducer in TF 1.x, tf.data.experimental.group_by_reducer in later versions) to achieve this.

Raw data

ids | features
--------------
1   | 1
2   | 2.2
3   | 7
1   | 3.0
2   | 2
3   | 3

Desired data

ids | features
--------------
1   | 4
2   | 4.2
3   | 10

TensorFlow Code:

import tensorflow as tf
tf.enable_eager_execution()

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# A Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - defines the initial state for each key
# reduce_func - combines the running state with each value that shares the key
# finalize_func - maps the final state to the returned value
def init_func(_):
    return 0.0

def reduce_func(state, value):
    return state + value['features']

def finalize_func(state):
    return state

reducer = tf.contrib.data.Reducer(init_func, reduce_func, finalize_func)

# Group by reducer
# Group the data by id
def key_f(row):
    return tf.to_int64(row['ids'])

t = tf.contrib.data.group_by_reducer(
        key_func = key_f,
        reducer = reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids':ids, 'features' : features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = ds.make_one_shot_iterator()
data = iterator.get_next()
print(data)

Consider ids == 1. init_func sets the initial state to 0. reduce_func then computes 0 + 1 = 1 and 1 + 3.0 = 4.0, and finalize_func returns 4.0.

In the group_by_reducer function, key_func is a function that returns a key for each data row; the key must be an int64 scalar. In our case, we use 'ids' as the key.
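
Note that finalize_func above returns only the summed features, so the ids column from the "Desired data" table is not part of the output. Here is a minimal sketch of a variant that keeps the id alongside the sum; this is not part of the original answer, and it assumes the same TF 1.x eager setup above plus the tuple-state unpacking behavior used in the answers below:

def init_func(_):
    # state is a (key, running_sum) tuple
    return (tf.constant(0, dtype=tf.int64), tf.constant(0.0))

def reduce_func(state, value):
    # remember the row's id and accumulate its features
    return (tf.to_int64(value['ids']), state[1] + value['features'])

def finalize_func(key, total):
    # tf.data unpacks the tuple state into separate arguments
    return {'ids': key, 'features': total}

reducer = tf.contrib.data.Reducer(init_func, reduce_func, finalize_func)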

Illuminati0x5B
  • I already have the data in the form of a tensor. How can I load that into tf.data.Dataset to do the transformation? – raju May 31 '19 at 06:14
  • @raju you can use tf.data.Dataset.from_tensor_slices to load a tensor. You have to define an appropriate key function and reducer. – Illuminati0x5B May 31 '19 at 17:14
  • Can you also answer https://stackoverflow.com/questions/56399448/tf-group-by-reducer-example-with-list-in-the-reducer ? @Illuminati0x5B – user3285099 May 31 '19 at 17:32
  • How would you compute average with group_by_reducer? – user3285099 May 31 '19 at 22:44
  • I was thinking of sending 2 values to the finalize function, one for the sum and one for the number of elements, but the syntax doesn't seem to support it. – user3285099 Jun 01 '19 at 02:45
  • I haven't tried it yet, but off the top of my head: set 'state' as a (2,1)-shaped array, store the sum at index 0 and the count at index 1. In finalize_func, divide the sum by the count and return that value. – Illuminati0x5B Jun 01 '19 at 03:10
  • @user3285099 I have replied to your other question. – Illuminati0x5B Jun 04 '19 at 02:18
  • @Illuminati0x5B can group_by_reducer or group_by_window also be used within a custom loss function? Or are they only meant to be used on the data before it is passed into the neural network for custom batches? I'm trying to figure out how to do a group by in [this question I asked here](https://stackoverflow.com/questions/63927188/keras-custom-loss-function-per-tensor-group). I would appreciate any thoughts you may have, thanks! – DataMan Sep 21 '20 at 22:11

I tweaked @Illuminati0x5B's code to work with TF 2.0. Thanks @Illuminati0x5B, your sample code is really helpful.

TensorFlow Code (tweaked):

import tensorflow as tf

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# A Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - defines the initial state for each key
# reduce_func - combines the running state with each value that shares the key
# finalize_func - maps the final state to the returned value
def init_func(_):
    return 0.0

def reduce_func(state, value):
    return state + value['features']

def finalize_func(state):
    return state

reducer = tf.data.experimental.Reducer(init_func, reduce_func, finalize_func)

# Group by reducer
# Group the data by id
def key_f(row):
  return tf.dtypes.cast(row['ids'], tf.int64)

t = tf.data.experimental.group_by_reducer(
        key_func = key_f,
        reducer = reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids':ids, 'features' : features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
data = iterator.get_next()
print(data)
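
As a side note, in TF 2.x you can also skip the compat.v1 iterator entirely and iterate the dataset eagerly (same ds as above):

for batch in ds:
    print(batch)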

I've changed @Illuminati0x5B's and @VigneshKumar's code to calculate the average with TF 2.0.

import tensorflow as tf

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# A Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - defines the initial (sum, count) state for each key
# reduce_func - adds the row's features to the sum and increments the count
# finalize_func - divides the sum by the count to get the average
def init_func(_):
    return (0.0, 0.0)

def reduce_func(state, value):
    return (state[0] + value['features'], state[1] + 1)

def finalize_func(s, n):
    # tf.data unpacks the (sum, count) state tuple into two arguments
    return s / n

reducer = tf.data.experimental.Reducer(init_func, reduce_func, finalize_func)

# Group by reducer
# Group the data by id
def key_f(row):
  return tf.dtypes.cast(row['ids'], tf.int64)

t = tf.data.experimental.group_by_reducer(
        key_func = key_f,
        reducer = reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids':ids, 'features' : features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
data = iterator.get_next()
print(data)
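
For reference, the averages in this example work out to (1 + 3.0) / 2 = 2.0 for id 1, (2.2 + 2) / 2 = 2.1 for id 2, and (7 + 3) / 2 = 5.0 for id 3, so the printed batch should contain [2.0, 2.1, 5.0] (group order is not guaranteed).
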
Andrey