Can someone show me an example of tf.data.experimental.group_by_reducer? I find the documentation tricky and couldn't understand fully.
How can I use it for calculating average?
Say we are provided with a dataset with columns ['ids', 'features'], and we want to group the data by summing the 'features' that correspond to the same 'ids'. We can use tf.contrib.data.group_by_reducer(key_func, reducer) to achieve this.
Raw data:

ids | features
--------------
  1 |   1
  2 |   2.2
  3 |   7
  1 |   3.0
  2 |   2
  3 |   3

Desired data:

ids | features
--------------
  1 |   4
  2 |   4.2
  3 |  10
TensorFlow Code:
import tensorflow as tf
tf.enable_eager_execution()

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# A Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - defines the initial state for each key
# reduce_func - combines the running state with each value sharing that key
# finalize_func - maps the final state to the value to return

def init_func(_):
    return 0.0

def reduce_func(state, value):
    return state + value['features']

def finalize_func(state):
    return state

reducer = tf.contrib.data.Reducer(init_func, reduce_func, finalize_func)

# Group the data by id.
def key_f(row):
    return tf.to_int64(row['ids'])

t = tf.contrib.data.group_by_reducer(
    key_func=key_f,
    reducer=reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids': ids, 'features': features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = ds.make_one_shot_iterator()
data = iterator.get_next()
print(data)
Consider ids == 1. We set the initial state to 0 using init_func. reduce_func will then perform 0 + 1 followed by 1 + 3.0, and finalize_func will return 4.0.

In the group_by_reducer function, key_func is a function which returns the key for a given data row. The key must be int64. In our case, we use 'ids' as the key.
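To make the init/reduce/finalize lifecycle concrete, here is a plain-Python simulation of the semantics (illustration only, not TensorFlow code; `simulate_group_by_reducer` is a made-up helper name):

```python
def simulate_group_by_reducer(rows, key_func, init_func, reduce_func,
                              finalize_func):
    # Plain-Python simulation of group_by_reducer's semantics:
    # one state per key, initialized lazily via init_func, updated per
    # row via reduce_func, and mapped to an output via finalize_func.
    states = {}
    for row in rows:
        k = key_func(row)
        if k not in states:
            states[k] = init_func(k)
        states[k] = reduce_func(states[k], row)
    return {k: finalize_func(s) for k, s in states.items()}

rows = [{'ids': i, 'features': f}
        for i, f in zip([1, 2, 3, 1, 2, 3], [1, 2.2, 7, 3.0, 2, 3])]

result = simulate_group_by_reducer(
    rows,
    key_func=lambda r: r['ids'],
    init_func=lambda _: 0.0,
    reduce_func=lambda state, r: state + r['features'],
    finalize_func=lambda state: state)
print(result)  # {1: 4.0, 2: 4.2, 3: 10.0}
```

This matches the desired table above: the features for each id are summed, then returned unchanged by finalize_func.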
I tweaked @Illuminati0x5B's code to work with TF 2.0. Thanks @Illuminati0x5B, your sample code is really helpful.

TensorFlow Code (tweaked):
import tensorflow as tf

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# A Reducer requires 3 functions - init_func, reduce_func, finalize_func.
# init_func - defines the initial state for each key
# reduce_func - combines the running state with each value sharing that key
# finalize_func - maps the final state to the value to return

def init_func(_):
    return 0.0

def reduce_func(state, value):
    return state + value['features']

def finalize_func(state):
    return state

reducer = tf.data.experimental.Reducer(init_func, reduce_func, finalize_func)

# Group the data by id.
def key_f(row):
    return tf.dtypes.cast(row['ids'], tf.int64)

t = tf.data.experimental.group_by_reducer(
    key_func=key_f,
    reducer=reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids': ids, 'features': features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
data = iterator.get_next()
print(data)
I've changed @Illuminati0x5B's and @VigneshKumar's code to calculate the average with TF 2.0.
import tensorflow as tf

ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Define the reducer.
# init_func - the initial state is a (sum, count) pair
# reduce_func - adds the feature to the running sum and increments the count
# finalize_func - receives the state tuple unpacked into separate arguments
#                 and returns sum / count, i.e. the average

def init_func(_):
    return (0.0, 0.0)

def reduce_func(state, value):
    return (state[0] + value['features'], state[1] + 1)

def finalize_func(s, n):
    return s / n

reducer = tf.data.experimental.Reducer(init_func, reduce_func, finalize_func)

# Group the data by id.
def key_f(row):
    return tf.dtypes.cast(row['ids'], tf.int64)

t = tf.data.experimental.group_by_reducer(
    key_func=key_f,
    reducer=reducer)

ds = tf.data.Dataset.from_tensor_slices({'ids': ids, 'features': features})
ds = ds.apply(t)
ds = ds.batch(6)

iterator = tf.compat.v1.data.make_one_shot_iterator(ds)
data = iterator.get_next()
print(data)
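As a sanity check on the expected output, the per-id averages can be computed in plain Python (illustration only; the tf.data pipeline above should produce the same values, keyed by id):

```python
ids = [1, 2, 3, 1, 2, 3]
features = [1, 2.2, 7, 3.0, 2, 3]

# Accumulate a running sum and count per id, then divide.
sums, counts = {}, {}
for i, f in zip(ids, features):
    sums[i] = sums.get(i, 0.0) + f
    counts[i] = counts.get(i, 0) + 1

averages = {i: sums[i] / counts[i] for i in sums}
print(averages)  # {1: 2.0, 2: 2.1, 3: 5.0}
```

So for ids == 1 the state ends at (4.0, 2) and finalize_func returns 4.0 / 2 = 2.0.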