How can I calculate the median value of a list in TensorFlow? Something like:

node = tf.median(X)

Here X is a placeholder. In NumPy, I can call np.median directly to get the median value. How can I use this NumPy operation in TensorFlow?

Salvador Dali
Yingchao Xiong

3 Answers

To calculate the median of an array in TensorFlow, you can use the percentile function from TensorFlow Probability, since the 50th percentile is the median.

import tensorflow as tf
import tensorflow_probability as tfp
import numpy as np 

np.random.seed(0)   
x = np.random.normal(3.0, .1, 100)

median = tfp.stats.percentile(x, 50.0, interpolation='midpoint')

# TF 1.x graph mode; in TF 2.x eager mode, `median` already holds the value
tf.Session().run(median)

The code above is equivalent to np.percentile(x, 50, interpolation='midpoint').
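
To see why the 50th percentile with 'midpoint' interpolation coincides with the median, here is a minimal pure-Python sketch. `percentile_midpoint` is a hypothetical helper written for illustration, not part of TensorFlow Probability or NumPy, and it assumes the percentile position in the sorted data is q/100 * (n - 1).

```python
import math
import statistics


def percentile_midpoint(values, q):
    """q-th percentile with 'midpoint' interpolation (hypothetical helper)."""
    ordered = sorted(values)
    # Fractional position of the q-th percentile in the sorted data.
    pos = q / 100.0 * (len(ordered) - 1)
    lower, upper = math.floor(pos), math.ceil(pos)
    # 'midpoint' averages the two neighbouring order statistics.
    return 0.5 * (ordered[lower] + ordered[upper])


odd = [3.0, 1.0, 2.0]
even = [4.0, 1.0, 3.0, 2.0]
assert percentile_midpoint(odd, 50.0) == statistics.median(odd)
assert percentile_midpoint(even, 50.0) == statistics.median(even)
```

For an odd number of elements the two neighbours are the same element; for an even number they are the two middle elements, so their average is the conventional median.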

EliadL

Edit: This answer is outdated; use Lucas Venezian Povoa's solution instead. It is simpler and faster.

You can calculate the median inside tensorflow using:

def get_median(v):
    v = tf.reshape(v, [-1])                # flatten to a vector
    mid = v.get_shape()[0]//2 + 1          # static length must be known
    return tf.nn.top_k(v, mid).values[-1]  # smallest of the mid largest values

If X is already a vector you can skip the reshaping.

If you want the median to be the mean of the two middle elements for vectors of even length, use this instead:

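def get_real_median(v):
    v = tf.reshape(v, [-1])               # flatten to a vector
    l = v.get_shape()[0]                  # static length; must be known
    mid = l//2 + 1
    val = tf.nn.top_k(v, mid).values      # the mid largest values, descending
    if l % 2 == 1:
        return val[-1]                    # odd length: the single middle element
    else:
        return 0.5 * (val[-1] + val[-2])  # even length: mean of the two middle elements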
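
The index arithmetic of the top-k approach can be checked without TensorFlow. The following is a pure-Python sketch, not the TensorFlow code itself: `top_k_median` is a hypothetical name, and `heapq.nlargest` stands in for `tf.nn.top_k` on the assumption that both return the k largest values in descending order.

```python
import heapq
import statistics


def top_k_median(values):
    """Median from the k largest values (pure-Python stand-in for top_k)."""
    n = len(values)
    k = n // 2 + 1
    # Descending order, like the sorted output of a top-k selection.
    top = heapq.nlargest(k, values)
    if n % 2 == 1:
        return top[-1]                # odd length: single middle element
    return 0.5 * (top[-1] + top[-2])  # even length: mean of the two middle elements


assert top_k_median([1, 2, 3]) == statistics.median([1, 2, 3])
assert top_k_median([1, 2, 3, 4]) == statistics.median([1, 2, 3, 4])
```

With k = n//2 + 1, the last selected value is the middle element for odd n and the lower of the two middle elements for even n, which is why the even case also needs the second-to-last value.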
BlueSun
  • Thanks for your help. The X I defined is a [None, 5] matrix, since the size of the input data is unknown. How can I handle this case? – Yingchao Xiong May 07 '17 at 15:34
  • @YingchaoXiong do you want to calculate the median of the whole matrix or along one of the dimensions? – BlueSun May 07 '17 at 15:47
  • Along one of the dimensions. I have figured that part out. The new problem is the size of the matrix, i.e. how to define the value of m in your function. The placeholder has shape [None, 5]. During training I use a batch size of 10 ([10, 5]), but the shape is [1, 5] at prediction time. How can I change the value of m based on the size of the feed? Thank you so much! – Yingchao Xiong May 07 '17 at 17:18
  • @YingchaoXiong you can try using the dynamic shape, `tf.shape(v)`, though I am not sure whether that works in combination with top_k. Another way would be to make two networks that use the same weights (use a variable scope and set reuse=True for the 2nd network). Make the first network with a [10, 5] placeholder and the 2nd with a [1, 5] placeholder. – BlueSun May 09 '17 at 00:07
  • For `v = [1, 2, 3]` this gives `3`. Therefore you should add 1 to `m`: `m = v.get_shape()[0]//2 + 1`. But for a set with an even number of values it is still wrong: for `v = [1, 2, 3, 4]`, the median should usually be the mean of the two middle elements, `2.5`. This is done correctly in the second part of Lucas's [answer](https://stackoverflow.com/a/47657076/7440933). – dexteritas Sep 21 '18 at 11:03
  • @dexteritas thanks for pointing it out, I fixed the code. I only applied median to very large lists, so I never cared about the mean of the two middle elements rule. – BlueSun Oct 12 '18 at 17:28

We can modify BlueSun's solution to be much faster on GPUs:

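def get_median(v):
    v = tf.reshape(v, [-1])
    # Keep the n//2 + 1 largest values; the smallest of them is the median
    # (the lower of the two middle elements when n is even).
    m = v.get_shape()[0]//2 + 1
    return tf.reduce_min(tf.nn.top_k(v, m, sorted=False).values)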

In my experience, this is as fast as tf.contrib.distributions.percentile(v, 50.0), and it returns one of the actual elements.
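
The choice of k in a min-of-top-k median can be checked without TensorFlow. The sketch below is pure Python: `min_of_top_k` is a hypothetical helper, and `heapq.nlargest` stands in for `tf.nn.top_k(..., sorted=False)` on the assumption that only the set of selected values matters, not their order.

```python
import heapq
import statistics


def min_of_top_k(values, k):
    """Smallest of the k largest values; their order is irrelevant."""
    return min(heapq.nlargest(k, values))


data = [1, 2, 3, 4, 5]
n = len(data)
# k = n//2 + 1 selects {5, 4, 3}, whose minimum is the median.
assert min_of_top_k(data, n // 2 + 1) == statistics.median_low(data)
# k = n//2 selects {5, 4}, whose minimum is one element above the median.
assert min_of_top_k(data, n // 2) == 4
```

Because only the minimum of the selected values is used, the selection does not need to be sorted, which is where the saving over a sorted top-k comes from.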

noname