
I am trying to apply the softmax function to a NumPy array, but I am not getting the desired results. This is the code I have tried:

 import numpy as np
 x = np.array([[1001,1002],[3,4]])
 softmax = np.exp(x - np.max(x))/np.sum(np.exp(x - np.max(x)))
 print(softmax)

I think the x - np.max(x) part is not subtracting the max of each row. The max needs to be subtracted from x to keep the exponentials of very large numbers from overflowing.

This is supposed to output

 np.array([
    [0.26894142, 0.73105858],
    [0.26894142, 0.73105858]])

But I am getting:

np.array([
    [0.26894142, 0.73105858],
    [0, 0]])
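Checking the intermediate values seems to confirm it: np.max(x) is the global max, so the whole second row underflows:

np.max(x)              # 1002 -- the global max, not a per-row max
np.exp(x - np.max(x))  # exp(-999) and exp(-998) underflow to 0
# array([[ 0.36787944,  1.        ],
#        [ 0.        ,  0.        ]])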

5 Answers


A convenient way to keep the axes that are consumed by "reduce" operations such as max or sum is the keepdims keyword:

mx = np.max(x, axis=-1, keepdims=True)
mx
# array([[1002],
#        [   4]])
x - mx
# array([[-1,  0],
#        [-1,  0]])
numerator = np.exp(x - mx)
denominator = np.sum(numerator, axis=-1, keepdims=True)
denominator
# array([[ 1.36787944],
#        [ 1.36787944]])
numerator/denominator
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])
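Putting those pieces together (the function name here is just for illustration):

def softmax_rows(x, axis=-1):
    """Softmax along `axis`, keeping the reduced dim so broadcasting stays row-wise."""
    mx = np.max(x, axis=axis, keepdims=True)
    numerator = np.exp(x - mx)
    denominator = np.sum(numerator, axis=axis, keepdims=True)
    return numerator / denominator

softmax_rows(x)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])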
Paul Panzer
  • 51,835
  • 3
  • 54
  • 99

My 5-liner (which uses scipy logsumexp for the tricky bits):

def softmax(a, axis=None):
    """
    Computes exp(a)/sumexp(a); relies on scipy logsumexp implementation.
    :param a: ndarray/tensor
    :param axis: axis to sum over; default (None) sums over everything
    """
    from scipy.special import logsumexp
    lse = logsumexp(a, axis=axis)  # this reduces along axis
    if axis is not None:
        lse = np.expand_dims(lse, axis)  # restore that axis for subtraction
    return np.exp(a - lse)

You may have to use from scipy.misc import logsumexp if you have an older scipy version.
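For the array from the question, calling it with axis=-1 gives the row-wise result:

x = np.array([[1001, 1002], [3, 4]])
softmax(x, axis=-1)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])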

Yibo Yang

EDIT. As of version 1.2.0, scipy includes softmax as a special function:

https://scipy.github.io/devdocs/generated/scipy.special.softmax.html
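If your scipy is new enough, that covers this question directly (axis=1 for the row-wise case):

from scipy.special import softmax  # requires scipy >= 1.2.0

x = np.array([[1001, 1002], [3, 4]])
softmax(x, axis=1)
# array([[0.26894142, 0.73105858],
#        [0.26894142, 0.73105858]])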

I wrote a very general softmax function operating over an arbitrary axis, including the tricky max subtraction bit. The function is below, and I wrote a blog post about it here.

def softmax(X, theta = 1.0, axis = None):
    """
    Compute the softmax of each element along an axis of X.

    Parameters
    ----------
    X: ND-Array. Probably should be floats. 
    theta (optional): float parameter, used as a multiplier
        prior to exponentiation. Default = 1.0
    axis (optional): axis to compute values along. Default is the 
        first non-singleton axis.

    Returns an array the same size as X. The result will sum to 1
    along the specified axis.
    """

    # make X at least 2d
    y = np.atleast_2d(X)

    # find axis
    if axis is None:
        axis = next(j[0] for j in enumerate(y.shape) if j[1] > 1)

    # multiply y against the theta parameter, 
    y = y * float(theta)

    # subtract the max for numerical stability
    y = y - np.expand_dims(np.max(y, axis = axis), axis)

    # exponentiate y
    y = np.exp(y)

    # take the sum along the specified axis
    ax_sum = np.expand_dims(np.sum(y, axis = axis), axis)

    # finally: divide elementwise
    p = y / ax_sum

    # flatten if X was 1D
    if len(X.shape) == 1: p = p.flatten()

    return p
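Applied to the array from the question with axis=1:

x = np.array([[1001, 1002], [3, 4]])
softmax(x, axis=1)
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])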
Nolan Conaway

The x - np.max(x) code is not doing row-wise subtraction. Let's do it step by step. First we will make a 'maxes' array by tiling (i.e., duplicating) the column of row maxes:

maxes = np.tile(np.max(x,1), (2,1)).T

This will create a 2x2 matrix in which each row holds that row's max, duplicated across the columns (via tile). After this you can do:

 x = np.exp(x - maxes)/np.sum(np.exp(x - maxes), axis=1, keepdims=True)
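With the example array, the intermediate maxes and the final result come out as:

maxes
# array([[1002, 1002],
#        [   4,    4]])
x
# array([[ 0.26894142,  0.73105858],
#        [ 0.26894142,  0.73105858]])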

You should get your result with this. The axis=1 is for the row-wise softmax you mentioned in the heading of your question, and keepdims=True keeps the row sums as a column so the division broadcasts row-wise. Hope this helps.

Pranay Aryal

How about this?

To take the max along the rows, just specify axis=1 and then convert the result to a column vector (still a 2D array) using np.newaxis/None.

In [40]: x
Out[40]: 
array([[1001, 1002],
       [   3,    4]])

In [41]: z = x - np.max(x, axis=1)[:, np.newaxis]

In [42]: z
Out[42]: 
array([[-1,  0],
       [-1,  0]])

In [44]: softmax = np.exp(z) / np.sum(np.exp(z), axis=1)[:, np.newaxis]

In [45]: softmax
Out[45]: 
array([[ 0.26894142,  0.73105858],
       [ 0.26894142,  0.73105858]])

In the last step, when taking the sum, again specify axis=1 to sum along the rows, and note the [:, np.newaxis] there too so that the division broadcasts row-wise.
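And a quick sanity check that each row sums to 1:

In [46]: softmax.sum(axis=1)
Out[46]: array([ 1.,  1.])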

kmario23
  • You have to do the `[:, np.newaxis]` thing in the `softmax` line (44), too. With the given example you happen to get the right result, but that's essentially coincidence. (It works because the two row sums happen to have the same value, so it doesn't matter which way they are broadcast.) Try for example `x = [[1001, 1002], [1, 4]]` instead to get a wrong result. Or `x = [[1001, 1002, 1003], [2, 3, 4]]` to get an outright error. – Paul Panzer Apr 08 '17 at 23:37
  • @PaulPanzer Thank you very much! What's the best way to notice such bugs? It was so subtle for my understanding of NumPy – kmario23 Apr 10 '17 at 12:41
  • 1
    Don't use square arrays in your toy examples ;-] Seriously, that catches at least half of them for me. – Paul Panzer Apr 10 '17 at 13:33