
Given the following arrays:

import numpy as np
from scipy.stats import mannwhitneyu

s1 = np.array([[1,2,3,4,5,6,7,8,0,10],[10,9,8,7,6,5,4,3,2,1]])
s2 = np.array([[1,11,3,7,5,6,7,8,0,10],[10,9,8,7,6,15,4,13,2,1]])

I want to run the Mann-Whitney(-Wilcoxon) U test once for each slice of the respective samples and have the results populate into one output array with one slice for the U statistic and the other for the p-value. I know I can run them individually like this:

r1 = mannwhitneyu(s1[0], s2[0])
r2 = mannwhitneyu(s1[1], s2[1])

Output:

MannwhitneyuResult(statistic=39.5, pvalue=0.2239039981060696)
MannwhitneyuResult(statistic=37.0, pvalue=0.17162432050520815)

Desired output:

array([[39.5, 0.2239039981060696], [37.0, 0.17162432050520815]])

I have tried np.apply_along_axis but the array argument only takes one input and I have 2. Also, I need the fastest solution possible as I'll be doing this over thousands of slices as part of a simulation.

Thanks in advance!

Dance Party

1 Answer


You could use map(...). It is a good choice here and noticeably faster than np.apply_along_axis(...), because apply_along_axis uses a Python loop internally and adds some computationally expensive operations such as transpose(...) and view(...). Under usual circumstances, even looping over the NumPy array with a plain Python loop would be faster.


import numpy as np
from scipy.stats import mannwhitneyu

s1 = np.array([[1,2,3,4,5,6,7,8,0,10],[10,9,8,7,6,5,4,3,2,1]])
s2 = np.array([[1,11,3,7,5,6,7,8,0,10],[10,9,8,7,6,15,4,13,2,1]])

idx = np.arange(len(s1))

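def step(i):
  # Unpack the result into a flat [statistic, pvalue] list for row i.
  return [*mannwhitneyu(s1[i], s2[i])]

# Run the test once per row and stack the results into an (n_rows, 2) array.
np.array(list(map(step, idx)))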
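
If your SciPy is recent enough, a loop may not be needed at all. The following is a minimal sketch that assumes SciPy 1.7 or newer, where mannwhitneyu accepts an axis argument; the names res and out are only illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu

s1 = np.array([[1,2,3,4,5,6,7,8,0,10],[10,9,8,7,6,5,4,3,2,1]])
s2 = np.array([[1,11,3,7,5,6,7,8,0,10],[10,9,8,7,6,15,4,13,2,1]])

# Assumption: SciPy >= 1.7, where mannwhitneyu takes an `axis` argument.
# axis=1 runs one test per row pair (s1[i], s2[i]) in a single call.
res = mannwhitneyu(s1, s2, axis=1)

# Column 0 holds the U statistics, column 1 the p-values.
out = np.column_stack([res.statistic, res.pvalue])
print(out)
```

The numbers may differ from those in the question, which appear to come from an older SciPy release with different defaults for the alternative hypothesis and the reported statistic. Pass alternative=... explicitly if you need a specific one.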
4.Pi.n
  • What does the asterisk do in the return command? I tried this with a custom function I made but had to remove the asterisk to get it to work. – Dance Party Feb 14 '21 at 23:03
  • The asterisk unpacks the iterable object. If you remove it, you can check how the shape of the output array changes with `np.array(list(map(step, idx))).shape` – 4.Pi.n Feb 15 '21 at 13:01
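
To make the effect of the asterisk concrete, here is a small sketch. Result is a hypothetical stand-in for the tuple-like object that mannwhitneyu returns, and the values are rounded from the question's output.

```python
from collections import namedtuple
import numpy as np

# Hypothetical stand-in for the tuple-like result object.
Result = namedtuple("Result", ["statistic", "pvalue"])
res = Result(39.5, 0.2239)

unpacked = [*res]  # elements are spread out: [39.5, 0.2239]
wrapped = [res]    # the result is nested one level deeper: [Result(...)]

# Two rows built each way, mimicking two slices.
shape_unpacked = np.array([unpacked, unpacked]).shape  # (2, 2)
shape_wrapped = np.array([wrapped, wrapped]).shape     # (2, 1, 2)
print(shape_unpacked, shape_wrapped)
```

If a custom function already returns a plain list or tuple of the right length, the asterisk and surrounding brackets are not needed, which may be why removing it worked in that case.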