I am trying to find the column with the maximum column sum in a 2D NumPy array. For example:

Let A = [[1, 2, 3], [0, 1, 4], [0, 0, 1]]

The column sums are [1, 3, 8]. Therefore, the 3rd column has the maximum column sum.
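
As a small sketch of the example above (the names `col_sums` and `best_col` are illustrative, not from the original post), the column sums and the index of the largest one can be computed directly. Note that `np.argmax` returns a 0-based index, so the 3rd column is reported as 2.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [0, 1, 4],
              [0, 0, 1]])

col_sums = A.sum(axis=0)             # sum down each column
best_col = int(np.argmax(col_sums))  # 0-based index of the largest sum

print(col_sums)  # [1 3 8]
print(best_col)  # 2
```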

I tried both numpy.argmax and numpy.sort for this task. I expected argmax to be faster than sort, but both took about the same time.

import time
import numpy as np

a = np.random.rand(7000, 8000)
start_time = time.time()
for i in range(1000):
    np.sort(np.sum(a, axis=0))  # sort the 8000 column sums
print(time.time() - start_time)

The code above runs in 33.29 seconds, while the code below takes about the same time, 34.33 seconds.

a = np.random.rand(7000, 8000)
start_time = time.time()
for i in range(1000):
    np.argmax(np.sum(a, axis=0))  # index of the largest column sum
print(time.time() - start_time)

Could you please let me know the potential reasons behind this? Is it something related to how I am solving the problem?

randomprime
  • Maybe use the [timeit](https://docs.python.org/3/library/timeit.html) module. – wwii Apr 28 '19 at 21:23
  • I get 0.1 seconds for `.argmax` and 3.6 seconds for `.sort` for a (7000,8000) array. **without** the summation. – wwii Apr 28 '19 at 21:30
  • Thank you very much. I think what you pointed out is that np.sum is the bottleneck here but that is essential because I need to compute the column sums. I will try to think of some other way maybe then. Thanks! – randomprime Apr 28 '19 at 21:34
  • `np.sum(a, axis=0)` takes about 0.1 seconds for a (7000,8000) array. – wwii Apr 28 '19 at 21:36

1 Answer

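Timing each step with the timeit module shows that np.sum dominates the running time. Sorting or taking the argmax of the resulting 8000-element vector of column sums adds almost nothing, which is why both versions take about the same time.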

>>> from timeit import Timer
>>> import numpy as np
>>> a = np.random.random((7000,8000))
>>> loops = 3
>>> timer = Timer("np.sum(a, axis=0)", "from __main__ import a, np")
>>> timer.timeit(loops) / loops
0.10155341827648574
>>> timer = Timer("np.argmax(a)", "from __main__ import a, np")
>>> timer.timeit(loops) / loops
0.11956859843814982
>>> timer = Timer("np.sort(a)", "from __main__ import a, np")
>>> timer.timeit(loops) / loops
3.5973468146321466
>>> timer = Timer("np.sort(np.sum(a, axis=0))", "from __main__ import a, np")
>>> timer.timeit(loops) / loops
0.09826639265653132
>>> timer = Timer("np.argmax(np.sum(a, axis=0))", "from __main__ import a, np")
>>> timer.timeit(loops) / loops
0.09442937388683958
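
To separate the cost of the summation from the cost of the reduction that follows it, each step can be timed on its own. The following is a minimal sketch, assuming a smaller array than the one in the question so that it finishes quickly. The names `col_sums`, `t_sum`, `t_argmax` and `t_sort` are illustrative, and absolute timings will vary by machine.

```python
import timeit
import numpy as np

# Assumption: a smaller array than the (7000, 8000) one in the question,
# chosen only so that this sketch runs quickly.
a = np.random.random((700, 800))

# The column sums form a 1-D vector with one entry per column.
col_sums = np.sum(a, axis=0)

loops = 20

# Time the summation over the full 2-D array.
t_sum = timeit.timeit(lambda: np.sum(a, axis=0), number=loops) / loops

# Time the two reductions on the already-computed 1-D vector only.
t_argmax = timeit.timeit(lambda: np.argmax(col_sums), number=loops) / loops
t_sort = timeit.timeit(lambda: np.sort(col_sums), number=loops) / loops

print(f"sum over rows:     {t_sum:.6f} s")
print(f"argmax of sums:    {t_argmax:.6f} s")
print(f"sort of sums:      {t_sort:.6f} s")
```

The summation touches every element of the 2-D array, while `np.argmax` and `np.sort` only see one value per column, so any difference between them is likely to be small next to the cost of `np.sum`.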
wwii