Merging a list of numpy arrays into one array (fast)

Question

What would be the fastest way to merge a list of numpy arrays into one array, if one knows the length of the list and the size of the arrays (which is the same for all of them)?

I tried two approaches:

merged_array = array(list_of_arrays), from "Pythonic way to create a numpy array from a list of numpy arrays", and
vstack

As you can see, vstack is faster, but for some reason the first run takes three times longer than the second. I assume this is caused by (missing) preallocation. So how would I preallocate an array for vstack? Or do you know a faster method?

Thanks!

[UPDATE]

I want (25280, 320), not (80, 320, 320), which means merged_array = array(list_of_arrays) won't work for me. Thanks Joris for pointing that out!!!

Output:

0.547468900681 s merged_array = array(first_list_of_arrays)
0.547191858292 s merged_array = array(second_list_of_arrays)
0.656183958054 s vstack first
0.236850976944 s vstack second

Code:

import numpy
import time
width = 320
height = 320
n_matrices=80

secondmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    secondmatrices.append(numpy.round(temp*9))

firstmatrices = list()
for i in range(n_matrices):
    temp = numpy.random.rand(height, width).astype(numpy.float32)
    firstmatrices.append(numpy.round(temp*9))


# Approach 1: numpy.array on the whole list, result has shape (80, 320, 320)
t1 = time.time()
first1=numpy.array(firstmatrices)
print("%s s merged_array = array(first_list_of_arrays)" % (time.time() - t1))

t1 = time.time()
second1=numpy.array(secondmatrices)
print("%s s merged_array = array(second_list_of_arrays)" % (time.time() - t1))

# Approach 2: repeated vstack; every call allocates a new array and copies all rows so far
t1 = time.time()
first2 = firstmatrices.pop()
for i in range(len(firstmatrices)):
    first2 = numpy.vstack((firstmatrices.pop(),first2))
print("%s s vstack first" % (time.time() - t1))

t1 = time.time()
second2 = secondmatrices.pop()
for i in range(len(secondmatrices)):
    second2 = numpy.vstack((secondmatrices.pop(),second2))
print("%s s vstack second" % (time.time() - t1))

Use [`timeit`](http://docs.python.org/library/timeit.html) to do simple performance testing in Python. It produces more accurate results. – Björn Pollex, May 17 '11 at 12:45
What dimensions do you want the merged array to have? Because ``first1`` is ``(80, 320, 320)`` and ``first2`` is ``(25280, 320)`` – joris, May 17 '11 at 13:02
@joris, thanks for pointing that out. I want the second one, which was my initial approach. I will change it in the question. – Framester, May 17 '11 at 13:06
Then you need ``vstack`` instead of ``dstack`` from eumiro's answer. – joris, May 17 '11 at 13:10

eumiro · Accepted Answer · 2011-05-17T13:09:11.497

You have 80 arrays of 320x320 each? Then you probably want to use dstack:

first3 = numpy.dstack(firstmatrices)

This returns one 320x320x80 array. It holds the same data as numpy.array(firstmatrices), which returns 80x320x320, but is much faster to build:

timeit numpy.dstack(firstmatrices)
10 loops, best of 3: 47.1 ms per loop


timeit numpy.array(firstmatrices)
1 loops, best of 3: 750 ms per loop
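
To check which axis each call adds, here is a minimal sketch with small arrays. The sizes `n`, `h`, `w` and the variable names are illustrative assumptions, not the values from the question. numpy.array puts the list index on the first axis, while numpy.dstack puts it on the last:

```python
import numpy

# Illustrative sizes (assumptions), much smaller than the 80 x 320 x 320 case
n, h, w = 4, 3, 5
matrices = [numpy.full((h, w), i, dtype=numpy.float32) for i in range(n)]

stacked_first = numpy.array(matrices)   # list index becomes axis 0
stacked_depth = numpy.dstack(matrices)  # list index becomes axis 2

print(stacked_first.shape)  # (4, 3, 5)
print(stacked_depth.shape)  # (3, 5, 4)

# Same data, only the axis order differs
same_data = numpy.array_equal(stacked_depth, stacked_first.transpose(1, 2, 0))
print(same_data)  # True
```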

If you want to use vstack, it will return a 25600x320 array:

timeit numpy.vstack(firstmatrices)
100 loops, best of 3: 18.2 ms per loop
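
On the preallocation part of the question: a single vstack call over the whole list allocates the result once, whereas calling vstack inside a loop allocates and copies on every iteration. If you would rather preallocate explicitly, a minimal sketch could look like the following. The helper name `merge_preallocated` is hypothetical, and it assumes all arrays are 2-D with the same shape and dtype. Whether it beats a single vstack call is not shown by the timings above, so measure it with timeit on your own data.

```python
import numpy

def merge_preallocated(arrays):
    """Stack equally shaped 2-D arrays vertically into one preallocated array.

    Hypothetical helper: assumes every array has the same shape and dtype.
    """
    height, width = arrays[0].shape
    out = numpy.empty((len(arrays) * height, width), dtype=arrays[0].dtype)
    for i, a in enumerate(arrays):
        # Copy each array into its own block of rows
        out[i * height:(i + 1) * height] = a
    return out

matrices = [numpy.random.rand(320, 320).astype(numpy.float32) for _ in range(80)]
merged = merge_preallocated(matrices)
print(merged.shape)  # (25600, 320)
```

The result is identical to numpy.vstack(matrices); only the allocation is spelled out by hand.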

edited May 17 '11 at 13:09

answered May 17 '11 at 12:59

Hi eumiro, sorry my question was unclear. I actually need (25280, 320) and not (80, 320, 320). See the update to my question. – Framester May 17 '11 at 13:11
