I have multiple arrays of the following kind:
import numpy as np
orig_arr = np.full(shape=(5,10), fill_value=1) #only an example, actual entries different
Every entry in the array above is a key into a dictionary containing further information, stored as an array:
toy_dict = {0:np.arange(13, 23, dtype=float), 1:np.arange(23, 33, dtype=float)}
My task is to replace each entry in orig_arr with the corresponding array stored in the dict (here toy_dict).
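In other words, for every index pair (i, j) the result should satisfy goal[i, j] == toy_dict[orig_arr[i, j]], so the output gains a third dimension of length 10. A minimal check of that mapping with the toy data above:

```python
import numpy as np

orig_arr = np.full(shape=(5, 10), fill_value=1)
toy_dict = {0: np.arange(13, 23, dtype=float), 1: np.arange(23, 33, dtype=float)}

# Each scalar entry is replaced by a length-10 vector, so (5, 10) -> (5, 10, 10).
# The vector that should end up at position (0, 0):
print(toy_dict[orig_arr[0, 0]])
```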
My current approach is naive; I am looking for something faster:
goal_arr = np.full(shape=(orig_arr.shape[0], orig_arr.shape[1], 10), fill_value=2, dtype=float)
for row in range(orig_arr.shape[0]):
    for col in range(orig_arr.shape[1]):
        goal_arr[row, col] = toy_dict[orig_arr[row, col]]  # actual replacement happens here
As you can see, I am using an intermediate step, creating a goal_arr which already has the desired shape.
My question: How can I add the third dimension in a faster way, what parts can I improve? Thanks in advance!
(Further questions I have looked into: "Error: setting an array element with a sequence", Numpy append: Automatically cast an array of the wrong dimension, Append 2D array to 3D array, extending third dimension)
Edit: After mathfux's good answer, I tested his proposed code against my code in a speed comparison for larger arrays (more realistic for my use case):
Imports:
import numpy as np
import time
first_dim = 50
second_dim = 20
depth_dim = 300
upper_count = 5000
toy_dict = {k:np.random.random_sample(size = depth_dim) for k in range(upper_count)}
My original version, after parameterization:
start = time.time()
orig_arr = np.random.randint(0, upper_count, size=(first_dim, second_dim))
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], depth_dim), dtype=float)
for row in range(orig_arr.shape[0]):
    for col in range(orig_arr.shape[1]):
        goal_arr[row, col] = toy_dict[orig_arr[row, col]]
end = time.time()
print(end - start)
Time: 0.008016824722290039
Now mathfux's kindly provided answer:
start = time.time()
orig_arr = np.random.randint(0, upper_count, size=(first_dim,second_dim))
goal_arr = np.empty(shape=(orig_arr.shape[0], orig_arr.shape[1], depth_dim), dtype=float)
a = np.array(list(toy_dict.values())) #do not know if it can be optimized
idx = np.indices(orig_arr.shape)
goal_arr[idx[0], idx[1]] = a[orig_arr[idx[0], idx[1]]]
end = time.time()
print(end-start)
Time: 0.015697956085205078
Interestingly, advanced indexing is slower here. I think this is due to the dict->list->array conversion, which takes time.
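One way to isolate the indexing cost: once the dict values are stacked into the array `a`, the `np.indices` step is not actually needed, because fancy indexing with the whole 2D key array already performs the lookup for every position at once. A minimal sketch, assuming the keys are the contiguous range 0..upper_count-1 so that row k of `a` is toy_dict[k]:

```python
import numpy as np

upper_count, depth_dim = 5000, 300
toy_dict = {k: np.random.random_sample(size=depth_dim) for k in range(upper_count)}
orig_arr = np.random.randint(0, upper_count, size=(50, 20))

# One-off conversion: row k of `a` is toy_dict[k] (relies on dicts preserving
# insertion order and on the keys being exactly 0..upper_count-1)
a = np.array(list(toy_dict.values()))

# Fancy indexing with the full 2D key array replaces the np.indices step
goal_arr = a[orig_arr]  # shape (50, 20, 300)

assert goal_arr.shape == (50, 20, 300)
assert np.array_equal(goal_arr[3, 7], toy_dict[orig_arr[3, 7]])
```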
Nevertheless, thank you for your answers.
Edit 2:
I ran the code again with the list conversion moved out of the timed second code block (done once beforehand):
Time: 0.002306699752807617
Now this supports my hypothesis: since the toy_dict is converted only once, the proposed solution is indeed faster. Thanks.
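For completeness, a sketch of that measurement with the conversion hoisted out of the timed region, using the simplified fancy-indexing lookup; `time.perf_counter` is used here instead of `time.time` because it has higher resolution for short intervals:

```python
import time
import numpy as np

first_dim, second_dim, depth_dim, upper_count = 50, 20, 300, 5000
toy_dict = {k: np.random.random_sample(size=depth_dim) for k in range(upper_count)}

# Hoisted out of the timed region: this conversion happens only once
a = np.array(list(toy_dict.values()))

start = time.perf_counter()
orig_arr = np.random.randint(0, upper_count, size=(first_dim, second_dim))
goal_arr = a[orig_arr]
end = time.perf_counter()
print(end - start)
```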