2

I am trying to load a database from my Python code. It contains a list of dictionaries. Each dictionary in the list holds a name and a sub-list of n dictionaries, and each of those has a file name and data: a NumPy array of size 40x40x3 that corresponds to an image. Inside a for loop, I want to store all of those images in a single NumPy array of size Nx40x40x3.

for item in dataset:
    print(item["name"])  # the name of the list
    print(item["data"])  # a list of dictionaries
    for row in item["data"]:
        print(row["sub_name"])  # the name of the image
        print(row["sub_data"])  # contains a numpy array (my image)

How can I construct a NumPy array and add all the images to it?

Jose Ramon

2 Answers

2

NumPy arrays have fixed sizes, so unless you know the size up front you have to use something that can change size, like a Python list.

import numpy as np

images = []

for item in dataset:
    for row in item["data"]:
        images.append(row["sub_data"]) # Add to list

images = np.array(images) # Convert list to np.array()
Aaron N. Brock
  • I think you meant something like "NumPy arrays have fixed sizes". They're not immutable. – user2357112 May 02 '18 at 17:22
  • Alternatively, you could first get the necessary size of the array, then fill it – P. Camilleri May 02 '18 at 17:31
  • Although, now that you've provided that link, the second answer reminds me that under very specific conditions, it is [technically possible](https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.resize.html) to resize some NumPy arrays. It's limited, its efficiency is unpredictable (because it depends on whether realloc has to copy), it's incompatible with PyPy, and it's overall a worse option than presizing the array or using a list, but the option technically exists. – user2357112 May 02 '18 at 17:42
  • In the end the size of the converted list is (N, ) instead of (N, 28x28x3). – Jose Ramon May 02 '18 at 17:54
  • @P.Camilleri you are correct, I added a different answer where I do it that way, good catch though! – Aaron N. Brock May 02 '18 at 17:55
  • @JoseRamon I believe that means your data isn't all the same shape, it worked using my example data. – Aaron N. Brock May 02 '18 at 17:59
  • @JoseRamon [Here's a repl.it of my example](https://repl.it/repls/VagueMajorClient) – Aaron N. Brock May 02 '18 at 18:00
  • Yes you are right I found out that one of the images had different size. – Jose Ramon May 02 '18 at 18:07
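
The `(N,)` result discussed in the comments above is a symptom of images that do not all share one shape. Below is a minimal sketch of one way to catch that early. It assumes a hypothetical toy dataset with the question's structure; `toy_dataset`, `shape_counts` and `collect_images` are made-up names for illustration. It relies on `np.stack`, which raises a `ValueError` when its inputs have different shapes.

```python
import numpy as np
from collections import Counter

# Hypothetical toy data mimicking the question's structure.
# The last image deliberately has a different shape.
toy_dataset = [
    {"name": "a", "data": [
        {"sub_name": "a0", "sub_data": np.zeros((40, 40, 3))},
        {"sub_name": "a1", "sub_data": np.zeros((40, 40, 3))},
    ]},
    {"name": "b", "data": [
        {"sub_name": "b0", "sub_data": np.zeros((28, 28, 3))},
    ]},
]

def shape_counts(dataset):
    """Count how many images have each shape."""
    return Counter(
        row["sub_data"].shape for item in dataset for row in item["data"]
    )

def collect_images(dataset):
    """Stack all images into one array; np.stack raises ValueError if shapes differ."""
    return np.stack(
        [row["sub_data"] for item in dataset for row in item["data"]]
    )

counts = shape_counts(toy_dataset)
print(counts)  # more than one distinct shape means the data is inconsistent

try:
    collect_images(toy_dataset)
    stacked_ok = True
except ValueError:
    stacked_ok = False
print("stacked without error:", stacked_ok)
```

Printing the shape counts first makes it easy to see which shape is the odd one out before deciding whether to resize or drop that image.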
2

To do this, you either need to use a data type whose size can change, as I did in my other answer, or you can figure out how many images you have before defining the array (as suggested by @P.Camilleri).

Here's an example of that:

import numpy as np

# Count the number of images
count = 0
for item in dataset:
    count += len(item["data"])

# Create an uninitialized numpy array that's Nx40x40x3
# (40x40x3 is the image shape from the question; adjust it to your data)
images = np.empty((count, 40, 40, 3))

# Populate the numpy array with images
idx = 0
for item in dataset:
    for row in item["data"]:
        images[idx] = row["sub_data"]
        idx += 1

print(images)

This has the advantage that you only allocate the space once, as opposed to using a Python list, where each image is first added to the list and then copied into a numpy array.

However, this is at the cost of having to iterate over the data twice.
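
As a variation on the preallocation idea, here is a hedged sketch that takes the image shape and dtype from the first image instead of hard-coding them. The function name `stack_preallocated` and the toy data are hypothetical, and it assumes every `sub_data` entry is a NumPy array of the same shape.

```python
import numpy as np

def stack_preallocated(dataset):
    """Preallocate using the first image's shape and dtype, then fill in place."""
    # Collect references to the rows (this does not copy any image data).
    rows = [row for item in dataset for row in item["data"]]
    if not rows:
        raise ValueError("dataset contains no images")

    first = np.asarray(rows[0]["sub_data"])
    images = np.empty((len(rows),) + first.shape, dtype=first.dtype)

    for idx, row in enumerate(rows):
        # A mismatched shape raises ValueError here unless it happens to broadcast.
        images[idx] = row["sub_data"]
    return images

# Hypothetical toy data: three 40x40x3 uint8 images filled with 0, 1 and 2.
toy_dataset = [
    {"name": "a", "data": [
        {"sub_name": "a0", "sub_data": np.full((40, 40, 3), 0, dtype=np.uint8)},
        {"sub_name": "a1", "sub_data": np.full((40, 40, 3), 1, dtype=np.uint8)},
    ]},
    {"name": "b", "data": [
        {"sub_name": "b0", "sub_data": np.full((40, 40, 3), 2, dtype=np.uint8)},
    ]},
]

result = stack_preallocated(toy_dataset)
print(result.shape, result.dtype)
```

Keeping the dtype of the source images avoids silently converting, for example, `uint8` pixel data to the `float64` default of `np.empty`.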

(Note: Two separate answers so they can be rated separately as I'm not sure which solution is better.)

Aaron N. Brock
  • I was surprised, but you're right: https://meta.stackexchange.com/questions/25209/what-is-the-official-etiquette-on-answering-a-question-twice – P. Camilleri May 02 '18 at 18:02
  • Both methods - list append and insertion in a predefined array - are used and recommended. The timing differences tend to be small. – hpaulj May 02 '18 at 22:07
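
To check the two approaches against each other on your own machine, a rough sketch along the following lines may help. The synthetic data and the function names `via_list` and `via_preallocation` are assumptions for illustration, and the timings will vary with data size and hardware, so none are quoted here.

```python
import timeit
import numpy as np

# Hypothetical synthetic data: 20 items with 5 random 40x40x3 images each.
rng = np.random.default_rng(0)
toy_dataset = [
    {
        "name": f"item{i}",
        "data": [
            {"sub_name": f"img{i}_{j}", "sub_data": rng.random((40, 40, 3))}
            for j in range(5)
        ],
    }
    for i in range(20)
]

def via_list(dataset):
    """Append every image to a Python list, then convert once."""
    images = []
    for item in dataset:
        for row in item["data"]:
            images.append(row["sub_data"])
    return np.array(images)

def via_preallocation(dataset):
    """Count the images first, allocate once, then fill in place."""
    count = sum(len(item["data"]) for item in dataset)
    images = np.empty((count, 40, 40, 3))
    idx = 0
    for item in dataset:
        for row in item["data"]:
            images[idx] = row["sub_data"]
            idx += 1
    return images

a = via_list(toy_dataset)
b = via_preallocation(toy_dataset)
print(a.shape, np.array_equal(a, b))  # both approaches should build the same array

print("list:       ", timeit.timeit(lambda: via_list(toy_dataset), number=50))
print("preallocate:", timeit.timeit(lambda: via_preallocation(toy_dataset), number=50))
```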