
I have a matrix of 3D brain images on which I am doing some processing.

The input matrix looks like M[X, Y], where X is the brain id and Y is the flattened image data, which I reshape later in order to apply some enhancements.

The following sequential code works correctly:

import numpy as np

def transform(X):
    # Each row of X is one flattened 176x208x176 brain volume
    data = np.reshape(X, (-1, 176, 208, 176))
    data_cropped = np.empty((data.shape[0], 90, 100, 70))
    for idx in range(0, data.shape[0]):
        # Keep only the central region of each brain
        data_cropped[idx, :, :, :] = data[idx, 40:130, 40:140, 50:120]

    data_cropped = perm(data_cropped)  # my own function; it prints "brain: N" for each brain
    #data_cropped = impute_data(data_cropped)
    # Flatten each cropped brain back into a single row
    data_cropped = np.reshape(data_cropped, (data_cropped.shape[0], -1))
    #data_cropped = data_cropped[:, np.apply_along_axis(np.count_nonzero, 0, data_cropped) != 0]

    return data_cropped


X_train = np.load("./data_original/X_train.npy")
X_crop = transform(X_train)
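The cropping loop in transform() copies the same box out of every brain, so it is equivalent to a single slice along the first axis. A minimal check with small, made-up shapes (2 brains of 8x9x10 instead of 176x208x176):

```python
import numpy as np

# Small random batch standing in for the reshaped data (shapes are illustrative)
data = np.random.random((2, 8, 9, 10))

# Loop version, as written in transform()
looped = np.empty((data.shape[0], 3, 4, 5))
for idx in range(data.shape[0]):
    looped[idx] = data[idx, 2:5, 1:5, 3:8]

# One slice over all brains at once; .copy() makes it an independent array
sliced = data[:, 2:5, 1:5, 3:8].copy()

print(np.array_equal(looped, sliced))  # True
```

This only removes the Python-level loop for the crop; whether it noticeably reduces the 60 minutes depends on where the time is actually spent (perm() is not shown).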

When run sequentially (a normal for loop), this code prints:

brain: 0
brain: 1
brain: 2
brain: 3
...

The problem is that it takes a very long time (around 60 min) to process all the brains.

I tried to make the code run in parallel, but it does not process all the brains: only brain 0 is processed, multiple times.

Here is my attempt to parallelize the code:

import multiprocessing

import numpy as np
from joblib import Parallel, delayed

num_cores = multiprocessing.cpu_count()
X_train = np.load("./data_original/X_train.npy")
X_crop = Parallel(n_jobs=num_cores)(delayed(transform)(i) for i in X_train)

But I got this result:

brain: 0
brain: 0
brain: 0
brain: 0
...

Any idea how to solve this problem? Thanks

Asked by Khaled (edited by ivan_pozdeev)
  • Are you sure the same set of data is processed? How is the number in the line "brain: " generated? – ivan_pozdeev Oct 14 '17 at 09:39
  • Yes, each brain has the same data size: X_train.shape gives (278, 6443008) – Khaled Oct 14 '17 at 09:42
  • I think the problem is in this statement but I don't know how to fix it: Parallel(n_jobs=num_cores)(delayed(transform)(i) for i in X_train) – Khaled Oct 14 '17 at 09:45
  • The "brain: N" line is printed in a for loop inside the perm(data_cropped) function, after the preprocessing of that brain image has been done – Khaled Oct 14 '17 at 09:46

1 Answer


You need to

  • split your data appropriately between the jobs AND
  • provide your worker code the information to correctly produce displayed brain indices.
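A minimal sketch of both points together, assuming the work can be done on contiguous blocks of rows. process_chunk and run_in_chunks are hypothetical names, and the multiplication by 2 is only a placeholder for the real crop/perm work:

```python
import numpy as np
from joblib import Parallel, delayed

def process_chunk(chunk, start):
    # Hypothetical worker: `start` is the absolute row index of chunk[0]
    # in the full array, so start + idx is the global brain index.
    labels = [f"brain: {start + idx}" for idx in range(chunk.shape[0])]
    # Placeholder for the real per-brain processing
    return chunk * 2.0, labels

def run_in_chunks(X, n_jobs):
    chunks = np.array_split(X, n_jobs)  # contiguous blocks of rows
    # Start index of each chunk = number of rows in all earlier chunks
    starts = np.cumsum([0] + [len(c) for c in chunks[:-1]])
    results = Parallel(n_jobs=n_jobs)(
        delayed(process_chunk)(c, int(s)) for c, s in zip(chunks, starts)
    )
    # Parallel returns results in input order, so stacking keeps row order
    out = np.vstack([r[0] for r in results])
    labels = [lab for r in results for lab in r[1]]
    return out, labels

X = np.arange(10 * 6, dtype=float).reshape(10, 6)
out, labels = run_in_chunks(X, n_jobs=3)
print(labels[:5])  # ['brain: 0', 'brain: 1', 'brain: 2', 'brain: 3', 'brain: 4']
```

The labels are returned rather than printed because output from several worker processes would interleave.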

for i in X_train produces rows of X_train (along the first dimension), one at a time, and they have one dimension less than the initial array:

In [7]: a=np.random.random((2,10))

In [10]: a.shape
Out[10]: (2, 10)

In [11]: [i.shape for i in a]
Out[11]: [(10,), (10,)]

Since you didn't give all the sample code to reproduce the issue, I cannot say what shape your processing code expects.
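Assuming the transform() shown in the question, which reshapes with a leading -1, a single row still reshapes fine, but into a batch of exactly one brain. A small illustration with toy shapes (2x3x4 instead of 176x208x176):

```python
import numpy as np

# Toy stand-in for X_train: 3 "brains", each a flattened 2x3x4 volume
X = np.arange(3 * 2 * 3 * 4, dtype=float).reshape(3, 2 * 3 * 4)

whole = np.reshape(X, (-1, 2, 3, 4))                      # what transform(X_train) sees
per_row = [np.reshape(row, (-1, 2, 3, 4)) for row in X]   # what each job sees

print(whole.shape)       # (3, 2, 3, 4)
print(per_row[0].shape)  # (1, 2, 3, 4)
```

If perm() numbers brains with the loop index over its own input, as the comments suggest, every job only ever sees index 0.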


Then, apparently, the number after "brain:" is the index of a row in the input. If you feed each job a part of the input array, naturally they will all produce the same indices. You need to tell each job its starting index so that it can calculate absolute indices correctly.
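Keeping the one-job-per-row structure from the question, the simplest way to pass the absolute index is enumerate(). A minimal sketch; transform_one is a hypothetical per-brain worker, and the shapes and crop are toy values standing in for (176, 208, 176) and [40:130, 40:140, 50:120]:

```python
import numpy as np
from joblib import Parallel, delayed

def transform_one(row, brain_id):
    # Hypothetical worker: `row` is one flattened brain, `brain_id` its
    # absolute index in the input, supplied by the caller
    volume = np.reshape(row, (2, 3, 4))  # toy shape
    cropped = volume[:, 1:3, 1:3]        # toy crop
    return brain_id, cropped.ravel()

X = np.arange(5 * 24, dtype=float).reshape(5, 24)
results = Parallel(n_jobs=2)(
    delayed(transform_one)(row, i) for i, row in enumerate(X)
)
ids = [r[0] for r in results]
X_crop = np.vstack([r[1] for r in results])
print(ids)  # [0, 1, 2, 3, 4]
```

In the real code, brain_id would be forwarded to perm() so that it prints the right number. One task per brain carries more dispatch overhead than splitting the rows into a few chunks per job.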

Answered by ivan_pozdeev