
I have already split the data into training and test sets, stored in separate folders. Now I need to load the patient data. Each patient has 8 images.

def load_dataset(root_dir, split):
    """
    load the data set numpy arrays saved by the preprocessing script
    :param root_dir: path to input data
    :param split: defines whether to load the training or test set
    :return: data: dictionary containing one dictionary ({'data', 'seg', 'pid'}) per patient
    """
    in_dir = os.path.join(root_dir, split)
    data_paths = [os.path.join(in_dir, f) for f in os.listdir(in_dir)]
    data_and_seg_arr = [np.load(ii, mmap_mode='r') for ii in data_paths]
    pids = [ii.split('/')[-1].split('.')[0] for ii in data_paths]
    data = OrderedDict()
    for ix, pid in enumerate(pids):
        data[pid] = {'data': data_and_seg_arr[ix][..., 0], 'seg': data_and_seg_arr[ix][..., 1], 'pid': pid}
    return data

But I get this error:

File "/home/zhe/Research/Seg/heart_seg/data_loader.py", line 61, in load_dataset
data_and_seg_arr = [np.load(ii, mmap_mode='r') for ii in data_paths]
File "/home/zhe/Research/Seg/heart_seg/data_loader.py", line 61, in <listcomp>
data_and_seg_arr = [np.load(ii, mmap_mode='r') for ii in data_paths]
File "/home/zhe/anaconda3/envs/tf_env/lib/python3.6/site-packages/numpy/lib/npyio.py", line 372, in load
fid = open(file, "rb")
IsADirectoryError: [Errno 21] Is a directory: './data/preprocessed_data/train/Patient009969'

It is already a file name, not a directory. Thanks!

Jo_
  • Each entry in data_paths is a directory. What I want is to load the images (data_and_seg_arr) inside each Patient folder (data_paths), but the error says each entry in data_paths should be a file, not a directory. – Jo_ Sep 14 '18 at 20:56

3 Answers


It seems that ./data/preprocessed_data/train/Patient009969 is a directory, not a file.

os.listdir() returns both files and directories.

Maybe try using os.walk() instead. It yields files and directories separately, and it recurses into subdirectories to find the files inside them:

data_paths = [os.path.join(pth, f) 
    for pth, dirs, files in os.walk(in_dir) for f in files]
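
Since the goal mentioned in the comments is to load the images inside each Patient folder, the os.walk() idea could be extended to keep the files grouped per patient. This is only a rough sketch: the function name group_files_by_patient and the assumption that the images are .npy files sitting directly inside each patient folder are mine, not from the question. Each returned path could then be passed to np.load.

```python
import os
from collections import OrderedDict


def group_files_by_patient(in_dir, ext='.npy'):
    """Map each folder name under in_dir to the sorted list of files inside it.

    Hypothetical helper: assumes the images are stored as files ending in
    `ext` directly inside one folder per patient.
    """
    grouped = OrderedDict()
    for pth, dirs, files in os.walk(in_dir):
        dirs.sort()  # sorting in place makes os.walk visit folders in a stable order
        wanted = sorted(f for f in files if f.endswith(ext))
        if not wanted:
            continue  # skip folders that hold no matching files (e.g. in_dir itself)
        pid = os.path.basename(pth)  # the folder name doubles as the patient id
        grouped[pid] = [os.path.join(pth, f) for f in wanted]
    return grouped
```

With this, data_paths becomes a dictionary of per-patient file lists instead of a flat list, so the 'pid' key in load_dataset can come from the folder name rather than from splitting the file path.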
nosklo
  • Thank you so much. I was wrong: each entry in data_paths is a directory. What I want is to load image0~7 (data_and_seg_arr) inside each Patient folder (data_paths), but the error says each entry in data_paths should be a file. – Jo_ Sep 14 '18 at 20:55
  • @Zhuo if you use the code provided in my answer, it should work – nosklo Sep 16 '18 at 23:02

Do you have both files and directories inside your path? os.listdir will list both files and directories, so when you try to open a directory with np.load it will give that error. You can filter only files to avoid the error:

data_paths = [os.path.join(in_dir, f) for f in os.listdir(in_dir)]
data_paths = [i for i in data_paths if os.path.isfile(i)]

Or all together in a single line:

data_paths = [i for i in (os.path.join(in_dir, f) for f in os.listdir(in_dir)) if os.path.isfile(i)]
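
To see what os.listdir is actually returning before filtering, it may help to separate the entries into files and directories. A minimal sketch, assuming Python 3.6+ (for os.scandir as a context manager); split_entries is a made-up name for illustration:

```python
import os


def split_entries(in_dir):
    """Return (files, dirs): full paths of the entries directly inside in_dir."""
    files, dirs = [], []
    with os.scandir(in_dir) as entries:
        # Sort by name so the result does not depend on filesystem order.
        for entry in sorted(entries, key=lambda e: e.name):
            (dirs if entry.is_dir() else files).append(entry.path)
    return files, dirs
```

If dirs comes back full of Patient folders and files is empty, the arrays live one level down, and filtering with os.path.isfile alone would leave data_paths empty.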
solarc
  • Thank you so much. I was wrong: each entry in data_paths is a directory. What I want is to load the images (data_and_seg_arr) inside each Patient folder (data_paths), but the error says each entry in data_paths should be a file. – Jo_ Sep 14 '18 at 21:01

I had the same problem, but I resolved it by changing my path from Data/Train_Data/myDataset/(my images) to Data/Train_Data/(my images), where the Python script is in the same directory as Data. Hope this helps.

ah bon
Sirine Attia