I am porting code to train a neural network. I wrote the code as part of a Udacity project, and it worked fine in the Udacity environment.

Now I am porting the code to an Nvidia Jetson Nano running Ubuntu 18.04 and Python 3.6.8.

When iterating through the training data, a "._" somehow sneaks into the file path in front of the file name, and the image fails to load.

When I run the file, I get the following error message:

Traceback (most recent call last):
  File "train_rev6.py", line 427, in <module>
    main()
  File "train_rev6.py", line 419, in main
    train_model(in_args)
  File "train_rev6.py", line 221, in train_model
    for inputs, labels in trainloader:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 560, in __next__
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 560, in <listcomp>
    batch = self.collate_fn([self.dataset[i] for i in indices])
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 132, in __getitem__
    sample = self.loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 178, in default_loader
    return pil_loader(path)
  File "/usr/local/lib/python3.6/dist-packages/torchvision/datasets/folder.py", line 160, in pil_loader
    img = Image.open(f)
  File "/usr/local/lib/python3.6/dist-packages/PIL/Image.py", line 2705, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BufferedReader name='/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers/train/40/._image_04589.jpg'>

I suspect the error is due to the "._" in front of the file name "image...", as this is not part of the file name. When I run

sudo find / -name image_00824.jpg

I get the correct path:

/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers/train/81/image_00824.jpg

without "._" in front of the file name.

My issue seems to be the same as in

OSError: cannot identify image file

(Adjusting and running from PIL import Image;Image.open(open("path/to/file", 'rb')) as suggested in the answer does not issue an error message.)
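One way to check whether files with a "._" prefix really exist on disk, rather than the prefix being added by the code, is to walk the dataset folder and list them. Searching for a regular file name with find will not reveal them, because they are separate files with their own names. The following is only a minimal sketch using the standard library; the function name find_dot_underscore_files is hypothetical, and the path is the one from the command line in this question.

```python
import os


def find_dot_underscore_files(root):
    """Return sorted paths of all files under root whose name starts with '._'."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.walk also reports hidden (dot-prefixed) files.
            if name.startswith("._"):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


if __name__ == "__main__":
    # Dataset root as passed via --file_path in the question; adjust as needed.
    data_dir = "/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers"
    for path in find_dot_underscore_files(data_dir):
        print(path)
```

If this prints any paths, the "._" entries are real files inside the class folders that the data loader picks up along with the images.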

The file path is given on the command line:

python3 train_rev6.py --file_path "/home/mme/Documents/001_UdacityFinalProjectFlowersRev2/flowers" --arch "vgg16" --epochs 5 --gpu "gpu" --running_loss True --valid_loss True --valid_accuracy True --test True

The code below shows the two relevant functions.

Any idea how I can get rid of this "._"?

def load_data(in_args):
    """
    Function to:
        - Specify directories for training, validation and test set.
        - Define your transforms for the training, validation and testing sets.
        - Load the datasets with ImageFolder.
        - Using the image datasets and the transforms, define the dataloaders.
        - Label mapping.
    """
    # Specify directories for training, validation and test set.
    data_dir = in_args.file_path
    train_dir = data_dir + "/train"
    valid_dir = data_dir + "/valid"
    test_dir = data_dir + "/test"

    # Define your transforms for the training, validation, and testing sets
    # Means: [0.485, 0.456, 0.406]. Standard deviations: [0.229, 0.224, 0.225]. Calculated from ImageNet images.
    # Transformation on training set: random rotation, random resized crop to 224 x 224 pixels, random horizontal and vertical flip, transform to a tensor and normalize data.
    train_transforms = transforms.Compose([transforms.RandomRotation(23),
                                           transforms.RandomResizedCrop(224),
                                           transforms.RandomHorizontalFlip(),
                                           transforms.RandomVerticalFlip(),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.485, 0.456, 0.406],
                                                                [0.229, 0.224, 0.225])])

    # Transformation on validation set: resize and center crop to 224 x 224 pixels, transform to a tensor and normalize data.
    valid_transforms = transforms.Compose([transforms.Resize(255),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize([0.485, 0.456, 0.406],
                                                                [0.229, 0.224, 0.225])])

    # Transformation on test set: resize and center crop to 224 x 224 pixels, transform to a tensor and normalize data.
    test_transforms = transforms.Compose([transforms.Resize(255),
                                          transforms.CenterCrop(224),
                                          transforms.ToTensor(),
                                          transforms.Normalize([0.485, 0.456, 0.406],
                                                               [0.229, 0.224, 0.225])])

    # Load the datasets with ImageFolder
    global train_dataset
    global valid_dataset
    global test_dataset
    train_dataset = datasets.ImageFolder(train_dir, transform=train_transforms)
    valid_dataset = datasets.ImageFolder(valid_dir, transform=valid_transforms)
    test_dataset = datasets.ImageFolder(test_dir, transform=test_transforms)

    # Using the image datasets and the transforms, define the dataloaders, as global variables.
    global trainloader
    global validloader
    global testloader
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
    validloader = torch.utils.data.DataLoader(valid_dataset, batch_size=64)
    testloader = torch.utils.data.DataLoader(test_dataset, batch_size=64)

    # Label mapping.
    global cat_to_name
    with open("cat_to_name.json", "r") as f:
        cat_to_name = json.load(f)

    print("Done loading data...")

    return

def train_model(in_args):
    """
    Function to build and train model.
    """
    # Number of epochs.
    global epochs
    epochs = in_args.epochs
    # Set running_loss to 0
    running_loss = 0

    # Prepare lists to print losses and accuracies.
    global list_running_loss
    global list_valid_loss
    global list_valid_accuracy
    list_running_loss, list_valid_loss, list_valid_accuracy = [], [], []

    # If in testing mode, set loop counter to prematurely return to main().
    if in_args.test == True:
        loop_counter = 0

    # for loop to train model.
    for epoch in range(epochs):
        # for loop to iterate through training dataloader.
        for inputs, labels in trainloader:
            # If in testing mode, increase loop counter to prematurely return to main() after 5 loops.
            if in_args.test == True:
                loop_counter +=1
                if loop_counter == 5:
                    return

            # Move input and label tensors to the default device.
            inputs, labels = inputs.to(device), labels.to(device)

            # Set gradients to 0 to avoid accumulation
            optimizer.zero_grad()

            # Forward pass, back propagation, gradient descent and updating weights and bias.
            # Forward pass through model to get log of probabilities.
            log_ps = model.forward(inputs)
            # Calculate loss of model output based on model prediction and labels.
            loss = criterion(log_ps, labels)
            # Back propagation of loss through model / gradient descent.
            loss.backward()
            # Update weights / gradient descent.
            optimizer.step()

            # Accumulate loss for training image set for print out in terminal
            running_loss += loss.item()

            # Calculate loss for verification image set and accuracy for print out in terminal.
            # Validation pass and print out the validation accuracy.
            # Set loss of validation set and accuracy to 0.
            valid_loss = 0
            # test_loss = 0
            valid_accuracy = 0
            # test_accuracy = 0

            # Set model to evaluation mode to turn off dropout so all images in the validation & test set are passed through the model.
            model.eval()

            # Turn off gradients for validation, saves memory and computations.
            with torch.no_grad():
                # for loop to evaluate loss of validation image set and its accuracy.
                for valid_inputs, valid_labels in validloader:
                    # Move input and label tensors to the default device.
                    valid_inputs, valid_labels = valid_inputs.to(device), valid_labels.to(device)

                    # Run validation image set through model.
                    valid_log_ps = model.forward(valid_inputs)

                    # Calculate loss for validation image set.
                    valid_batch_loss = criterion(valid_log_ps, valid_labels)

                    # Accumulate loss for validation image set.
                    valid_loss += valid_batch_loss.item()

                    # Calculate probabilities
                    valid_ps = torch.exp(valid_log_ps)

                    # Get the most likely class using the ps.topk method.
                    valid_top_k, valid_top_class = valid_ps.topk(1, dim=1)

                    # Check if the predicted classes match the labels.
                    valid_equals = valid_top_class == valid_labels.view(*valid_top_class.shape)

                    # Calculate the percentage of correct predictions.
                    valid_accuracy += torch.mean(valid_equals.type(torch.FloatTensor)).item()

            # Print out losses and accuracies
            # Create string for running_loss.
            str1 = ["Train loss: {:.3f} ".format(running_loss) if in_args.running_loss == True else ""]
            str1 = "".join(str1)
            # Create string for valid_loss.
            str2 = ["Valid loss: {:.3f} ".format(valid_loss/len(validloader)) if in_args.valid_loss == True else ""]
            str2 = "".join(str2)
            # Create string for valid_accuracy.
            str3 = ["Valid accuracy: {:.3f} ".format(valid_accuracy/len(validloader)) if in_args.valid_accuracy == True else ""]
            str3 = "".join(str3)
            # Print strings
            print(f"{epoch+1}/{epochs} " + str1 + str2 + str3)

            # Append current losses and accuracy to lists to print losses and accuracies.
            list_running_loss.append(running_loss)
            list_valid_loss.append(valid_loss/len(validloader))
            list_valid_accuracy.append(valid_accuracy/len(validloader))

            # Set running_loss to 0.
            running_loss = 0

            # Set model back to train mode.
            model.train()

    print("Done training model...")

    return
mme
  • And what are the contents of the file in question (._image_04589.jpg)? I'm not familiar with the library, but it looks like it's just traversing all the files under the train directory and trying to load them as images, which could well go poorly if there are non-image files there. – manveti Jun 14 '19 at 22:10
  • The model predicts flower types. The model is trained with 102 classes, i.e., trained with flower images of 102 different flower types (in the above example class 40, that's why there is a 40 in the file path in the error message). Each of the 102 folders contains different images of that particular flower type for the model to be trained on. The images are actually there and I can open them. So I don't think the issue is related to the file content. – mme Jun 15 '19 at 05:12
  • FYI I also copied the files higher in the folder tree so the file path gets shorter. But that did not resolve the issue. – mme Jun 15 '19 at 05:14
  • FYI I tried the same code on OS X where it runs without any issue. – mme Jun 15 '19 at 17:58

1 Answer

A colleague at work pointed out that on Linux, files whose names begin with a period are hidden. So I selected "show hidden files" in the file explorer, and there they were: a hidden "._" file next to each image. I deleted them, which resolved the issue (see commands below).

Find and display all files beginning with "._" in all subfolders (display the matching files first to make sure these are the files you want to delete). Here "test" is the folder to search; run the same commands for the "train" and "valid" folders:

find test -name '._*' -print

Find and delete all files beginning with "._" in all subfolders:

find test -name '._*' -delete
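As a non-destructive alternative to deleting the files, the loader can be told to skip them. Files named "._*" are typically AppleDouble metadata files that macOS writes next to the real files when copying to a non-Mac file system, which would fit the observation that the code runs without issue on OS X. The sketch below is an assumption-laden illustration, not part of the original solution: the names IMAGE_EXTENSIONS and is_real_image are hypothetical, and the extension list is only an example.

```python
import os

# Example set of extensions to accept; extend as needed (assumption).
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def is_real_image(path):
    """Return True for image files, False for '._*' companion files."""
    name = os.path.basename(path)
    # Skip the hidden '._' files that sit next to the real images.
    if name.startswith("._"):
        return False
    # Only accept files with a known image extension.
    return name.lower().endswith(IMAGE_EXTENSIONS)
```

In torchvision versions whose ImageFolder accepts an is_valid_file argument, this predicate could be passed as datasets.ImageFolder(train_dir, transform=train_transforms, is_valid_file=is_real_image). Whether the torchvision build installed on the Jetson Nano supports that argument is an assumption that needs to be checked; if it does not, deleting the files as shown above remains the simpler fix.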
Gilfoyle
mme