How to change the labels in a DatasetFolder in PyTorch?

Question

I first load an unlabeled dataset as follows:

unlabeled_set = DatasetFolder("food-11/training/unlabeled", loader=lambda x: Image.open(x), extensions="jpg", transform=train_tfm)

Since I'm trying to do semi-supervised learning, I'm trying to define the following function. The input "dataset" is the unlabeled_set I just loaded.

I want to change the labels of the dataset to the ones I predicted instead of the original labels (all of the original labels are 1). How can I do this?

I have tried changing the labels through dataset.targets, but it doesn't work at all. The following is my function:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from tqdm import tqdm

def get_pseudo_labels(dataset, model, threshold=0.07):
    # This functions generates pseudo-labels of a dataset using given model.
    # It returns an instance of DatasetFolder containing images whose prediction confidences exceed a given threshold.
    # You are NOT allowed to use any models trained on external data for pseudo-labeling.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = []
    y = []
  
    # print(dataset.targets[0])

    # Construct a data loader.
    data_loader = DataLoader(dataset, batch_size=batch_size, shuffle=False)

    # Make sure the model is in eval mode.
    model.eval()
    # Define softmax function.
    softmax = nn.Softmax(dim=-1)
    counter = 0
    # Iterate over the dataset by batches.
    for batch in tqdm(data_loader):
        img, _ = batch

        # Forward the data
        # Using torch.no_grad() accelerates the forward process.
        with torch.no_grad():
            logits = model(img.to(device))

        # Obtain the probability distributions by applying softmax on logits.
        probs = softmax(logits)
        count = 0
        # ---------- TODO ----------
        # Filter the data and construct a new dataset.
        dataset.targets = torch.tensor(dataset.targets)
        for p in probs:
          if torch.max(p) >= threshold:
            if not(counter in x):
              x.append(counter)
            dataset.targets[counter] = torch.argmax(p)
            
          counter += 1

    
    # Turn off the eval mode.
    model.train()
    # dat = DataLoader(ImgDataset(x,y), batch_size=batch_size, shuffle=False)
    print(dataset.targets[10])
    new = torch.utils.data.Subset(dataset, x)
    
    return new

iacob · Answer 1 · 2021-04-07T10:42:05.687

2

PyTorch Datasets can return tuples of values, but they have no inherent "features"/"target" distinction. You can create your modified Dataset like so:

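# `labels` holds one predicted label per sample, in dataset order.
# Each item of `dataset` is an (image, original_label) pair, so drop the original label.
labeled_dataset = [(img, label) for (img, _), label in zip(dataset, labels)]
data_loader = DataLoader(labeled_dataset, batch_size=batch_size, shuffle=False)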

for imgs, labels in data_loader: # per batch
    ...
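
One likely reason that assigning to dataset.targets had no effect: DatasetFolder's __getitem__ reads each (path, target) pair from dataset.samples, not from dataset.targets. As a lazier alternative to building a list, a small wrapper dataset can serve the original images with replacement labels. The sketch below is only an illustration: the class name PseudoLabeledDataset, the toy TensorDataset standing in for the unlabeled DatasetFolder, and the example indices and labels are all assumptions, not part of the original code.

```python
import torch
from torch.utils.data import Dataset, DataLoader, TensorDataset


class PseudoLabeledDataset(Dataset):
    """Hypothetical wrapper: serves images from `base` with replacement labels."""

    def __init__(self, base, indices, labels):
        # `indices` are positions in `base` to keep;
        # `labels[i]` is the new label for `indices[i]`.
        assert len(indices) == len(labels)
        self.base = base
        self.indices = list(indices)
        self.labels = list(labels)

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        img, _ = self.base[self.indices[i]]  # discard the original label
        return img, self.labels[i]


# Toy stand-in for the unlabeled DatasetFolder: 6 "images", all labeled 1.
base = TensorDataset(torch.randn(6, 3, 4, 4), torch.ones(6, dtype=torch.long))
keep = [0, 2, 5]    # assumed: indices whose confidence passed the threshold
pseudo = [3, 7, 0]  # assumed: predicted classes for those indices
wrapped = PseudoLabeledDataset(base, keep, pseudo)

loader = DataLoader(wrapped, batch_size=2, shuffle=False)
batches = [(imgs.shape, labels.tolist()) for imgs, labels in loader]
```

In get_pseudo_labels, this would roughly mean collecting the kept indices and int(torch.argmax(p)) for each confident prediction, then returning PseudoLabeledDataset(dataset, x, y) instead of a Subset. Images are still loaded and transformed on access, so nothing is held in memory up front.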

edited Apr 07 '21 at 10:42

answered Apr 06 '21 at 15:38

iacob


Thank you for the answer! I've fixed the problem with the code from https://discuss.pytorch.org/t/attributeerror-subset-object-has-no-attribute-targets/66564/5. Posting it here for others! – 李彥儒 Apr 06 '21 at 15:39
@李彥儒 if this helped you solve your problem, would you mind accepting it? – iacob May 09 '21 at 20:51
If dataset and labels are either pandas Series or NumPy arrays, does this still work, or do I have to do something like `torch.from_numpy()` on them first? – mLstudent33 May 12 '23 at 11:50
