
My source code uses PyTorch and looks like this:

import numpy as np

def Embed(sequenceSet):
    output = []
    for s in sequenceSet:
        # Combine three per-sequence encodings into one feature vector
        PseDNCSequence = Embedding.PseDNC(str(s))
        ANFSequence = Embedding.ANF(str(s))
        EIIPSequence = Embedding.EIIP(str(s))
        embeddedSequence = PseDNCSequence + ANFSequence + EIIPSequence
        output.append(embeddedSequence)
    return np.array(output)

text = file.read()
lines = text.strip().split('\n')
embeddedS = Embed(lines)
embeddedSequences = torch.tensor(embeddedS)
my_dataset = TensorDataset(embeddedSequences)
loader = data.DataLoader(my_dataset, batch_size=batch_size, shuffle=True)

for epoch in range(num_epochs):
    for training_sample in loader:
        training_sample = training_sample.view(-1, sequenceLength)
        batch_size = training_sample.shape[0]
        ....

It fails with "AttributeError: 'list' object has no attribute 'view'" at this line:

training_sample = training_sample.view(-1, sequenceLength)

I tried to change it into:

training_sample = torch.tensor(training_sample)
training_sample = training_sample.view(-1, sequenceLength)

...but then I received another error: "ValueError: only one element tensors can be converted to Python scalars".

I have already checked the other places and made sure that the input data to the loader is a tensor. Can anybody help me solve this?

Thank you so much!

Peter Phan
  • Why do you use `return np.array(output)` instead of `return torch.tensor(output)`? It may be the source of the issue. – Valentin Goldité Jul 26 '23 at 15:37
  • @ValentinGoldité: I tried both of them, but they both failed with the same error. Actually, I suspect the error is caused by converting a Python list to a tensor; that's why I tried converting to np.array first. But as I said, the error is the same. – Peter Phan Jul 26 '23 at 16:02
  • I found that my dataset is 1D, which does not work with the PyTorch DataLoader; however, I have not found an appropriate solution for that yet. – Peter Phan Jul 27 '23 at 03:27

1 Answer


After a day of searching everywhere on the internet and injecting "print" statements into a lot of places, I found that the root cause is the shape of my dataset. Specifically, the PyTorch DataLoader runs into this error when the dataset passed in is 1D. So I solved it by adding one more dimension with a default value. The source code looks like this:

y = torch.ones(len(embeddedSequences), 1).float()
dataset = TensorDataset(embeddedSequences, y)

Here I add a new column y, filled with the value 1, to the dataset. After that, everything works normally.
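A minimal, self-contained sketch of this fix, with made-up shapes (8 sequences, 16 features each) standing in for the real embedded data. Note that the loader yields an `(x, y)` pair, so the batch is unpacked before calling `.view`:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Toy stand-in for the embedded sequences (the shape is an assumption):
# 8 sequences, each embedded into a 16-dimensional vector.
embeddedSequences = torch.randn(8, 16)

# The fix from the answer: a dummy label column of ones, so the dataset
# yields (x, y) pairs instead of a bare single-tensor list.
y = torch.ones(len(embeddedSequences), 1).float()
dataset = TensorDataset(embeddedSequences, y)
loader = DataLoader(dataset, batch_size=4, shuffle=True)

for x_batch, y_batch in loader:       # unpack the pair, don't call .view on it
    x_batch = x_batch.view(-1, 16)
    print(x_batch.shape)              # torch.Size([4, 16])
```

The unpacking matters: iterating a DataLoader over a TensorDataset always yields a list of tensors, one per tensor in the dataset, which is why calling `.view` directly on the loop variable raises the AttributeError.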

Peter Phan
  • What about just: `out_tsr = torch.tensor(output); out_tsr = torch.unsqueeze(out_tsr, -1)` at the end of your dataset? – Valentin Goldité Jul 27 '23 at 09:23
  • I did not try unsqueeze, but the previous suggestion, `torch.tensor(output)`, did not work. By the way, when I searched for 'Pytorch Dataloader for 1D datasets', the results really told a different story. That's why I tried adding one column y, and surprisingly, it worked well. – Peter Phan Jul 27 '23 at 11:12
  • Unsqueeze is the operation that adds the column. I wrote `torch.tensor(...)` to remind you to call it on a torch tensor. – Valentin Goldité Jul 27 '23 at 14:47
  • @ValentinGoldité, that's excellent, thank you so much – Peter Phan Jul 28 '23 at 08:09
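The `unsqueeze` alternative suggested in the comments can be sketched like this (the scalar values are made up for illustration); it turns a 1-D tensor into a column tensor, which is the same extra dimension the accepted fix adds via the y column:

```python
import torch

# A 1-D output, as the Embed function would produce for scalar embeddings
# (the values here are placeholders, not real embedding output).
output = [0.1, 0.2, 0.3]

out_tsr = torch.tensor(output)           # shape: (3,)
out_tsr = torch.unsqueeze(out_tsr, -1)   # shape: (3, 1) — now 2-D
print(out_tsr.shape)                     # torch.Size([3, 1])
```

With the extra dimension in place, `TensorDataset(out_tsr)` no longer holds bare scalars per sample, so no dummy label column is needed.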