
I am new to PyTorch. I have a big dataset consisting of two .txt files: one for the input data and the other for the target data. In the training file each line is a list of length 340; in the target file each line is a list of length 136.

How can I define my dataset so that I can use DataLoader to load the data and train a PyTorch model?

I appreciate your answers.

No Na

1 Answer


Dataset from torch.utils.data is an abstract class representing a dataset. Your custom dataset should inherit from Dataset and override the following methods:

__len__() so that len(dataset) returns the size of the dataset.
__getitem__() to support indexing, so that dataset[i] returns the i-th sample.
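As a minimal illustration of these two methods, here is a sketch that keeps everything in memory as tensors. The class name TensorPairDataset and the random placeholder data are assumptions for illustration only; the shapes (340 inputs, 136 targets per sample) follow the question.

```python
import torch
from torch.utils.data import Dataset

class TensorPairDataset(Dataset):
    """Hypothetical in-memory dataset of (input, target) pairs."""

    def __init__(self, inputs, targets):
        # inputs: (N, 340) tensor, targets: (N, 136) tensor; N must match
        assert inputs.shape[0] == targets.shape[0]
        self.inputs = inputs
        self.targets = targets

    def __len__(self):
        # len(dataset) -> number of samples
        return self.inputs.shape[0]

    def __getitem__(self, i):
        # dataset[i] -> the i-th (input, target) pair
        return self.inputs[i], self.targets[i]

# Random placeholder tensors stand in for the real data
ds = TensorPairDataset(torch.randn(10, 340), torch.randn(10, 136))
x, y = ds[3]
print(len(ds), x.shape, y.shape)
```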

Example of writing a custom Dataset:
I have written a general custom Dataset for your problem, which you can then pass to a DataLoader.
Here data.txt holds the data and label.txt holds the labels.

import torch
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self):
        # Read every line of both files into memory
        with open('data.txt', 'r') as f:
            self.data_info = f.readlines()

        with open('label.txt', 'r') as f:
            self.label_info = f.readlines()

        # Each data line needs a matching label line
        assert len(self.data_info) == len(self.label_info)

    def __getitem__(self, index):
        # Return the raw text of the index-th line, without the trailing newline
        single_data = self.data_info[index].rstrip('\n')
        single_label = self.label_info[index].rstrip('\n')

        return single_data, single_label

    def __len__(self):
        return len(self.data_info)

# Testing
d = CustomDataset()
print(d[1])  # prints the second line of data.txt and of label.txt (as strings)

This is a starting point for your case, but you will need to adapt it to your dataset; for example, __getitem__ currently returns the raw line as a string, whereas a model needs each line converted into a tensor of numbers.
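One possible way to adapt it, sketched under assumptions: the exact line format is not given in the question, so the hypothetical parse_line helper below assumes numbers separated by commas and/or whitespace, optionally wrapped in [ ]. The class name TextPairDataset and the tiny demo files written to a temporary directory are also illustrative only. Each sample becomes a float tensor, so DataLoader's default collation can stack samples into batches.

```python
import os
import re
import tempfile

import torch
from torch.utils.data import Dataset, DataLoader

def parse_line(line):
    # Assumed format: "[0.1, 0.2, ...]" or "0.1 0.2 ..." (brackets optional)
    tokens = re.split(r'[,\s]+', line.strip().strip('[]').strip())
    return torch.tensor([float(t) for t in tokens if t], dtype=torch.float32)

class TextPairDataset(Dataset):
    """Hypothetical dataset that parses both files into float tensors up front."""

    def __init__(self, data_path, label_path):
        with open(data_path) as f:
            self.data = [parse_line(l) for l in f if l.strip()]
        with open(label_path) as f:
            self.labels = [parse_line(l) for l in f if l.strip()]
        assert len(self.data) == len(self.labels), "files must have the same number of lines"

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.data[index], self.labels[index]

# Demo only: write two tiny placeholder files (4 samples) to a temp directory
tmp = tempfile.mkdtemp()
data_path = os.path.join(tmp, 'data.txt')
label_path = os.path.join(tmp, 'label.txt')
with open(data_path, 'w') as f:
    for i in range(4):
        f.write('[' + ', '.join([str(float(i))] * 340) + ']\n')
with open(label_path, 'w') as f:
    for i in range(4):
        f.write(' '.join([str(float(i))] * 136) + '\n')

ds = TextPairDataset(data_path, label_path)
loader = DataLoader(ds, batch_size=2, shuffle=True)
for x, y in loader:
    print(x.shape, y.shape)  # torch.Size([2, 340]) torch.Size([2, 136])
```

Parsing in __init__ means the text is converted only once; if the files are too large to hold as tensors in memory, the parse_line call could instead be moved into __getitem__ so each line is converted on demand.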

Prajot Kuvalekar