
I have a model that should train on 25,000 samples for 50,000 epochs. I want to train on a fraction of the dataset for a fraction of the epochs: for example, the first 10 epochs use only 1,000 random samples, the next 10 epochs use another 1,000 random samples, and so on. The data-loading part of my source code is as follows.

import pytorch_lightning as pl
from torch.utils.data import DataLoader

# collate_fn is defined elsewhere in my code.

class DataModule(pl.LightningDataModule):

  def __init__(self, train_dataset, val_dataset, batch_size=2):
    super().__init__()
    self.train_dataset = train_dataset
    self.val_dataset = val_dataset
    self.batch_size = batch_size

  def train_dataloader(self):
    # Shuffles the full training set on every epoch.
    return DataLoader(self.train_dataset, batch_size=self.batch_size,
                      collate_fn=collate_fn, shuffle=True, num_workers=2, pin_memory=True)

  def val_dataloader(self):
    return DataLoader(self.val_dataset, batch_size=self.batch_size,
                      collate_fn=collate_fn, shuffle=False, num_workers=2, pin_memory=True)

I understand that the code below can select a random fraction of the dataset, but I also want to train on the rest of the data in later epochs.

df_fraction = df_mydataset.sample(frac=0.04)

I also understand that the code below can select a random subset, but I don't know how it works, because I need to change the data every 10 epochs.

train_sampler = SubsetRandomSampler(train_indices)
train_loader = torch.utils.data.DataLoader(dataset, batch_size=2, sampler=train_sampler)

How can I do that with batch_size=2?
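To make the intended behaviour concrete, here is a minimal sketch of the index selection I have in mind, written in plain Python so it runs without PyTorch. The class name `RotatingSubsetSampler` and its parameters are hypothetical, not part of any library. It assumes the training loop iterates the loader exactly once per epoch, so each call to `__iter__` is counted as one epoch.

```python
import random


class RotatingSubsetSampler:
    """Yield a random subset of indices, redrawn every `epochs_per_subset` epochs.

    Hypothetical helper, not a library class. Assumes one __iter__ call per epoch.
    """

    def __init__(self, dataset_size, subset_size, epochs_per_subset, seed=0):
        if subset_size > dataset_size:
            raise ValueError("subset_size must not exceed dataset_size")
        self.dataset_size = dataset_size
        self.subset_size = subset_size
        self.epochs_per_subset = epochs_per_subset
        self.rng = random.Random(seed)
        self.epoch = 0      # number of epochs started so far
        self.pool = []      # shuffled indices not yet handed out in this pass
        self.current = []   # subset used for the current block of epochs

    def _draw_subset(self):
        # Refill the pool when fewer than subset_size indices remain
        # (leftovers are dropped), so subsets within one pass never overlap.
        if len(self.pool) < self.subset_size:
            self.pool = list(range(self.dataset_size))
            self.rng.shuffle(self.pool)
        self.current = self.pool[:self.subset_size]
        self.pool = self.pool[self.subset_size:]

    def __iter__(self):
        # Draw a new subset at the start of every block of epochs.
        if self.epoch % self.epochs_per_subset == 0:
            self._draw_subset()
        self.epoch += 1
        # Reshuffle the order within the current subset on every epoch.
        order = self.current[:]
        self.rng.shuffle(order)
        return iter(order)

    def __len__(self):
        return self.subset_size


# Numbers from the question: 25000 samples, 1000 per block, 10 epochs per block.
sampler = RotatingSubsetSampler(dataset_size=25000, subset_size=1000,
                                epochs_per_subset=10)
first_block = set(sampler)                   # epoch 0
for _ in range(9):                           # epochs 1 to 9 reuse the same subset
    assert set(sampler) == first_block
second_block = set(sampler)                  # epoch 10 draws a new subset
assert first_block.isdisjoint(second_block)
```

As far as I understand, an object like this could be passed as `sampler=` to `DataLoader` together with `batch_size=2`, with `shuffle` left unset, because `shuffle=True` and a custom sampler cannot be combined. I have only reasoned about single-process training, not distributed training. Lightning's `Trainer` also has a `reload_dataloaders_every_n_epochs` argument, which might allow building a fresh `SubsetRandomSampler` inside `train_dataloader` every 10 epochs instead.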

avocadoLambda
diamond
