I'm taking the "Deep NNs with PyTorch" course by IBM, and I encountered lab examples where SGD is used as the optimizer while the batch size in the DataLoader is greater than 1.
If I understand correctly, SGD performs gradient descent with only 1 training example per step, so in this case how does SGD interact with each batch of training examples?
For example, if batch size = 20, would the SGD optimizer perform 20 GD steps per batch? If so, does that mean that no matter what batch size I set in the DataLoader, the SGD optimizer would always perform (# of training examples) GD steps in one epoch?
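To make the question concrete, here is a minimal probe I put together (the tiny random dataset and the step_count counter are my own additions, not from the lab) that just counts how many times the inner training loop runs in one epoch:

import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset: 100 examples, 2 features, 3 classes (mirrors Layers = [2, 50, 3])
data_set = TensorDataset(torch.randn(100, 2), torch.randint(0, 3, (100,)))
train_loader = DataLoader(dataset=data_set, batch_size=20)

step_count = 0
for x, y in train_loader:
    step_count += 1  # one loop iteration per batch
print(step_count)    # prints 5 -- so does SGD take 5 GD steps per epoch here, or 100?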
For reference, here is the lab code (Net, data_set, train, and accuracy are defined elsewhere in the lab):

import torch
from torch import nn
from torch.utils.data import DataLoader

Layers = [2, 50, 3]  # input dim 2, one hidden layer of 50 units, 3 classes
model = Net(Layers)
learning_rate = 0.10
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
train_loader = DataLoader(dataset=data_set, batch_size=20)
criterion = nn.CrossEntropyLoss()
LOSS = train(data_set, model, criterion, train_loader, optimizer, epochs=100)
def train(data_set, model, criterion, train_loader, optimizer, epochs=100):
    LOSS = []
    ACC = []
    for epoch in range(epochs):
        for x, y in train_loader:
            print(x, y)                   # each x/y holds an entire batch
            optimizer.zero_grad()         # clear gradients from the previous batch
            yhat = model(x)               # forward pass on the whole batch at once
            loss = criterion(yhat, y)     # single scalar loss for the batch
            loss.backward()               # one backward pass per loop iteration
            optimizer.step()              # called once per loop iteration
            LOSS.append(loss.item())
            ACC.append(accuracy(model, data_set))
    ...
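For what it's worth, the print(x, y) call shows that each x is an entire batch (shape [20, 2], given Layers = [2, 50, 3]), and my understanding (an assumption on my part) is that nn.CrossEntropyLoss collapses the batch into a single scalar by default, so loss.backward() sees one loss value for all 20 examples:

import torch
from torch import nn

criterion = nn.CrossEntropyLoss()  # default reduction="mean", if I read the docs right
yhat = torch.randn(20, 3)          # one batch: 20 predictions over 3 classes
y = torch.randint(0, 3, (20,))     # one batch: 20 labels
loss = criterion(yhat, y)
print(loss.shape)                  # torch.Size([]) -- a single scalar for the whole batch

If that's right, then each batch produces one gradient and one optimizer.step(), which is exactly what I'd like to confirm.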