When training a PyTorch Lightning model in a Jupyter Notebook, the console log output is awkward:
Epoch 0: 100%|█████████▉| 2315/2318 [02:05<00:00, 18.41it/s, loss=1.69, v_num=26, acc=0.562]
Validating: 0it [00:00, ?it/s]
Validating: 0%| | 0/1 [00:00<?, ?it/s]
Epoch 0: 100%|██████████| 2318/2318 [02:09<00:00, 17.84it/s, loss=1.72, v_num=26, acc=0.500, val_loss=1.570, val_acc=0.564]
Epoch 1: 100%|█████████▉| 2315/2318 [02:04<00:00, 18.63it/s, loss=1.56, v_num=26, acc=0.594, val_loss=1.570, val_acc=0.564]
Validating: 0it [00:00, ?it/s]
Validating: 0%| | 0/1 [00:00<?, ?it/s]
Epoch 1: 100%|██████████| 2318/2318 [02:08<00:00, 18.07it/s, loss=1.59, v_num=26, acc=0.528, val_loss=1.490, val_acc=0.583]
Epoch 2: 100%|█████████▉| 2315/2318 [02:01<00:00, 19.02it/s, loss=1.53, v_num=26, acc=0.617, val_loss=1.490, val_acc=0.583]
Validating: 0it [00:00, ?it/s]
Validating: 0%| | 0/1 [00:00<?, ?it/s]
Epoch 2: 100%|██████████| 2318/2318 [02:05<00:00, 18.42it/s, loss=1.57, v_num=26, acc=0.500, val_loss=1.460, val_acc=0.589]
I would expect the "correct" output from the same training run to be:
Epoch 0: 100%|██████████| 2318/2318 [02:09<00:00, 17.84it/s, loss=1.72, v_num=26, acc=0.500, val_loss=1.570, val_acc=0.564]
Epoch 1: 100%|██████████| 2318/2318 [02:08<00:00, 18.07it/s, loss=1.59, v_num=26, acc=0.528, val_loss=1.490, val_acc=0.583]
Epoch 2: 100%|██████████| 2318/2318 [02:05<00:00, 18.42it/s, loss=1.57, v_num=26, acc=0.500, val_loss=1.460, val_acc=0.589]
Why are the epoch lines uselessly repeated and split in this manner? I'm also not sure what the Validating
lines are for, since they don't seem to provide any information.
The training and validation steps from the model are as follows:
import torch
import torchmetrics as tm  # assuming tm refers to torchmetrics

def training_step(self, train_batch, batch_idx):
    x, y = train_batch
    y_hat = self.forward(x)
    loss = torch.nn.NLLLoss()(torch.log(y_hat), y.argmax(dim=1))
    acc = tm.functional.accuracy(y_hat.argmax(dim=1), y.argmax(dim=1))
    self.log("acc", acc, prog_bar=True)
    return loss

def validation_step(self, valid_batch, batch_idx):
    x, y = valid_batch
    y_hat = self.forward(x)
    loss = torch.nn.NLLLoss()(torch.log(y_hat), y.argmax(dim=1))
    acc = tm.functional.accuracy(y_hat.argmax(dim=1), y.argmax(dim=1))
    self.log("val_loss", loss, prog_bar=True)
    self.log("val_acc", acc, prog_bar=True)
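
As an aside on the loss computation in these steps: since y_hat looks like softmax probabilities, NLLLoss over torch.log(y_hat) should be equivalent to cross-entropy over the underlying logits. A small sanity check of that equivalence (the logits and targets here are made up for illustration):

```python
import torch

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw scores for a batch of 4
y_hat = torch.softmax(logits, dim=1)  # probabilities, as the model appears to output
target = torch.randint(0, 10, (4,))   # class indices, as produced by y.argmax(dim=1)

# NLLLoss over log-probabilities, as in training_step/validation_step
loss_nll = torch.nn.NLLLoss()(torch.log(y_hat), target)
# Cross-entropy over the raw logits, which fuses log-softmax and NLL
loss_ce = torch.nn.CrossEntropyLoss()(logits, target)

print(torch.allclose(loss_nll, loss_ce, atol=1e-6))  # → True
```

CrossEntropyLoss on the logits is also more numerically stable than taking torch.log of softmax output explicitly, since it uses log-softmax internally.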