I am running code on a new remote server that used to work on another remote server. I think I set things up the same way, but when I run my training script I get this error:

Traceback (most recent call last):
  File "/home/andrea/code/vertikal-machine-learning/source/model/hss_bearing_mk2/hss_bearing_mk2/models/train_model.py", line 144, in <module>
    seq_len=seq_len, mname=mname)
  File "/home/andrea/code/vertikal-machine-learning/source/model/hss_bearing_mk2/hss_bearing_mk2/models/pytorch_models.py", line 321, in train_test
    trainer.fit(model, datamodule=dm)
  File "/home/andrea/anaconda3/envs/hss_bearing_mk2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 552, in fit
    self._run(model)
  File "/home/andrea/anaconda3/envs/hss_bearing_mk2/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 849, in _run
    self.config_validator.verify_loop_configurations(model)
  File "/home/andrea/anaconda3/envs/hss_bearing_mk2/lib/python3.7/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 34, in verify_loop_configurations
    self.__verify_train_loop_configuration(model)
  File "/home/andrea/anaconda3/envs/hss_bearing_mk2/lib/python3.7/site-packages/pytorch_lightning/trainer/configuration_validator.py", line 49, in __verify_train_loop_configuration
    has_training_step = is_overridden("training_step", model)
  File "/home/andrea/anaconda3/envs/hss_bearing_mk2/lib/python3.7/site-packages/pytorch_lightning/utilities/model_helpers.py", line 45, in is_overridden
    raise ValueError("Expected a parent")
ValueError: Expected a parent

Here is the part of the code that seems to trigger the error:

    model = get_model(mname=mname)

    dm = DataModule(
        X_train=X_train,
        y_train=y_train,
        X_val=X_val,
        y_val=y_val,
        X_test=X_test,
        y_test=y_test,
        keys_train=keys_train,
        keys_val=keys_val,
        keys_test=keys_test,
        seq_len=seq_len,
        batch_size=batch_size,
        num_workers=4
    )
    # trainer.logger_connector.callback_metrics
    trainer.fit(model, datamodule=dm)

Is it something related to the environment setup? Is something being overridden unexpectedly?

Can someone point me in the right direction?

EDIT: I tried to run my project locally in a newly created environment and I have the same error.

EDIT 2: My DataModule inherits from LightningDataModule:

    class DataModule(pl.LightningDataModule):

shamalaia

6 Answers

4

With Lightning version 2.0.0 and above, use import lightning.pytorch as pl instead of import pytorch_lightning as pl.
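One way to keep a project on a single namespace is to resolve the import in one place and reuse it everywhere. The following is only a sketch: the helper name import_first_available is hypothetical, and the real call is left commented out so that the snippet runs without Lightning installed.

```python
import importlib
import importlib.util


def import_first_available(candidates):
    """Import and return the first module whose top-level package is installed."""
    for name in candidates:
        top_level = name.split(".")[0]
        # find_spec on a top-level name returns None when the package is missing.
        if importlib.util.find_spec(top_level) is not None:
            return importlib.import_module(name)
    raise ImportError(f"none of {candidates} is installed")


# Hypothetical usage in a shared module of the project:
# pl = import_first_available(["lightning.pytorch", "pytorch_lightning"])
```

Every other file would then import pl from that shared module rather than importing either package directly. Third-party callbacks may still come from the other namespace, so this does not rule out mixing on its own.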

arthur.sw
3

In my case the problem was that the callbacks list passed to Trainer contained one element which wasn't a Callback. Once I removed it, everything worked.
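A quick way to spot such an element is to check every entry against the callback base class before building the Trainer. This is a minimal sketch: find_non_callbacks is a hypothetical helper, and the Fake* classes are stand-ins so that it runs without Lightning installed.

```python
def find_non_callbacks(callbacks, callback_base):
    """Return (index, type name) for each entry that is not a callback_base instance."""
    return [
        (i, type(cb).__name__)
        for i, cb in enumerate(callbacks)
        if not isinstance(cb, callback_base)
    ]


# Stand-in classes (assumptions for illustration, not Lightning classes).
class FakeCallback:
    pass


class FakeEarlyStopping(FakeCallback):
    pass


class FakeLogger:
    pass


suspects = find_non_callbacks([FakeEarlyStopping(), FakeLogger()], FakeCallback)
print(suspects)  # [(1, 'FakeLogger')]
```

In a real project, callback_base would be the Callback class from the same namespace as the Trainer being used.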

thawro
1

The problem was that the model was inheriting from nn.Module instead of pl.LightningModule.
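To illustrate why the wrong base class ends in "Expected a parent", here is a simplified stand-in for the kind of check the traceback points at. It is not the library's code: is_overridden_sketch and the *StandIn classes are hypothetical names, and the logic is an assumption based on the error message and the call is_overridden("training_step", model).

```python
class LightningModuleStandIn:  # plays the role of pl.LightningModule
    def training_step(self, batch, batch_idx):
        raise NotImplementedError


class PlainModuleStandIn:  # plays the role of nn.Module
    pass


def is_overridden_sketch(method_name, instance, known_parents=(LightningModuleStandIn,)):
    """Report whether instance overrides method_name of a recognized parent class."""
    # No recognized parent in the inheritance chain means the check cannot proceed.
    parent = next((p for p in known_parents if isinstance(instance, p)), None)
    if parent is None:
        raise ValueError("Expected a parent")
    instance_attr = getattr(type(instance), method_name, None)
    parent_attr = getattr(parent, method_name, None)
    if instance_attr is None or parent_attr is None:
        return False
    return instance_attr is not parent_attr


class GoodModel(LightningModuleStandIn):
    def training_step(self, batch, batch_idx):
        return 0.0


class BadModel(PlainModuleStandIn):
    def training_step(self, batch, batch_idx):
        return 0.0


print(is_overridden_sketch("training_step", GoodModel()))  # True
# is_overridden_sketch("training_step", BadModel()) raises ValueError: Expected a parent
```

BadModel defines training_step, yet the check still fails, because the object is never recognized as a Lightning module in the first place.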

shamalaia
1

If you encounter the same error with the Ray Tune callback TuneReportCheckpointCallback while using import lightning as L on Lightning version 2.0.0 and above, there is a related GitHub issue that is being addressed, and the maintainers plan to fix it soon.

1

There is a callback issue, as mentioned by others, and the GitHub link mentioned by Ching Chang is a good clue. One can work around it by adding pl.Callback as a parent (here for an EarlyStopping callback) in a new class:

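import pytorch_lightning as pl
from lightning.pytorch.callbacks.early_stopping import EarlyStopping

# Also inherit from pl.Callback so that the object is recognized as a
# pytorch_lightning Callback, although EarlyStopping comes from lightning.pytorch.
class _EarlyStopping(EarlyStopping, pl.Callback):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

# mode="min" because a monitored loss improves when it decreases.
early_stop_callback = _EarlyStopping(monitor="val_loss", min_delta=0.00, patience=10, verbose=False, mode="min")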

Then use the callback:

trainer = pl.Trainer(callbacks=[early_stop_callback], accelerator='gpu', max_epochs=epochs, logger=logger, devices=1)
Mihai.Mehe
1

https://github.com/Lightning-AI/lightning/issues/17485#issuecomment-1524198677

It happens when lightning.pytorch and pytorch_lightning imports are mixed together.
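To check whether the objects handed to the Trainer come from both namespaces, one can look at which top-level package defines each of their base classes. This is a rough sketch: lightning_namespaces is a hypothetical helper, and the two stand-in classes only imitate the module names so that it runs without Lightning installed.

```python
def lightning_namespaces(objects, namespaces=("lightning", "pytorch_lightning")):
    """Return the sorted Lightning namespaces found among the objects' base classes."""
    found = set()
    for obj in objects:
        for cls in type(obj).__mro__:
            top_level = cls.__module__.split(".")[0]
            if top_level in namespaces:
                found.add(top_level)
    return sorted(found)


# Stand-ins whose __module__ imitates the two packages (assumptions for illustration).
OldStyleCallback = type("Callback", (), {"__module__": "pytorch_lightning.callbacks.base"})
NewStyleCallback = type("EarlyStopping", (), {"__module__": "lightning.pytorch.callbacks.early_stopping"})

print(lightning_namespaces([OldStyleCallback(), NewStyleCallback()]))
# ['lightning', 'pytorch_lightning']
```

More than one entry in the result suggests mixed imports. In a real project the list would hold the model, the datamodule and every callback.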

EyesBear