I am trying to tune a neural network using Ray Tune, following the standard flow to get it running on MNIST data. Data loading:

import torch
import torchvision
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

trainset = torchvision.datasets.MNIST(
    root='../data', train=True, download=True, transform=transform)

testset = torchvision.datasets.MNIST(
    root='../data', train=False, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(
    trainset,
    batch_size=config_set["batch_size"], shuffle=True)

test_loader = torch.utils.data.DataLoader(
    testset,
    batch_size=1000, shuffle=True)

When I run Tune with the configurable hyperparameters, it throws an error:

config_set = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128])
}

result = tune.run(
    train_model, fail_fast="raise", config=config_set)

*** ValueError: batch_size should be a positive integer value, but got batch_size=<ray.tune.search.sample.Categorical object at ***

– S.Dasgupta

1 Answer

For custom training code, Tune lets you wrap it in a function trainable, which gets passed to Tune and is called with a resolved config dict, where each search-space entry has been replaced by a concrete sampled value. Right now, your data-loading code reads the search space directly, so `batch_size` is still the unresolved `Categorical` object produced by `tune.choice` — which is exactly what the error message shows.
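You can see the unresolved object for yourself with a quick check (a minimal sketch; the printed address will differ on your machine):

from ray import tune

space = tune.choice([16, 32, 64, 128])
print(space)
# <ray.tune.search.sample.Categorical object at 0x...>
# This sampler object, not an int, is what DataLoader received.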

from ray import air, tune
from ray.air import session  # lets the trainable report metrics back to Tune

# Wrap your training code in a function trainable; Tune calls it once per
# trial with a resolved config dict (concrete values, not sampler objects).
def trainable(config: dict):
    # Your training code...
    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,))
    ])
    trainset = torchvision.datasets.MNIST(
        root='../data', train=True, download=True, transform=transform)
    testset = torchvision.datasets.MNIST(
        root='../data', train=False, download=True, transform=transform)

    train_loader = torch.utils.data.DataLoader(
        trainset,
        batch_size=config["batch_size"], shuffle=True)  # resolved int, e.g. 64

    train_model(...)

config_set = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "batch_size": tune.choice([16, 32, 64, 128])
}

tuner = tune.Tuner(
    trainable,
    param_space=config_set,
    run_config=air.RunConfig(
        failure_config=air.FailureConfig(fail_fast="raise")
    ),
)
results = tuner.fit()
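Side note: `session` is imported above so the trainable can report metrics back to Tune, which is what populates the results. A minimal sketch of what that could look like inside `train_model` (the signature, loop body, and metric values here are placeholders, not the original code):

def train_model(config, train_loader, test_loader):
    for epoch in range(5):
        train_loss = 1.0 / (epoch + 1)  # placeholder; use your real loss
        # Each call logs one result row for the current trial.
        session.report({"epoch": epoch, "loss": train_loss})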
– Justin Yu
  • Thanks! It ran perfectly. Just one query on the results part: results.get_dataframe() returns one row with the details of the last trial. How do I get the same details for all the trials Ray performs? – S.Dasgupta Feb 09 '23 at 11:27
  • You need to specify more than 1 sample for Tune to run multiple trials. To do this, specify it through the `TuneConfig`: `tuner = tune.Tuner(..., tune_config=tune.TuneConfig(num_samples=3))`. After setting this, you will see more than 1 result in `results.get_dataframe()`. – Justin Yu Feb 12 '23 at 22:57
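Putting that comment into code, a minimal sketch assuming the `trainable` and `config_set` defined above (num_samples=3 is an arbitrary example value):

tuner = tune.Tuner(
    trainable,
    param_space=config_set,
    tune_config=tune.TuneConfig(num_samples=3),  # run 3 trials sampled from the space
    run_config=air.RunConfig(
        failure_config=air.FailureConfig(fail_fast="raise")
    ),
)
results = tuner.fit()

df = results.get_dataframe()  # now one row per trial
print(len(df))  # 3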