Questions tagged [ray-tune]

72 questions
0 votes · 1 answer

Unable to install ray[tune] tune-sklearn

I'm trying to install ray[tune] tune-sklearn on my machine, but it keeps failing. I'm using a MacBook Pro 2019 with Big Sur 11.6 and Python 3.9.7 (default, Sep 16 2021, 08:50:36) [Clang 10.0.0] :: Anaconda, Inc. on darwin. All other packages…
asked by user1857403 (289 rep)
0 votes · 1 answer

Python Ray Tune: unable to stop trial or experiment

I am trying to make Ray Tune with wandb stop the experiment under certain conditions: stop the whole experiment if any trial raises an Exception (so I can fix the code and resume); stop if my score reaches -999; stop if the variable varcannotbezero gets…
asked by user670186 (2,588 rep)
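For reference, the classic tune.run API supports both per-trial and whole-experiment stopping. A minimal sketch, assuming a function trainable; the metric name score and the training loop are illustrative stand-ins, not the asker's code:

```python
from ray import tune

def train_fn(config):
    score = 0.0
    for _ in range(1000):
        score -= config["lr"]   # stand-in for a real training step
        tune.report(score=score)

def stopper(trial_id, result):
    # Per-trial condition: stop once the reported score reaches -999.
    return result["score"] <= -999

tune.run(
    train_fn,
    config={"lr": tune.uniform(0.5, 2.0)},
    stop=stopper,    # a callable stop condition sees every result dict
    fail_fast=True,  # abort the whole experiment if any trial raises
)
```

fail_fast=True tears down all trials on the first error, which matches the "stop everything so I can fix the code and resume" requirement; the callable stop covers per-trial conditions like the -999 score.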
0 votes · 1 answer

Use GPU OR CPU on Ray tune

I have 1 GPU and 32 CPUs available on my machine. Is it possible in Ray to use them separately? For instance, one task gets allocated 1 CPU and another task 1 GPU? If I use tune.run(trainer_fn, num_samples=32, …
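This kind of separation is expressed per trial via resources_per_trial. A hedged sketch with placeholder trainables (the task bodies are stand-ins):

```python
from ray import tune

def cpu_task(config):
    tune.report(metric=1.0)  # placeholder for CPU-only work

def gpu_task(config):
    tune.report(metric=1.0)  # placeholder for GPU-bound work

# Trials of the first run are each scheduled with 1 CPU and no GPU;
# the second run reserves the single GPU for its trial.
tune.run(cpu_task, num_samples=32, resources_per_trial={"cpu": 1, "gpu": 0})
tune.run(gpu_task, num_samples=1, resources_per_trial={"cpu": 1, "gpu": 1})
```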
0 votes · 1 answer

How to restore a ray-tune checkpoint when it is integrated with Pytorch Lightning?

I have a ray tune analysis object and I am able to get the best checkpoint from it: analysis = tune_robert_asha(num_samples=2) best_ckpt = analysis.best_checkpoint But I am unable to restore my PyTorch Lightning model with it. I…
asked by Luca Guarro (1,085 rep)
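One pattern that works on older Ray versions, assuming best_checkpoint is a directory path and the Lightning checkpoint inside it was written by TuneReportCheckpointCallback under its default filename "checkpoint"; LitModel is a hypothetical stand-in for the asker's module:

```python
import os
import pytorch_lightning as pl

class LitModel(pl.LightningModule):
    pass  # the asker's model definition goes here

# analysis comes from the asker's tune_robert_asha(num_samples=2) call.
best_ckpt = analysis.best_checkpoint          # a directory in older Ray versions
ckpt_file = os.path.join(str(best_ckpt), "checkpoint")
model = LitModel.load_from_checkpoint(ckpt_file)
```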
0 votes · 1 answer

Ray Tune error when using Trainable class with tune.with_parameters

Using a very simple example from the tune documentation itself: from ray import tune import numpy as np class MyTrainable(tune.Trainable): def setup(self, config, dataset=None): print(config, dataset) self.dataset = dataset …
asked by Snufkin (25 rep)
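The documented pattern does pass extra setup arguments through tune.with_parameters for class trainables. A runnable sketch close to the excerpt, with a step() added since class trainables must return a metrics dict:

```python
import numpy as np
from ray import tune

class MyTrainable(tune.Trainable):
    def setup(self, config, dataset=None):
        self.dataset = dataset

    def step(self):
        # step() must return a dict of metrics each iteration
        return {"mean": float(np.mean(self.dataset))}

dataset = np.random.rand(100)
tune.run(
    tune.with_parameters(MyTrainable, dataset=dataset),
    num_samples=2,
    stop={"training_iteration": 1},
)
```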
0 votes · 1 answer

Is there an `initial_workers` (cluster.yaml) replacement mechanism in ray tune?

I will briefly describe my use case: assuming I wanted to spin up a cluster with 10 workers on AWS, in the past I always used the initial_workers: 10, min_workers: 0, max_workers: 10 options (cluster.yaml) to initially spin up the cluster to full capacity…
asked by Denis (13 rep)
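The mechanism Ray points to in place of initial_workers is an explicit autoscaler request from the driver. A sketch assuming 4 CPUs per worker node (adjust to the actual node type in cluster.yaml):

```python
import ray
from ray.autoscaler.sdk import request_resources

ray.init(address="auto")

# Ask the autoscaler to scale up immediately to the equivalent of
# 10 workers (assumed 4 CPUs each here), as initial_workers used to do.
request_resources(num_cpus=40)

# ... run the tune job at full capacity ...

# Drop the request so the cluster can shrink back toward min_workers.
request_resources(num_cpus=0)
```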
0 votes · 1 answer

TuneError: ('Trials did not complete')

I wrote a program using Keras that distinguishes real texts from fake ones (I used 5,000 training samples and 10,000 test samples), using a Transformer with the 'distilbert-base-uncased' model for detection. Now I want to tune hyperparameters using grid search,…
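TuneError: ('Trials did not complete') is raised when some trials error out, so the underlying exceptions are what need fixing. A hedged sketch for inspecting failures instead of crashing, with a placeholder training function standing in for the DistilBERT loop:

```python
from ray import tune

def train_fn(config):
    # placeholder for the asker's DistilBERT fine-tuning loop
    tune.report(accuracy=config["lr"] * 1000)

analysis = tune.run(
    train_fn,
    config={
        "lr": tune.grid_search([1e-5, 3e-5, 5e-5]),
        "batch_size": tune.grid_search([16, 32]),
    },
    raise_on_failed_trial=False,  # keep the analysis object even if trials fail
)
print(analysis.results_df)        # errored trials show up here for inspection
```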
0 votes · 0 answers

Ray.Tune's PB2 fails consistently on the same actor at the same training point because Tune code raises a ValueError

I have started several trials using ray.tune's PB2. They use 8 actors and perturb every 20 steps. Actors 0-6 don't have any trouble, but actor 7 consistently fails with an error in the second 20-step epoch. In the terminal, I get the following…
asked by LaMaster90 (11 rep)
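For context, a minimal PB2 setup matching the described shape (8 trials, perturbation every 20 iterations); the trainable, metric, and bounds are illustrative, not the asker's. PB2 clones well-performing trials from checkpoints, so the trainable must implement checkpointing:

```python
import json
import os
from ray import tune
from ray.tune.schedulers.pb2 import PB2  # requires GPy and scikit-learn

class Trainer(tune.Trainable):
    def setup(self, config):
        self.lr = config["lr"]
        self.reward = 0.0

    def step(self):
        self.reward += self.lr  # stand-in for one training step
        return {"episode_reward_mean": self.reward}

    def save_checkpoint(self, checkpoint_dir):
        # PB2 exploits good trials by cloning them, so checkpoints are required.
        with open(os.path.join(checkpoint_dir, "state.json"), "w") as f:
            json.dump({"reward": self.reward}, f)
        return checkpoint_dir

    def load_checkpoint(self, checkpoint_dir):
        with open(os.path.join(checkpoint_dir, "state.json")) as f:
            self.reward = json.load(f)["reward"]

pb2 = PB2(
    time_attr="training_iteration",
    metric="episode_reward_mean",
    mode="max",
    perturbation_interval=20,  # matches the asker's 20-step epochs
    hyperparam_bounds={"lr": [1e-4, 1e-2]},
)

tune.run(Trainer, config={"lr": tune.uniform(1e-4, 1e-2)},
         scheduler=pb2, num_samples=8)
```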
0 votes · 1 answer

type 'NoneType' is not iterable error when training a PyTorch model with Ray Tune's Trainable API

I wrote a simple PyTorch script to train MNIST and it worked fine. I reimplemented my script using the Trainable class: import numpy as np import torch import torch.optim as optim import torch.nn as nn from torchvision import datasets,…
asked by Alex Goft (1,114 rep)
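A common cause of this error with the class API (only a guess without the full traceback) is step() returning None, e.g. through a missing return statement, since Tune iterates over the returned metrics dict. A minimal working shape:

```python
from ray import tune

class MNISTTrainable(tune.Trainable):
    def setup(self, config):
        self.lr = config["lr"]
        self.acc = 0.0

    def step(self):
        self.acc += 0.01  # stand-in for one training epoch
        # Forgetting this return (so step() yields None) produces
        # a "'NoneType' is not iterable" style error inside Tune.
        return {"mean_accuracy": self.acc}

tune.run(MNISTTrainable, config={"lr": 0.01}, stop={"training_iteration": 5})
```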
0 votes · 1 answer

How do I make ray.tune.run reproducible?

I'm using the Tune class-based Trainable API. See code sample: from ray import tune import numpy as np np.random.seed(42) # first run tune.run(tune.Trainable, ...) # second run, expecting same result np.random.seed(42) tune.run(tune.Trainable,…
asked by ptyshevs (1,602 rep)
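Seeding the driver script does not reach the trials, which run in separate worker processes; the usual fix is to seed inside the trainable. A sketch with the seed passed through config:

```python
import numpy as np
from ray import tune

class SeededTrainable(tune.Trainable):
    def setup(self, config):
        # Each trial runs in its own worker process, so the driver's
        # np.random.seed(42) never applies here; seed per trial instead.
        np.random.seed(config["seed"])

    def step(self):
        return {"value": float(np.random.rand())}

tune.run(SeededTrainable, config={"seed": 42}, stop={"training_iteration": 3})
```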
0 votes · 1 answer

Using Ray-Tune with sklearn's RandomForestClassifier

Putting together different base and documentation examples, I have managed to come up with this: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) def objective(config, reporter): for i in range(config['iterations']): …
asked by LeggoMaEggo (512 rep)
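For comparison, the function-based API makes this pairing fairly direct; the reporter-style signature in the excerpt is the older API, and tune.report replaces it. A self-contained sketch on synthetic data (all names illustrative):

```python
from ray import tune
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

def objective(config):
    clf = RandomForestClassifier(
        n_estimators=config["n_estimators"],
        max_depth=config["max_depth"],
    )
    clf.fit(X_train, y_train)
    # report the held-out accuracy as the tuning metric
    tune.report(mean_accuracy=clf.score(X_test, y_test))

analysis = tune.run(
    objective,
    config={
        "n_estimators": tune.randint(10, 200),
        "max_depth": tune.randint(2, 20),
    },
    num_samples=8,
)
```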
0 votes · 1 answer

Can you use different stopping conditions for schedulers versus general tune trials

In Ray Tune, is there any guidance on whether it is fine to use different stopping conditions for a scheduler versus a trial? Below, I have an async hyperband scheduler stopping based on neg_mean_loss, and tune itself stopping based on…
asked by rasen58 (4,672 rep)
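Mixing the two is supported: the scheduler prunes trials on its own metric, while tune.run's stop condition applies independently to every trial. A sketch with illustrative names:

```python
from ray import tune
from ray.tune.schedulers import AsyncHyperBandScheduler

def train_fn(config):
    loss = 10.0
    for _ in range(100):
        loss *= config["decay"]  # stand-in for one training step
        tune.report(neg_mean_loss=-loss)

scheduler = AsyncHyperBandScheduler(
    metric="neg_mean_loss", mode="max",  # the scheduler's early-stopping metric
    grace_period=5, max_t=100,
)

tune.run(
    train_fn,
    config={"decay": tune.uniform(0.8, 0.99)},
    scheduler=scheduler,
    stop={"training_iteration": 50},  # tune's own, independent condition
    num_samples=4,
)
```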