Questions tagged [ray]

Ray is a library for writing parallel and distributed Python applications. It scales from your laptop to a large cluster, has a simple yet flexible API, and provides high performance out of the box.

At its core, Ray provides a simple API for taking arbitrary Python functions and classes and executing them in a distributed setting.
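For example, an ordinary function becomes a remote task by adding the @ray.remote decorator, and its results are retrieved with ray.get; a minimal example:

    import ray

    ray.init()  # start Ray on the local machine

    @ray.remote
    def square(x):
        # an ordinary Python function, now executable as a parallel task
        return x * x

    # launch four tasks in parallel and block until all results are ready
    futures = [square.remote(i) for i in range(4)]
    print(ray.get(futures))  # [0, 1, 4, 9]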

Ray also includes a number of powerful libraries:

  • Cluster Autoscaling: Automatically configure, launch, and manage clusters and experiments on AWS or GCP.
  • Hyperparameter Tuning: Automatically run experiments, tune hyperparameters, and visualize results with Ray Tune.
  • Reinforcement Learning: RLlib is a state-of-the-art platform for reinforcement learning research as well as reinforcement learning in practice.
  • Distributed Pandas: Modin provides a faster dataframe library with the same API as Pandas (a minimal sketch follows this list).
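A minimal Modin sketch; the file name and column are placeholders:

    # Modin parallelizes pandas operations; typically only the import changes.
    import modin.pandas as pd

    df = pd.read_csv("data.csv")              # the read is split across partitions
    print(df.groupby("some_column").count())  # same pandas API, executed in parallel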
702 questions
0
votes
1 answer

Is it possible to reliably assign one worker to each host in the cluster?

I want each worker to be allocated to a different host in the cluster. For example, I have a cluster with 3 hosts, with IP addresses 192.168.0.100, 192.168.0.101, and 192.168.0.102 respectively. I want to create 3 workers and assign the task of each worker to a…
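For the question above, one common approach (not necessarily the only one) is to give each node a distinct custom resource when it starts and then request that resource from the task; the resource name below is made up for illustration, and the exact CLI flags vary by Ray version:

    # On each host, start Ray with a custom resource unique to that host, e.g.:
    #   ray start --address=<head-ip>:6379 --resources='{"host_100": 1}'
    import ray

    ray.init(address="auto")

    @ray.remote(resources={"host_100": 1})  # only schedulable on the node advertising host_100
    def work_on_host_100():
        import socket
        return socket.gethostname()

    print(ray.get(work_on_host_100.remote()))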
0
votes
1 answer

Can you use different stopping conditions for schedulers versus general Tune trials?

In Ray Tune, is there any guidance on whether using different stopping conditions for a scheduler versus a trial is fine to do? Below, I have an async hyperband scheduler stopping based on neg_mean_loss, and tune itself stopping based on…
rasen58
  • 4,672
  • 8
  • 39
  • 74
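For the question above, a hedged sketch of the pattern being asked about, with the trainable and metric values made up for illustration (API details vary across Tune versions): the scheduler ranks and stops trials on one metric, while tune.run's stop dict caps every trial independently.

    from ray import tune
    from ray.tune.schedulers import AsyncHyperBandScheduler

    def trainable(config):
        loss = 100.0
        while True:
            loss *= config["decay"]           # stand-in for a real training step
            tune.report(neg_mean_loss=-loss)  # metric used by the scheduler

    scheduler = AsyncHyperBandScheduler(metric="neg_mean_loss", mode="max", max_t=200)

    tune.run(
        trainable,
        config={"decay": tune.uniform(0.90, 0.99)},
        scheduler=scheduler,
        stop={"training_iteration": 100},  # separate, global stopping condition
    )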
0
votes
0 answers

How to direct trial output to files

I'm running an Experiment with multiple trials and I would like to log the STDOUT for the process for each trial in a file in the directory created in ray_results for that trial. I have set log_to_driver to false since the driver couldn't keep up…
Lubed Up Slug
  • 168
  • 1
  • 11
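For the question above, one possible sketch (not necessarily what the asker ended up with): disable forwarding to the driver and redirect stdout inside the trainable itself. logdir is the per-trial directory Tune creates under ray_results, and the class-API method names differ between older (_setup/_train) and newer (setup/step) Tune versions.

    import contextlib
    import os

    import ray
    from ray import tune

    ray.init(log_to_driver=False)  # stop worker stdout/stderr from streaming to the driver

    class MyTrainable(tune.Trainable):
        def setup(self, config):
            # open a log file inside this trial's own directory under ray_results
            self._log = open(os.path.join(self.logdir, "stdout.log"), "a")

        def step(self):
            with contextlib.redirect_stdout(self._log):
                print("verbose output from this training step")  # written to the file
            return {"done": True}

    tune.run(MyTrainable, num_samples=2)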
0
votes
1 answer

How can I get data from the buffer efficiently onto the GPU in PyTorch?

I have a ray actor which collects experiences (the buffer) and a ray actor that optimizes over them (the learner) + several actors that only collect experiences. This is similar to the Ape-X reinforcement learning algorithm. My main problem is that…
Muppet
  • 5,767
  • 6
  • 29
  • 39
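For the question above, and independent of Ray itself, a common PyTorch pattern is to keep sampled batches in pinned (page-locked) host memory and copy them to the GPU asynchronously; the batch shape below is illustrative.

    import torch

    device = torch.device("cuda")

    # A batch sampled from the replay buffer, still on the CPU.
    batch = torch.randn(512, 4, 84, 84)

    # pin_memory() places the tensor in page-locked RAM, which allows the copy
    # below to overlap with GPU computation when non_blocking=True.
    batch = batch.pin_memory()
    batch_gpu = batch.to(device, non_blocking=True)

    # The copy is queued on the current CUDA stream; kernels that later use
    # batch_gpu wait for it automatically.
    out = batch_gpu.mean()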
0
votes
0 answers

Multiprocessing for partitions of data

I have code like the following: import pandas as pd import os import jellyfish import numpy as np a = {'c' : ['dog', 'cat', 'tree','slow','fast','hurry','hello', 'world', 'germany', 'france','rahul', 'india', 'pakisthan', 'bangla',…
Vas
  • 918
  • 1
  • 6
  • 19
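For the question above, the general pattern of processing dataframe partitions in parallel with Ray looks roughly like this; the column and the per-partition work are placeholders for the real jellyfish computation:

    import numpy as np
    import pandas as pd
    import ray

    ray.init()

    df = pd.DataFrame({"c": ["dog", "cat", "tree", "slow", "fast", "hurry"]})

    @ray.remote
    def process_partition(part):
        # stand-in for the real per-row work
        return part["c"].str.len()

    # split the frame into chunks and process them in parallel
    chunks = np.array_split(df, 3)
    results = ray.get([process_partition.remote(chunk) for chunk in chunks])
    print(pd.concat(results))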
0
votes
0 answers

How do I transfer data on workers in Ray on GCP?

I'd like to cover three scenarios: I want to copy various configuration files to the head node; I want to copy various data (e.g. images) to the workers, so that they can each work independently on it, without having to pass the data through the…
Muppet
  • 5,767
  • 6
  • 29
  • 39
0
votes
1 answer

Using ray with scipy functions

Ray allows for parallel processing, and I am trying to use it with the scipy module. I have just set up Ray and I am not sure whether the behavior is expected. Anyway, here are the script and the output. import math import cmath as cm import numpy…
ankit7540
  • 315
  • 5
  • 20
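For the question above, a minimal sanity check for calling SciPy inside Ray tasks, using scipy.integrate.quad with an illustrative integrand:

    import math

    import ray
    from scipy import integrate

    ray.init()

    @ray.remote
    def integrate_sin(a, b):
        # SciPy works inside a remote task just as in any other Python process
        value, _err = integrate.quad(math.sin, a, b)
        return value

    # integrate sin(x) over several intervals in parallel
    bounds = [(0, math.pi), (0, math.pi / 2), (0, 2 * math.pi)]
    print(ray.get([integrate_sin.remote(a, b) for a, b in bounds]))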
0
votes
0 answers

Unable to use multiple GPUs with Ray

I am using the Asynchronous Hyperband scheduler https://ray.readthedocs.io/en/latest/tune-schedulers.html?highlight=hyperband with 2 GPUs. My machine has 2 GPUs and 12 CPUs. But still, only one trial runs at a time whereas 2 trials…
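For the question above, the usual levers (hedged, since the real cause may be elsewhere) are to make both GPUs visible to Ray and to request only a fraction of the machine per trial, e.g. one GPU each; parameter names vary across Tune versions.

    import ray
    from ray import tune

    ray.init(num_cpus=12, num_gpus=2)  # expose both GPUs to the scheduler

    def trainable(config):
        tune.report(score=config["lr"])  # placeholder training loop

    tune.run(
        trainable,
        config={"lr": tune.grid_search([1e-2, 1e-3])},
        resources_per_trial={"cpu": 6, "gpu": 1},  # lets two trials run concurrently
    )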
0
votes
1 answer

Workers not being used due to an autoscaler error

This problem relates to the previous question "Worker node-status on a Ray EC2 cluster: update-failed", about using Ray on an EC2 cluster. It seems that the cluster is only using the head node despite the config specifying 2 worker nodes. Below is the…
Nick Mint
  • 145
  • 12
0
votes
1 answer

How to log images in the middle of `train`

It seems like you can only log data by return values from train. In many workflows, it might make more sense to directly save images in the middle of a train function (e.g. save images sampled by a generative model or from a vision-based MDP). Is…
Veech
  • 1,397
  • 2
  • 12
  • 20
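For the question above, one workable sketch is to write images directly into the trial's own log directory from inside the trainable; here the "image" is just a random array, and the class-API method names vary across Tune versions.

    import os

    import numpy as np
    from ray import tune

    class GenerativeTrainable(tune.Trainable):
        def setup(self, config):
            self._step = 0

        def step(self):
            # sample a fake "image" mid-training and save it into this trial's directory
            img = np.random.rand(64, 64, 3)
            np.save(os.path.join(self.logdir, f"sample_{self._step}.npy"), img)
            self._step += 1
            return {"mean_pixel": float(img.mean())}

    tune.run(GenerativeTrainable, stop={"training_iteration": 5})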
0
votes
1 answer

Python Ray code running slower than Python multiprocessing

I want to issue HTTP requests in parallel, and here is what my code (skeleton) looks like when using ray: @ray.remote def issue_request(user_id): r = requests.post(url, json, headers) ray.get([issue_request.remote(id_token[user_id]) for _…
user1274878
  • 1,275
  • 4
  • 25
  • 56
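For the question above, a frequent culprit is per-task overhead when every request becomes its own tiny task; a hedged sketch that batches several requests per remote task (URL, payload, and tokens are placeholders):

    import ray
    import requests

    ray.init()

    @ray.remote
    def issue_requests(tokens):
        # one task handles a whole batch, amortizing Ray's per-task overhead
        statuses = []
        for token in tokens:
            r = requests.post("https://example.com/api", json={"token": token})
            statuses.append(r.status_code)
        return statuses

    tokens = [f"token-{i}" for i in range(100)]
    batches = [tokens[i:i + 25] for i in range(0, len(tokens), 25)]
    results = ray.get([issue_requests.remote(batch) for batch in batches])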
0
votes
1 answer

Worker node-status on a Ray EC2 cluster: update-failed

I now have a Ray cluster working on EC2 (Ubuntu 16.04) with a c4.8xlarge master node and one identical worker. I wanted to check whether multi-threading was being used, so I ran tests to time increasing numbers (n) of the same 9-second task. Since…
0
votes
1 answer

After using the @ray decorator, can't add data to a dictionary

I've written a function which gets some data from an API and returns a dictionary. When I execute it, it works fine. The problem is when I try to execute this function twice at "the same time" using the ray library. The data is fetched from the API correctly, but…
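For the question above, the usual explanation is that remote functions run in separate worker processes, so mutating a driver-side dictionary from inside the task has no effect; return the data and merge it in the driver, as in this sketch (names are illustrative):

    import ray

    ray.init()

    @ray.remote
    def fetch(user_id):
        # runs in a separate process: changes to a global dict here would not
        # be visible in the driver, so return the data instead
        return {user_id: f"data for {user_id}"}

    results = ray.get([fetch.remote(uid) for uid in ("alice", "bob")])

    combined = {}
    for partial in results:
        combined.update(partial)
    print(combined)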
0
votes
1 answer

What is the minimal cluster config file in Ray for a laptop/dev machine?

A lot of ray commands require a CLUSTER_CONFIG file, for example: Usage: ray get-head-ip [OPTIONS] CLUSTER_CONFIG_FILE Options: -n, --cluster-name TEXT Override the configured cluster name. --help Show this message and…
Duane
  • 4,572
  • 6
  • 32
  • 33
0
votes
1 answer

How to list workers logged onto head

I'm setting up Ray on a kubernetes cluster. I have started some workers and a head inside some pods. Is there a way I can list the workers attached to the head, without writing a cluster config file?
Duane
  • 4,572
  • 6
  • 32
  • 33
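For the question above, one way to inspect the nodes that have joined the head, without any cluster config file, is from a driver connected to the cluster; ray.nodes() and ray.cluster_resources() are part of Ray's Python API, though the exact dictionary keys may differ between versions.

    import ray

    # connect to the running cluster; from outside a node, pass address="<head-ip>:6379"
    ray.init(address="auto")

    for node in ray.nodes():
        # each entry describes one node: its address, liveness, and advertised resources
        print(node["NodeManagerAddress"], node["Alive"], node["Resources"])

    print(ray.cluster_resources())  # aggregate CPU/GPU/memory across the cluster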