My code looks like this:
# Features' construction - Multiprocessing #
import multiprocessing
import time

import pandas as pd
import ray

ray.shutdown()
num_cpus = multiprocessing.cpu_count()
print(num_cpus)
print()
ray.init(num_cpus=num_cpus)

start = time.time()

@ray.remote
def features_construct(index, row):
    dict_features = {}
    ...
    return dict_features

data = []
for index, row in enumerate(raw_data):
    data.append(features_construct.remote(index, row))

df_data = pd.DataFrame.from_records(ray.get(data))

end = time.time()
print(round(end - start, 2), 'seconds')
However, I receive the following error:
2019-10-21 11:30:06,751 WARNING worker.py:416 -- Local object store memory usage:
num clients with quota: 0
quota map size: 0
pinned quota map size: 0
allocated bytes: 5933914928
allocation limit: 7055781888
pinned bytes: 0
(global lru) capacity: 7055781888
(global lru) used: 84.1%
(global lru) num objects: 375838
(global lru) num evictions: 504699
(global lru) bytes evicted: 8467163584
---------------------------------------------------------------------------
UnreconstructableError Traceback (most recent call last)
<ipython-input-27-0f27da969dd5> in <module>
155 print()
156
--> 157 df_data = pd.DataFrame.from_records(ray.get(data))
158
159
/storage2/user/anaconda/lib/python3.7/site-packages/ray/worker.py in get(object_ids)
2347 if isinstance(value, ray.exceptions.UnreconstructableError):
2348 worker.dump_object_store_memory_usage()
-> 2349 raise value
2350
2351 # Run post processors.
UnreconstructableError: Object 54158c91583effffffff0100000000c001000000 is lost (either LRU evicted or deleted by user) and cannot be reconstructed. Try increasing the object store memory available with ray.init(object_store_memory=<bytes>) or setting object store limits with ray.remote(object_store_memory=<bytes>). See also: https://ray.readthedocs.io/en/latest/memory-management.html
How do I fix this?
If the fix is, as the error message suggests, to set object_store_memory, what value should it have?
To start with, the Ray documentation does not make clear what the default value of this parameter is (or at least how it is calculated).
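For scale, here is how I read the byte counts in the warning, and what I understand the suggested fix to look like. This is only a minimal sketch: to_gib and init_with_store are helper names I made up, and the 10 GiB in the last comment is an arbitrary placeholder, not a value I know to be right.

```python
GIB = 1024 ** 3  # bytes per GiB


def to_gib(num_bytes):
    """Convert a byte count to GiB, rounded to two decimals."""
    return round(num_bytes / GIB, 2)


# Figures copied from the warning in the error output.
allocation_limit = 7055781888
allocated = 5933914928
evicted = 8467163584

print(to_gib(allocation_limit), 'GiB allocation limit')
print(to_gib(allocated), 'GiB allocated')
print(to_gib(evicted), 'GiB evicted so far')


def init_with_store(num_cpus, store_gib):
    """Hypothetical helper: start Ray with an explicit object store size."""
    import ray  # imported here so the arithmetic above runs without Ray

    ray.shutdown()
    # object_store_memory is given in bytes.
    ray.init(num_cpus=num_cpus, object_store_memory=int(store_gib * GIB))


# Example call (placeholder size, not a recommendation):
# init_with_store(32, 10)
```

The evicted total is already larger than the allocation limit, so the store has thrown away more data than it can hold at one time.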
Just to be clear, I run my script on a remote server.
This:
num_cpus = multiprocessing.cpu_count()
print(num_cpus)
prints 32.