
I am trying to read a CSV file from a remote server over SFTP using Dask. Below is my code snippet:

import time
from dask.distributed import Client, Future
import dask.dataframe as dd
import pandas as pd
import asyncio
import dask
password = 'Password_123'
dask_master = '192.0.0.1:8786'
# import distributed
# distributed.config['distributed']['scheduler']['allowed-failures'] = 50
dask.config.set({'distributed.scheduler.allowed-failures': 999}) 
print(dask.config.get("distributed.scheduler.allowed-failures"))
async def call_data_load():
    data_path = r'sftp://root:'+password+'@192.0.0.2:22/root/sample_data/sample.csv'
    df = await data_load(data_path)
    print(df.head())
    return df

async def data_load(data_path):
    async with Client(dask_master, asynchronous=True) as client:
        df = dd.read_csv(data_path, blocksize=2)
        df = await client.compute(df, npartitions=10, chunksize="8 MiB")
        print(type(df))
        return df

try:
    y = asyncio.run(call_data_load())
except Exception as ex:
    raise

I am getting a distributed.scheduler.KilledWorker error.

I tried increasing the allowed-failures count to 999, but the error still persists. The code runs fine when I don't submit the job to the Dask cluster; the error occurs only when I submit the job to the cluster. The local-only variant that works for me is sketched below.
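For reference, this is roughly what the working local-only run looks like (same placeholder path and credentials as above; no Client is created, so Dask falls back to its default local scheduler):

import dask.dataframe as dd

password = 'Password_123'
# same placeholder SFTP path as in the cluster version
data_path = 'sftp://root:' + password + '@192.0.0.2:22/root/sample_data/sample.csv'

# Without a distributed Client, compute() runs on the local scheduler
df = dd.read_csv(data_path, blocksize="8 MiB")
pdf = df.compute()
print(pdf.head())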
