I am working on a Google Cloud compute instance with 24 vCPUs. The code I am running is the following:
import dask.dataframe as dd
from distributed import Client
client = Client()
# read the data
logd = (dd.read_csv('vol/800000test', sep='\t', parse_dates=['Date'])
.set_index('idHttp')
.rename(columns={'User Agent Type':'UA'})
.categorize())
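For context, I start the client with no arguments, so it chooses the number of workers and threads per worker on its own. Below is a sketch of what I assume an explicit setup would look like on this machine; the 6 workers x 4 threads split is just a guess at covering the 24 vCPUs, not something I have actually tuned or benchmarked:

from distributed import Client

# start a local cluster with an explicit worker/thread layout instead of the defaults
# (6 * 4 = 24 is only a guess at matching the vCPU count)
client = Client(n_workers=6, threads_per_worker=4)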
When I run it (and this is also the case for the subsequent data analysis I do after loading the data), I only see 11 cores being used, sometimes just 4.
Is there any way to control this better and make full use of all the cores?
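For example, would splitting the input into more partitions make a difference? A sketch of what I have in mind, where the blocksize value is only an example and not something I have measured:

import dask.dataframe as dd

# same pipeline, but with an explicit blocksize so the single file is split
# into more (smaller) partitions; '64MB' is an arbitrary example value
logd = (dd.read_csv('vol/800000test', sep='\t', parse_dates=['Date'],
                    blocksize='64MB')
        .set_index('idHttp')
        .rename(columns={'User Agent Type': 'UA'})
        .categorize())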