I'm trying to work out why this script hangs with Modin but works fine with regular pandas:
```python
import modin.pandas as pd

infile1 = 'D:\\test_files\\curves_crosstab.csv'
infile2 = 'D:\\test_files\\8760_crosstab.csv'
infilenames = [infile1, infile2]
outfile1 = 'D:\\test_files\\curves_sample_output.csv'
outfile2 = 'D:\\test_files\\8760_sample_output.csv'
outfilenames = [outfile1, outfile2]

for infilename, outfilename in zip(infilenames, outfilenames):
    if 'curves' in infilename:
        print("in curves")
        # Four header rows -> 4-level MultiIndex columns
        df = pd.read_csv(infilename, header=[0, 1, 2, 3])
        print("read curves")
        # Flatten the MultiIndex into tuples, then join each tuple with '_'
        df.columns = df.columns.to_flat_index()
        print("indexed columns")
        df.columns = ['_'.join(col) for col in df.columns]
        print("joined columns")
        # Wide -> long: keep the timestamp columns, stack everything else
        df2 = df.melt(id_vars=['Unnamed: 0_level_0_Unnamed: 0_level_1_Unnamed: 0_level_2_Year',
                               'Unnamed: 1_level_0_Unnamed: 1_level_1_Unnamed: 1_level_2_Month',
                               'Unnamed: 2_level_0_Unnamed: 2_level_1_Unnamed: 2_level_2_Day',
                               'Unnamed: 3_level_0_Unnamed: 3_level_1_Unnamed: 3_level_2_Hour'])
        print("melted")
        # Split the joined header back into its four parts (columns 0-3)
        df2 = pd.concat([df2, df2.variable.str.split('_', expand=True)], axis=1)
        del df2['variable']
        print("deleted variable column")
        df2.rename(columns={'Unnamed: 0_level_0_Unnamed: 0_level_1_Unnamed: 0_level_2_Year': 'Year',
                            'Unnamed: 1_level_0_Unnamed: 1_level_1_Unnamed: 1_level_2_Month': 'Month',
                            'Unnamed: 2_level_0_Unnamed: 2_level_1_Unnamed: 2_level_2_Day': 'Day',
                            'Unnamed: 3_level_0_Unnamed: 3_level_1_Unnamed: 3_level_2_Hour': 'Hour',
                            0: 'currency',
                            1: 'consultant_or_case',
                            2: 'name',
                            3: 'hub',
                            'value': 'rate_in_local_currency'}, inplace=True)
        print("renamed")
        df2.to_csv(outfilename, index=False, encoding='utf-8')
        print("created csv")
    else:
        df = pd.read_csv(infilename, encoding='cp1252')
        df2 = df.melt(id_vars=['Month', 'Day', 'Hour'])
        df2.to_csv(outfilename, index=False, encoding='utf-8')
```
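To make the reshape in the curves branch easier to follow, here is a minimal pandas-only sketch on a tiny synthetic frame. The column labels (`id`, `USD`, `case`, `alpha`, `hub1`, and so on) are made-up stand-ins for the real `Unnamed: ...` and data headers; it only illustrates what flatten, join, melt, and split produce, and says nothing about why Modin hangs.

```python
import pandas as pd

# Hypothetical 4-level headers: two timestamp columns plus two data columns
cols = pd.MultiIndex.from_tuples([
    ("id", "id", "id", "Year"),
    ("id", "id", "id", "Hour"),
    ("USD", "case", "alpha", "hub1"),
    ("EUR", "consultant", "beta", "hub2"),
])
df = pd.DataFrame([[2020, 1, 10.0, 20.0], [2020, 2, 11.0, 21.0]], columns=cols)

# Flatten the MultiIndex to tuples and join each tuple with '_'
df.columns = ["_".join(c) for c in df.columns.to_flat_index()]

# Wide -> long: one row per (timestamp, data column) pair
long = df.melt(id_vars=["id_id_id_Year", "id_id_id_Hour"])

# Split the joined header back into its four parts (columns 0..3)
parts = long["variable"].str.split("_", expand=True)
long = pd.concat([long.drop(columns="variable"), parts], axis=1)
long = long.rename(columns={
    "id_id_id_Year": "Year",
    "id_id_id_Hour": "Hour",
    "value": "rate_in_local_currency",
    0: "currency",
    1: "consultant_or_case",
    2: "name",
    3: "hub",
})
print(long)
```

Note that `str.split('_', expand=True)` assumes none of the four header parts contains an underscore itself; otherwise it yields more than four columns.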
Under regular pandas the script completes, but it takes 87 seconds on average because of the size of the curves files (~36.5 MB in, ~395 MB out), and I was hoping Modin could cut that time. When I swap in Modin, the script starts but then just sits there and does nothing. It doesn't even print:
```
Waiting for redis server at 127.0.0.1:14618 to respond...
Waiting for redis server at 127.0.0.1:31410 to respond...
Starting local scheduler with the following resources: {'CPU': 4, 'GPU': 0}.
```
I don't know whether that output is supposed to appear in the console, but it doesn't. The script reaches the first `read_csv` call (I see `in curves` printed), then it just sits there and never does anything else. How can I figure out what is going on?
The OS is Windows 10, if that matters.
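One way to narrow this down is to time the `read_csv` call on its own, in a small script with an `if __name__ == "__main__":` guard. The guard matters on Windows because Python's `multiprocessing` can only use the spawn start method there, which re-imports the main module in child processes; whether Modin's execution engine in this version depends on that is an assumption to rule out, not a confirmed cause. `load_pandas_module`, `timed_read`, and `check_read.py` are hypothetical names, and the fallback to plain pandas exists only so the same script can be run under both libraries for comparison.

```python
import sys
import time


def load_pandas_module(use_modin):
    """Return modin.pandas if requested and importable, else plain pandas."""
    if use_modin:
        try:
            import modin.pandas as mpd
            return mpd
        except ImportError:
            print("modin not installed; falling back to pandas", file=sys.stderr, flush=True)
    import pandas
    return pandas


def timed_read(path, use_modin=True, **read_kwargs):
    """Read one CSV and report which library was used and how long it took."""
    pd = load_pandas_module(use_modin)
    # flush=True so the message shows up immediately even if the next call hangs
    print(f"using {pd.__name__}, reading {path!r}", flush=True)
    start = time.perf_counter()
    df = pd.read_csv(path, **read_kwargs)
    elapsed = time.perf_counter() - start
    print(f"read finished in {elapsed:.2f}s, shape={df.shape}", flush=True)
    return df


# Keep all work under the main guard: on Windows, child processes started with
# the "spawn" method re-import this file, and unguarded top-level code would run again.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_read.py <csv path> [pandas]
    use_modin = not (len(sys.argv) > 2 and sys.argv[2] == "pandas")
    timed_read(sys.argv[1], use_modin=use_modin, header=[0, 1, 2, 3])
```

Running it once with `pandas` as the second argument and once without gives a direct comparison; if the Modin run prints `using modin.pandas` and then stalls, the hang is in `read_csv` itself rather than in the later reshape steps.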