I unfortunately can't share a working example of the problem as I don't know what is causing it. However, I have put together dummy code showing the structure of my DataFrame as well as the downsampling I am trying to do:

Example code:

import numpy as np
import pandas as pd

department=[]
team=[]
role=[]

#Department 1 components
department1_A_ROLE1= pd.Series(abs(np.random.randn(5)), index=pd.date_range('01-26-2018',periods=5,freq='B'))
department.append('Department 1')
team.append('A')
role.append('ROLE1')
department1_A_ROLE2= pd.Series(abs(np.random.randn(4)), index=pd.date_range('01-26-2018',periods=4,freq='B'))
department.append('Department 1')
team.append('A')
role.append('ROLE2')


#Department 2 components
department2_B_ROLE1= pd.Series(abs(np.random.randn(7)), index=pd.date_range('01-28-2018',periods=7,freq='B'))
department.append('Department 2')
team.append('B')
role.append('ROLE1')
department2_C_ROLE1= pd.Series(abs(np.random.randn(2)),  index=pd.date_range('02-02-2018',periods=2,freq='B'))
department.append('Department 2')
team.append('C')
role.append('ROLE1')


#Department 3 component
department3_B_ROLE2 = pd.Series(abs(np.random.randn(4)), index=pd.date_range('01-31-2018',periods=4,freq='B'))
department.append('Department 3')
team.append('B')
role.append('ROLE2')



#----Generate multi index columns
arrays=[department, team, role]
tuples = list(zip(*arrays))

df=pd.concat([department1_A_ROLE1, department1_A_ROLE2, department2_B_ROLE1, department2_C_ROLE1, department3_B_ROLE2], axis=1)
dateseries=df.index

index = pd.MultiIndex.from_tuples(tuples, names=['Department', 'Team', 'Resource'])

df.columns=index

My DataFrame structure

My actual DataFrame has .shape (18051, 17).

Resampling:

From this, I am trying to .resample by month with the following code:

dfByMonth = df.resample('M').sum()

The dummy data works as expected:

My DataFrame by month

However, my actual DataFrame returns .shape (593, 3) after resampling, so only 3 of the 17 columns survive.

Notes:

  • The three columns that get returned always seem to be the same three (from the same department)
  • The returned columns aren't the first or the last when sorted alphabetically
  • Removing the multi-index (df.columns = [' '.join(col).strip() for col in df.columns.values]) has no effect

Update following JoeCondron's comment:

Running [df.iloc[:,i].apply(type).value_counts() for i in range(df.shape[1])] gives the below - "Department 4" are the three columns that I get returned in the .resample()... I see that they are the only float columns which aren't followed by <class 'decimal.Decimal'> - this looks like the smoking gun but I don't understand the differences between them...I would have thought being both numeric they could both be resampled? (NOTE: this is via a Django response)

[<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A, ROLE1), dtype: int64, 
<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A ROLE2), dtype: int64, 
<class 'float'> 1571 <class 'decimal.Decimal'> 30 Name: (Department 1, A ROLE3), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE1), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE2), dtype: int64, 
<class 'float'> 1307 <class 'decimal.Decimal'> 294 Name: (Department 2, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE1), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1281 <class 'float'> 320 Name: (Department 3, A ROLE3), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE1), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE2), dtype: int64, 
<class 'float'> 1601 Name: (Department 4, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 5, A ROLE1), dtype: int64, 
<class 'float'> 1361 <class 'decimal.Decimal'> 240 Name: (Department 5, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 5, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE1), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE2), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 6, A ROLE3), dtype: int64, 
<class 'decimal.Decimal'> 1601 Name: (Department 7, A ROLE1), dtype: int64]
Bendy
  • What are the `dtypes` of your actual data? – JoeCondron Aug 23 '17 at 08:02
  • Thanks Joe - I've tacked an update to the end of my question – Bendy Aug 23 '17 at 08:44
  • `Decimal` data type is not supported by `numpy` and so `pandas` will just treat it as `object`. Try doing `df = df.astype(float)` before resampling. – JoeCondron Aug 23 '17 at 10:05
  • Thanks very much Joe - that solved it! If you want to stick it in an answer I'll accept it (especially if you could explain the logic behind the Decimal/Float/Int conflicts that I keep getting) – Bendy Aug 23 '17 at 10:43
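
For anyone landing here later, the fix from the comments can be sketched roughly as below. This is only a minimal illustration, not the original data: the column names `all_float` and `mixed` and the four sample values are made up, and `"MS"` (month start) is used instead of `'M'` purely as an assumption that it is accepted across pandas versions. It shows that a column holding `decimal.Decimal` values is stored as `object`, and that `astype(float)` turns it into `float64` so every column is aggregated by `.resample(...).sum()`.

```python
from decimal import Decimal

import pandas as pd

# Hypothetical miniature of the real data: one pure-float column and one
# column that mixes float and decimal.Decimal values.
idx = pd.date_range("2018-01-30", periods=4, freq="D")
df = pd.DataFrame(
    {
        "all_float": [1.0, 2.0, 3.0, 4.0],
        "mixed": [Decimal("1.5"), 2.5, Decimal("3.5"), 4.5],
    },
    index=idx,
)

# A column containing Decimal objects is stored with dtype object, not float64.
dtypes_before = df.dtypes.astype(str).to_dict()

# Convert everything to float64 before resampling, as suggested in the comments.
df_float = df.astype(float)
dtypes_after = df_float.dtypes.astype(str).to_dict()

# "MS" labels each bin by the first day of the month (assumption: chosen here
# only for compatibility; the question itself uses 'M').
by_month = df_float.resample("MS").sum()

print(dtypes_before)
print(dtypes_after)
print(by_month)
```

Note that `astype(float)` trades the exact decimal arithmetic of `Decimal` for binary floating point, which may or may not matter depending on what the numbers represent.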

0 Answers