1

I have three lists, [1,4,3] , [2,5,6] , [9,8,7] which refer to a dataframe's series indices. I'm using each list to slice the dataframe into a smaller dataframe for batch data processing. After the processing, I want to recombine the dataframes into the original dataframe, preserving the order of the columns.

df_1 = df.iloc[:,list1]
#carry out preprocessing
df_2 = df.iloc[:,list2]
#carry out preprocessing
df_3 = df.iloc[:,list3]
#carry out preprocessing

#join the frames back together
frames = [df_1,df_2,df_3]
df = pd.concat(frames, axis = 1)

Is there a simple way to concat and preserve the original order of the series? i.e. [1,2,3,4,5,6,7,8,9]

joshi123
  • 835
  • 2
  • 13
  • 33

1 Answers1

1

I think not, need sort_index for sorting columns names:

df = pd.concat(frames, axis = 1).sort_index(axis=1)

If want sorted by indices positions:

L = list1 + list2 + list3
df1 = pd.concat(frames, axis = 1).reindex(columns=df.columns[sorted(L)])

Or sorting in iloc:

df_1 = df.iloc[:,sorted(list1)]
#carry out preprocessing
df_2 = df.iloc[:,sorted(list2)]
#carry out preprocessing
df_3 = df.iloc[:,sorted(list3)]
#carry out preprocessing

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,10)), columns=list('EFGHIJABCD'))
print (df)
   E  F  G  H  I  J  A  B  C  D
0  8  8  3  7  7  0  4  2  5  2
1  2  2  1  0  8  4  0  9  6  2
2  4  1  5  3  4  4  3  7  1  1
3  7  7  0  2  9  9  3  2  5  8
4  1  0  7  6  2  0  8  2  5  1

list1 = [1,4,3]
list2 = [2,5,6]
list3 = [9,8,7]

df_1 = df.iloc[:,list1]
#carry out preprocessing
df_2 = df.iloc[:,list2]
#carry out preprocessing
df_3 = df.iloc[:,list3]
#carry out preprocessing

#join the frames back together
frames = [df_1,df_2,df_3]
L = list1 + list2 + list3

df1 = pd.concat(frames, axis = 1).reindex(columns=df.columns[sorted(L)])
print (df1)
   F  G  H  I  J  A  B  C  D
0  8  3  7  7  0  4  2  5  2
1  2  1  0  8  4  0  9  6  2
2  1  5  3  4  4  3  7  1  1
3  7  0  2  9  9  3  2  5  8
4  0  7  6  2  0  8  2  5  1

df2 = pd.concat(frames, axis = 1).sort_index(axis=1)
print (df2)
   A  B  C  D  F  G  H  I  J
0  4  2  5  2  8  3  7  7  0
1  0  9  6  2  2  1  0  8  4
2  3  7  1  1  1  5  3  4  4
3  3  2  5  8  7  0  2  9  9
4  8  2  5  1  0  7  6  2  0

EDIT:

If same columns names as values in list L:

L.sort()
df = df[L]

Or:

df = df[sorted(L)]
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252