2

Suppose I have a set of data frames

df1 is

   ID       C1
0  0  0.000000
1  1  0.538516
2  2  0.509902
3  3  0.648074
4  4  0.141421

df2 is

  ID        C1
0  0  0.538516
1  1  0.000000
2  2  0.300000
3  3  0.331662
4  4  0.608276

and df3 is

  ID        C1
0  0  0.509902
1  1  0.300000
2  2  0.000000
3  3  0.244949
4  4  0.509902

I then go ahead and transpose these three data frames.

df1 = df1.T
df2 = df2.T
df3 = df3.T

Now the data frames are :

df1 is

          0         1         2         3         4
ID        0         1         2         3         4
C1        0  0.538516  0.509902  0.648074  0.141421

df2 is :

                 0         1    2         3         4
ID               0         1    2         3         4
C1               0.538516  0  0.3  0.331662  0.608276

and df3 is :

                 0    1    2         3         4
ID               0    1    2         3         4
C1          0.509902  0.3  0  0.244949  0.509902

Can I somehow combine all data frames to have

0            1         2         3         4
0          0.538516  0.509902  0.648074  0.141421
0.538516     0        0.3       0.331662  0.608276
0.509902     0.3      0         0.244949  0.509902

And then sort rows individually , so that each row in the resulting data frame is sorted ?

For instance the data frame with sorted rows would be

0  0.141421  0.509902  0.538516  0.648074
0  0.3       0.331662  0.538516  0.608276
0  0.244949  0.3       0.509902  0.509902

I'm having problems with concat since I've transposed the data frames.

All help is appreciated

piRSquared
  • 285,575
  • 57
  • 475
  • 624
Hormigas
  • 1,429
  • 5
  • 24
  • 45

2 Answers2

2

UPDATE: sorted data set and without transposing:

In [64]: pd.DataFrame([np.sort(x['C1'].values) for x in dfs], columns=d1.index)
Out[64]:
     0         1         2         3         4
0  0.0  0.141421  0.509902  0.538516  0.648074
1  0.0  0.300000  0.331662  0.538516  0.608276
2  0.0  0.244949  0.300000  0.509902  0.509902

Explanation:

In [67]: d1
Out[67]:
   ID        C1
0   0  0.000000
1   1  0.538516
2   2  0.509902
3   3  0.648074
4   4  0.141421

In [68]: d1['C1'].values
Out[68]: array([ 0.      ,  0.538516,  0.509902,  0.648074,  0.141421])

In [69]: np.sort(d1['C1'].values)
Out[69]: array([ 0.      ,  0.141421,  0.509902,  0.538516,  0.648074])

In [70]: [np.sort(x['C1'].values) for x in dfs]
Out[70]:
[array([ 0.      ,  0.141421,  0.509902,  0.538516,  0.648074]),
 array([ 0.      ,  0.3     ,  0.331662,  0.538516,  0.608276]),
 array([ 0.      ,  0.244949,  0.3     ,  0.509902,  0.509902])]

now we can create a DataFrame from the list of sorted NumPy arrays, specifying d1.index as columns:

In [71]: pd.DataFrame([np.sort(x['C1'].values) for x in dfs], columns=d1.index)
Out[71]:
     0         1         2         3         4
0  0.0  0.141421  0.509902  0.538516  0.648074
1  0.0  0.300000  0.331662  0.538516  0.608276
2  0.0  0.244949  0.300000  0.509902  0.509902
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
2

Setup

df1 = pd.DataFrame({
        'C1': [0.0, 0.538516, 0.509902, 0.648074, 0.141421],
        'ID': [0, 1, 2, 3, 4]})
df2 = pd.DataFrame({
        'C1': [0.538516, 0.0, 0.3, 0.331662, 0.608276],
        'ID': [0, 1, 2, 3, 4]})
df3 = pd.DataFrame({
        'C1': [0.509902, 0.3, 0.0, 0.244949, 0.509902],
        'ID': [0, 1, 2, 3, 4]})

Solution(s)

1

It looks as though your ID column is important an maybe that's what you want to join on. I'd set that as your index and then combine.

To sort each row, I'd leave as columns and apply a sort_values

df = pd.concat(
    [d.set_index('ID') for d in [df1, df2, df3]], ignore_index=True, axis=1)
df.apply(lambda x: x.sort_values().values).T

ID    0         1         2         3         4
0   0.0  0.141421  0.509902  0.538516  0.648074
1   0.0  0.300000  0.331662  0.538516  0.608276
2   0.0  0.244949  0.300000  0.509902  0.509902

2

Get your dataframe and sort in numpy

df = pd.concat([d.set_index('ID').T for d in [df1, df2, df3]], ignore_index=True)

v = df.values
a = v.argsort(1)
i = np.arange(a.shape[0])[:, None]
df.loc[:] = v[i, a]

df

ID    0         1         2         3         4
0   0.0  0.141421  0.509902  0.538516  0.648074
1   0.0  0.300000  0.331662  0.538516  0.608276
2   0.0  0.244949  0.300000  0.509902  0.509902
piRSquared
  • 285,575
  • 57
  • 475
  • 624
  • Thanks for your answer! how would I do about sorting each row ? I ask this since the C1 has been lost. Also , If I sort each row on transposing , the sorted order is lost after I concatenate the dataframes – Hormigas Jan 29 '17 at 21:12
  • 1
    @Anant this was not specified in your question. I do not understand exactly what you need from your comment alone. It would be better to follow up with another posted question. That way you can include more information in regards to specifically what you want to see. – piRSquared Jan 29 '17 at 21:14
  • I've added details to the question. Thanks! – Hormigas Jan 29 '17 at 21:17
  • Thanks for the sorted version too! Really appreciate it – Hormigas Jan 29 '17 at 21:28