Reshape, concatenate and aggregate multiple pandas DataFrames

Question

I have five pandas DataFrames, each holding the result of a different calculation on the same data with the same number of samples. All of them are identical in shape (5x10).

Shape of each DataFrame (rows are time steps t, columns are recording channels):

     (recording channels)
     0 1 2 3 4 5 6 7 8 9
(t)
0    x x x x x x x x x x
1    x x x x x x x x x x
2    x x x x x x x x x x
3    x x x x x x x x x x
4    x x x x x x x x x x


df 1 : calculation 1
df 2 : calculation 2
.
.
.
df 5 : calculation 5

I want to merge all these data frames into a single data frame which looks something like this:

recording_channel-----time-----cal_1----cal_2----cal_3....cal_5
       0                0        x        x        x        x
       0                1        x        x        x        x
       0                2        x        x        x        x
       0                3        x        x        x        x
       0                4        x        x        x        x
       1                0        x        x        x        x
       1                1        x        x        x        x
       1                2        x        x        x        x
       1                3        x        x        x        x
       1                4        x        x        x        x
       .                .        .        .        .        .
       .                .        .        .        .        .
       9                4        x        x        x        x

Code to generate the data:

import numpy as np 
import pandas as pd

list_df = []

for i in range(5):
    a = np.array(np.random.randint(0,1000+i, 50))
    a = a.reshape(5,10)
    df = pd.DataFrame(a)
    list_df.append(df)

for i in list_df:
    print(len(i))

df_joined = pd.concat(list_df, axis=1)

print(df_joined)

Better show us your input and expected output with some sample data , rather than using `x` — BENY, Apr 08 '19 at 14:10
I chose to ignore the data values because they can hold any value . `a = np.array(np.random.randint(0,1000, 50)); a = a.reshape(5,10)` — abhishake, Apr 08 '19 at 14:16
I was trying to get my head around in the reshaping and merging — abhishake, Apr 08 '19 at 14:19
I couldn't fully understand what you want. Can you give an example (perhaps with a small dataframe with numbers) just to make it clearer? — edinho, Apr 08 '19 at 14:26

nick · Answer 1 · 2019-04-08T15:57:47.393


Using your code to generate the data, we use melt to transform it from wide to long format:

import numpy as np
import pandas as pd

list_df = []
df_all = pd.DataFrame()
for i in range(5):
    a = np.array(np.random.randint(0,1000+i, 50))
    a = a.reshape(5,10)
    df = pd.DataFrame(a)
    list_df.append(df)
    # melt from wide to long: one row per (recording_channel, time) pair
    df_long = pd.melt(df.reset_index().rename(columns={'index': 'time'}),
                      id_vars='time', value_name='col',
                      var_name='recording_channel')
    # every melted frame has the same row order, so the values line up
    df_all['col'+str(i+1)] = df_long['col']

# the key columns are identical for all frames, so take them from the last df_long
df_all['recording_channel'] = df_long.recording_channel
df_all['time'] = df_long.time
df_all.head()
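
As an alternative sketch (not part of the answer above), the same long layout can be built by unstacking each frame and concatenating the results side by side. It assumes, as in the question, that rows are time steps and columns are recording channels. The column names `cal_1` ... `cal_5` follow the question's desired output, and the seeded random generator is only there to make the example reproducible.

```python
import numpy as np
import pandas as pd

# Assumed test data: five 5x10 frames (rows = time, columns = recording channel).
rng = np.random.default_rng(0)
list_df = [pd.DataFrame(rng.integers(0, 1000, size=(5, 10))) for _ in range(5)]

# DataFrame.unstack() on a frame with a plain index returns a Series indexed
# by (column label, row label), i.e. (recording_channel, time) here.
long_cols = {
    f"cal_{i}": df.unstack()
    for i, df in enumerate(list_df, start=1)
}

# Concatenating the Series along axis=1 aligns them on the shared
# (recording_channel, time) index and uses the dict keys as column names.
result = (
    pd.concat(long_cols, axis=1)
    .rename_axis(["recording_channel", "time"])
    .reset_index()
)

print(result.head())
```

Because the alignment happens on the index rather than on row position, this variant does not depend on every frame being melted in the same order.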

edited Apr 08 '19 at 15:57

answered Apr 08 '19 at 14:26

nick


the method raises an exception that the data should be one dimensional – abhishake Apr 08 '19 at 14:40
My apologies. Remove the ",1" from the (-1,1) in the reshape term. See updated answer. Each entry in the dictionary should be a 1D array and not a 2D array. – nick Apr 08 '19 at 14:42
I don't know if it's unrelated but when using dask to create array from the dictionary it gives this error `TypeError: __init__() missing 3 required positional arguments: 'name', 'meta', and 'divisions'` – abhishake Apr 08 '19 at 15:39
No it isn't related. As your error suggests, you are missing some initialisation parameters for some constructor. I also don't know what you mean by `create array from the dictionary ` – nick Apr 08 '19 at 15:42
what I meant was make dataframe from dictionary. -Thanks – abhishake Apr 08 '19 at 15:45
I have updated my suggestion. The dictionary solution was just horrible. Anyway, I used your code to generate the data to show how it might be done. The `melt` call does the wide-to-long transformation; then the only thing left to do is put the data where you want it. I am not sure what you are trying to do with dask, so maybe that is a different question. – nick Apr 08 '19 at 15:59
