1

I have a dataframe like: enter image description here

How can it be reshaped that the columns (0 => 1, 2=> 3) are stored as records? I.e. columns metric_name and metric_value and multiple rows (observations)?

pandas_dict = {0: {0: 'Model:',
  1: 'Dependent Variable:',
  2: 'Date:',
  3: 'No. Observations:',
  4: 'Df Model:',
  5: 'Df Residuals:',
  6: 'Converged:',
  7: 'No. Iterations:'},
 1: {0: 'Logit',
  1: 'sick_percentage',
  2: '2019-08-14 13:32',
  3: '28',
  4: '2',
  5: '25',
  6: '0.0000',
  7: '35.0000'},
 2: {0: 'Pseudo R-squared:',
  1: 'AIC:',
  2: 'BIC:',
  3: 'Log-Likelihood:',
  4: 'LL-Null:',
  5: 'LLR p-value:',
  6: 'Scale:',
  7: ''},
 3: {0: 'inf',
  1: '6.0798',
  2: '10.0764',
  3: '-0.039902',
  4: '0.0000',
  5: '1.0000',
  6: '1.0000',
  7: ''}}
df = pd.DataFrame(pandas_dict)
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292

2 Answers2

1

If only 4 columns you can flatten values and create DataFrame by constructor:

a = df[[0, 2]].values.ravel()
b = df[[1, 3]].values.ravel()

df = pd.DataFrame({'A':a, 'B':b})
print (df)
                      A                 B
0                Model:             Logit
1     Pseudo R-squared:               inf
2   Dependent Variable:   sick_percentage
3                  AIC:            6.0798
4                 Date:  2019-08-14 13:32
5                  BIC:           10.0764
6     No. Observations:                28
7       Log-Likelihood:         -0.039902
8             Df Model:                 2
9              LL-Null:            0.0000
10        Df Residuals:                25
11         LLR p-value:            1.0000
12           Converged:            0.0000
13               Scale:            1.0000
14      No. Iterations:           35.0000
15                                       

Or general solution - create MultiIndex in columns with modulo and integer division and reshape by DataFrame.stack:

df.columns = [df.columns % 2, df.columns // 2]
df = df.stack().reset_index(drop=True)
print (df)
                      0                 1
0                Model:             Logit
1     Pseudo R-squared:               inf
2   Dependent Variable:   sick_percentage
3                  AIC:            6.0798
4                 Date:  2019-08-14 13:32
5                  BIC:           10.0764
6     No. Observations:                28
7       Log-Likelihood:         -0.039902
8             Df Model:                 2
9              LL-Null:            0.0000
10        Df Residuals:                25
11         LLR p-value:            1.0000
12           Converged:            0.0000
13               Scale:            1.0000
14      No. Iterations:           35.0000
15                                       
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
0

You can also use np.concatenate here if you are looking for a bit faster solution.

df=pd.DataFrame(np.concatenate( (df.iloc[:,[0,1]].values, df.iloc[:,[2,3]].values), axis=0 ),columns=['Metric Name','Metric Value'])

If you want to use pandas function, you can use

1) pandas.DataFrame.merge

df= df.iloc[:,[0,1]].rename(columns={0:'Metric Name',1:'Metric Value'}).merge(df.iloc[:,[2,3]].rename(columns={2:'Metric Name',3:'Metric Value'}),how='outer')

2) pandas.concat

df=pd.concat((df.iloc[:,[0,1]].rename(columns={0:'Metric Name',1:'Metric Value'}),df.iloc[:,[2,3]].rename(columns={2:'Metric Name',3:'Metric Value'})), ignore_index=True)
Kartikeya Sharma
  • 1,335
  • 1
  • 10
  • 22