I think I need some help with the "melt" function.

I have a dataframe which looks like the following:

[image: current DataFrame]

As you can see the current index is time.

However, I would like to achieve the following:

[image: desired DataFrame]

The rationale is that I would like to compare the RZS values of the same quarter-hour across different days.

I know that I can use the melt function but I have no clue how this function works...

import pandas as pd
from pandas import melt

df = pd.DataFrame(index=['00:00:00', '00:15:00'], columns=["2014-12-01","2014-12-02"])

This creates the DataFrame, but I have no clue how to fill it. My questions:

  1. What's the easiest way to create an index which contains all 96 quarters of a day?
  2. How can I use the melt function to fill the new df?

Thank you very much in advance.

Johannes

1 Answer

You are probably looking for pivot_table, which is like the inverse of melt. For simplicity, the following code recreates the input DataFrame with the "Uhrzeit" column containing 96 integer values, representing the time quarters:

import pandas as pd
import numpy as np

data = {
    'Datum': ['2014-12-01'] * 96 + ['2014-12-02'] * 96,
    'Uhrzeit': list(range(96)) * 2,  # range objects cannot be joined with + in Python 3
    'RZS': np.random.rand(96*2),
}

df = pd.DataFrame(data).set_index('Datum')[['Uhrzeit', 'RZS']]
df.reset_index(inplace=True) # Now this df looks like the input you described
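# Current pandas uses index=/columns= here; the old rows=/cols= keywords are no longer accepted
df = pd.pivot_table(df, values='RZS', index='Uhrzeit', columns='Datum')
print(df[:10])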

Output:

Datum    2014-12-01  2014-12-02
Uhrzeit                        
0          0.864674    0.363400
1          0.736678    0.925202
2          0.807088    0.076891
3          0.007031    0.528020
4          0.047997    0.216422
5          0.625339    0.636028
6          0.115018    0.141142
7          0.424289    0.101075
8          0.544412    0.147669
9          0.151214    0.274959

You can then slice the resulting DataFrame to select the rows for the desired "Uhrzeit" values.
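Regarding the first question (an index with all 96 quarters of a day), here is a minimal sketch. It assumes the labels should look like the '00:00:00', '00:15:00' strings from the question; the date passed to date_range is arbitrary, and the names quarters and empty are only illustrative.

```python
import pandas as pd

# 96 quarter-hour labels for one day: '00:00:00', '00:15:00', ..., '23:45:00'.
# The date itself is arbitrary; only the time-of-day part is kept.
quarters = pd.date_range("2014-12-01", periods=96, freq="15min").strftime("%H:%M:%S")

# Empty frame with the shape sketched in the question (illustrative only)
empty = pd.DataFrame(index=quarters, columns=["2014-12-01", "2014-12-02"])
```

With pivot_table you usually do not need to build this frame by hand, because the pivot creates the index from the values found in the "Uhrzeit" column.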


Edit: It seems that the column RZS is stored as strings, which is a problem for pivot_table because it expects the value column to be numeric. Here is a quick fix that converts the column to numeric, assuming that a str '1.087,29' should be read as the float 1087.29:

df = pd.DataFrame({'RZS': ['1.087,29', '1.087.087,28', '1.087.087.087,28']})

def fix(x):
    return x.replace('.', '').replace(',', '.')

df['RZS'] = df['RZS'].apply(fix).astype(float)

# The column RZS now should be of dtype float, and pivot_table should work.
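As an alternative, the same conversion can probably be written with the vectorized string methods. This is only a sketch under the same assumption about the number format; the name converted is illustrative. Passing regex=False matters: when the pattern is treated as a regular expression, "." matches every character, which may be why a plain replace of "." by "" produced empty values.

```python
import pandas as pd

df = pd.DataFrame({"RZS": ["1.087,29", "1.087.087,28"]})

converted = (
    df["RZS"]
    .str.replace(".", "", regex=False)   # drop the thousands separators literally
    .str.replace(",", ".", regex=False)  # turn the decimal comma into a dot
    .astype(float)
)
```

Checking converted.dtype afterwards should show float64, which is what pivot_table needs for the values column.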
YS-L
  • Hi, thank you very much for this. However, I get the following error: "No numeric types to aggregate". At first I thought the reason was that I have comma-separated values instead of dotted ones, i.e. 1,23 instead of 1.23. However, I replaced "," by "." and the error still exists... – Johannes Jan 06 '15 at 08:36
  • Can you check the dtype of the column `RZS` (``df.dtypes``)? It has to be numerical (see this [question](http://stackoverflow.com/questions/19279229/pandas-pivot-table-with-non-numeric-values-dataerror-no-numeric-types-to-ag)). – YS-L Jan 06 '15 at 08:56
  • Ok, I found the problem. Can you show me how to convert 1.087,29 (one thousand ...) into 1087.29? I tried the replace function like this: replace "." by "" and "," by ".", but then it returns an empty series... – Johannes Jan 06 '15 at 11:03
  • Thank you, perfect!! So the trick was to put the replace calls together?! – Johannes Jan 06 '15 at 12:19