1

I hope somebody can tell me how to conclude the following step :

I have the following dataframe:

df = [{'Failed by Fraud' : 3, 'Failed by Bot': 7, 'Failed by Geography':4, 'Failed by Suspicious Activity' : 2, 'Impressions': 22, 'Clicks':12, 'Date': '24/07/2016', "ID": 34},]

I unpivot it in Pandas with the following code:

pd.melt(df, id_vars=["Date", "Impressions", "Clicks", "ID"], var_name='Reason of Faiure', value_name='Failures')

The result is the following :

    Date       Impressions  Clicks  ID  Reason of Faiure    Failures
0   24/07/2016  22          12      34  Failed by Bot       7
1   24/07/2016  22          12      34  Failed by Fraud     3
2   24/07/2016  22          12      34  Failed by Geography 4
3   24/07/2016  22          12      34  Failed by Suspicious Activity   2

However, what I would like to have is the following result :

    Date       Impressions  Clicks  ID  Reason of Faiure    Failures
0   24/07/2016  22          12      34  NaN                 NaN
1   24/07/2016  NaN         NaN     34  Fraud               3
2   24/07/2016  NaN         NaN     34  Geography           4
3   24/07/2016  NaN         NaN     34  Suspicious Activity 2
4   24/07/2016  NaN         NaN     34  Bot                 3

This means, in other words, adding 1 row which shows the impressions and clicks received for that day by ID, with empty values for Reasons of Failure and Failures. And on the other hand, empty values for Impressions and Clicks but with values for IDs, Reasons of Failures, and Failures

Matteo M
  • 137
  • 2
  • 8
  • 1
    Do you need general solution or only for your sample? – jezrael Nov 18 '16 at 14:42
  • Hi Jezrael! General solution would be great. I saw that here : https://www.ibm.com/developerworks/community/blogs/jfp/entry/Tidy_Data_In_Python?lang=en they use the following code : tidy = df.groupby('id').apply(pd.DataFrame.pivot, index='date', columns='element', values='value') However, it does not do exactly what I would like – Matteo M Nov 18 '16 at 14:55
  • hi @jezrael I was thinking about the problem over the weekend. I think I can create two dataframes, one with just the columns regarding "Failed by", and one with the columns for Impressions and clicks, and then merge them on id and date. But that is an extra three steps. Do you have any idea how to work it faster? – Matteo M Nov 21 '16 at 09:27
  • 1
    I have same idea with `concat`, unfortunately I dont know other solution. – jezrael Nov 21 '16 at 09:28
  • Would you know how to make sure that on concat the impressions and clicks dataframe will be sorted by date and id? – Matteo M Nov 21 '16 at 09:58
  • 1
    I think you have to use `df.sort_values(['dat','id']`, check [`sort_values`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html) – jezrael Nov 21 '16 at 09:59

0 Answers0