Pivoting dataframes with pd.melt() on time series data

Question

I have some data here:

       Country/Region  1/22/20  1/23/20  1/24/20  1/25/20  1/26/20  1/27/20
0               Afghanistan        0        0        0        0        0   
1                   Albania        0        0        0        0        0   
2                   Algeria        0        0        0        0        0   
3                   Andorra        0        0        0        0        0   
4                    Angola        0        0        0        0        0   
5       Antigua and Barbuda        0        0        0        0        0   
6                 Argentina        0        0        0        0        0   
7                   Armenia        0        0        0        0        0   
8                 Australia        0        0        0        0        0   
9                 Australia        0        0        0        0        3   
10                Australia        0        0        0        0        0   
11                Australia        0        0        0        0        0   
12                Australia        0        0        0        0        0   
13                Australia        0        0        0        0        0   
14                Australia        0        0        0        0        1   
15                Australia        0        0        0        0        0   
16                  Austria        0        0        0        0        0   
17               Azerbaijan        0        0        0        0        0   
18                  Bahamas        0        0        0        0        0   
19                  Bahrain        0        0        0        0        0   
20               Bangladesh        0        0        0        0        0

I'd like to rearrange this so that the dates are rows, while the countries are columns. Like this:

Country/Region   Afghanistan   Albania

1/22/20              0            0
1/23/20              0            0
1/24/20              0            0

and so on. I've tried to use pd.melt, but can't quite nail how to get the desired output. Here's my attempt:

%matplotlib inline
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import math

data = pd.read_csv("covid.csv", sep=",")

data = data.drop(["Province/State","Lat","Long"], axis=1)

data_melted = data.melt(value_vars=data.columns[1:], var_name="Date",value_name="Cases")

          Date  Cases
0      1/22/20      0
1      1/22/20      0
2      1/22/20      0
3      1/22/20      0
4      1/22/20      0
5      1/22/20      0
6      1/22/20      0
7      1/22/20      0
8      1/22/20      0
9      1/22/20      0
10     1/22/20      0
11     1/22/20      0
12     1/22/20      0
13     1/22/20      0
14     1/22/20      0

I also tried:

data_melted = data.melt(value_vars=[data.columns[1:], "Country/Region"])

but this came up with a TypeError: unhashable type: 'Index' even though "Country/Region" wasn't the index.

Would appreciate any help on this.

`df.set_index('Country/Region').T`? – Quang Hoang Apr 11 '20 at 19:57 — Quang Hoang, Apr 11 '20 at 19:57
Thanks, that worked - way more simple than I thought! – Nathan Thomas Apr 11 '20 at 20:06 — Nathan Thomas, Apr 11 '20 at 20:06

score 2 · Accepted Answer · answered Apr 11 '20 at 20:07

2

You are looking to transpose the table:

df.set_index('Country/Region').T

I noticed that Australia was repeated multiple times, if you want to consolidate by adding them up:

df.set_index('Country/Region').T \
    .groupby(level=0, axis=1) \
    .sum()

answered Apr 11 '20 at 20:07

Code Different

90,614
16
144
163

score 0 · Answer 2 · answered Apr 11 '20 at 21:46

Original attempt would work if you specify the id_vars argument of pandas.melt. Then, for country columns, run a pivot_table aggregation which actually renders a time series data frame (i.e., date/time as index) for direct plotting.

data_melted = (data.melt(id_vars = ['Country/Region'],
                         var_name = 'Date', value_name='Cases')
                   .assign(Date = lambda x: pd.to_datetime(x['Date']))

data_pivoted = data_melted.pivot_table(index='Date', columns='Country/Region', 
                                       values='Cases', aggfunc='sum')

Pivoting dataframes with pd.melt() on time series data

2 Answers2