Reshape long format to wide format with two columns for each observation in long format

Question

I am trying to reshape a pandas dataframe with the following long format:

ISO3    Indicator       Year    Value
FRA Pop. density        2003    113,6
FRA Pop. density        2004    114,5
FRA Pop. density        2005    115,4
USA Pop. density        2003    31,7
USA Pop. density        2004    32,0
USA Pop. density        2005    32,3
FRA Pupil-teacher ratio 2003    18,6
FRA Pupil-teacher ratio 2004    18,6
FRA Pupil-teacher ratio 2005    18,6
USA Pupil-teacher ratio 2003    14,8
USA Pupil-teacher ratio 2004    14,2
USA Pupil-teacher ratio 2005    14,1

To this:

    Pop. density            Pupil-teacher ratio 
    2003    2004    2005    2003    2004    2005
FRA 113,6   114,5   115,4   18,6    18,6    18,6
USA 31,7    32,0    32,3    14,8    14,2    14,1

I have tried stack and pivot but with no luck.

Pivot attempt:

smallstack.pivot(index='ISO3', columns=['Indicator', 'Year'], values='Value')

Results in:

KeyError                                  Traceback (most recent call last)
<ipython-input-612-c43d9ec16c54> in <module>
----> 1 smallstack.pivot(index='ISO3', columns=['Indicator', 'Year'], values='Value')

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in pivot(self, index, columns, values)
   5192         """
   5193         from pandas.core.reshape.reshape import pivot
-> 5194         return pivot(self, index=index, columns=columns, values=values)
   5195 
   5196     _shared_docs['pivot_table'] = """

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\reshape\reshape.py in pivot(self, index, columns, values)
    404         else:
    405             index = self[index]
--> 406         index = MultiIndex.from_arrays([index, self[columns]])
    407 
    408         if is_list_like(values) and not isinstance(values, tuple):

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2680         if isinstance(key, (Series, np.ndarray, Index, list)):
   2681             # either boolean or fancy integer index
-> 2682             return self._getitem_array(key)
   2683         elif isinstance(key, DataFrame):
   2684             return self._getitem_frame(key)

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\frame.py in _getitem_array(self, key)
   2724             return self._take(indexer, axis=0)
   2725         else:
-> 2726             indexer = self.loc._convert_to_indexer(key, axis=1)
   2727             return self._take(indexer, axis=1)
   2728 

~\Anaconda3\envs\scipy18jlab\lib\site-packages\pandas\core\indexing.py in _convert_to_indexer(self, obj, axis, is_setter)
   1325                 if mask.any():
   1326                     raise KeyError('{mask} not in index'
-> 1327                                    .format(mask=objarr[mask]))
   1328 
   1329                 return com._values_from_object(indexer)

KeyError: "['Year'] not in index"

Any advice will be highly appreciated!

jezrael · Accepted Answer · 2019-03-31T16:30:03.640

0

First check columns names:

print (smallstack.columns.tolist())
['ISO3', 'Indicator', 'Year', 'Value']

And then DataFrame.set_index with Series.unstack:

df = smallstack.set_index(['ISO3', 'Indicator', 'Year'])['Value'].unstack([1,2])
print (df)
Indicator Pop. density               Pupil-teacher ratio            
Year              2003   2004   2005                2003  2004  2005
ISO3                                                                
FRA              113,6  114,5  115,4                18,6  18,6  18,6
USA               31,7   32,0   32,3                14,8  14,2  14,1

If not working, because duplicates use DataFrame.pivot_table, but first convert column Value to numeric:

smallstack['Value'] = smallstack['Value'].str.replace(',','.').astype(float)
smallstack.pivot_table(index='ISO3', columns=['Indicator', 'Year'], values='Value')

edited Mar 31 '19 at 16:30

answered Mar 31 '19 at 16:24

jezrael

822,522
95
1,334
1,252

1

Perfect. There was some white space around Year but the real trick is to set the indices it seems. Thanks. – Laurens Mar 31 '19 at 18:22
When I use my real data I get the following error on one of the dataframes: "ValueError: Index contains duplicate entries, cannot reshape". I understand this to be due to a problem in my data. Is there a way to see on which rows the procedure fails? – Laurens Apr 02 '19 at 07:57

score 0 · Answer 2 · answered Mar 31 '19 at 16:53

I'm not sure, if you can do both sets of columns in one pass of pivot. The error doesn't point out, that he can't find "year" but rather that he can't find "['year']". That means the type is wrong. Try something like this by doing it one at a time and concat it at the end. You'll have to do it dynamic of course for more than the shown categories.


smallstack = pd.DataFrame({'ISO3': ['FRA', 'USA', 'FRA', 'USA'],
                           'Indicator': ['Pop. density', 'Pop. density', 'Pupil-teacher ratio', 'Pupil-teacher ratio'],
                           'Year': [2003, 2004, 2003, 2004],
                           'Value': [113.6, 115.6, 113.6, 115.6, ]})

pivots = [smallstack.loc[smallstack.Indicator == code].
              pivot(index='ISO3', columns='Year', values='Value')
          for code in smallstack.Indicator.unique()]
df = pd.concat(pivots, axis='columns')

Reshape long format to wide format with two columns for each observation in long format

2 Answers2