How to take rows from Python pandas dataframes and make them into columns of one new dataframe

Question

I have three different dataframes of economic measures. The columns are years and the rows are countries. I want to take each country's rows and form a dataframe for each country such that the columns are the three economic measures and the rows are years.

For example: Austria

     | GDP         | CPI         | Interest rate
1998 | xxxxxxxxxxx | xxxxxxxxxxx | xxxxxxxxxxxxxx
1999 | xxxxxxxxxxx | xxxxxxxxxxx | xxxxxxxxxxxxxx

I'm having trouble doing this in python because I am not sure how to manipulate rows.

Follow up question:

I now have a dataframe that looks something like this:

by_country: [

        | GDP         | CPI      | Interest rate
Country | Austria     | Austria  | Austria
1998    | xx xx xx xx | xx xx xx | xxxxxxxx
1998    | xx xx xx xx | xx xx xx | xxxxxxxx ......

        | GDP          | CPI       | Interest rate
Country | Belgium      | Belgium   | Belgium
1998    | xx xx xx xxx | xx xx xxx | xxxxxxxx

]

I want to be able to call stuff like this: Austria.GDP, Belgium.CPI, etc. I think the first step would be to define a function that calls the information for a country within the big dataframe such as by_country(Austria).

Essentially, I would like to be able to call country_df(Austria).GDP

Any thoughts on how to do this?

Can you post representative raw input data, some code to reproduce your dfs, the desired output, and your attempt? – EdChum Jul 14 '15 at 20:00

Brett Patterson · Answer 1 · 2015-07-16T00:22:51.020


First, you could transpose each data frame so that the rows are the years and the columns are the countries, then take each respective column from the 3 data frames and join them together. Something like this would give you a data frame for each country:

import pandas

gdp = gdp_df.transpose()
cpi = cpi_df.transpose()
interest = interest_df.transpose()

by_country = {}

# Assumes every country appears, under the same name, in all three data frames
for country in gdp.columns:
    country_df = pandas.concat([gdp[country], cpi[country], interest[country]], axis=1)
    country_df.columns = ['GDP', 'CPI', 'Interest rate']
    by_country[country] = country_df

You can now do something like:

by_country['Austria'].GDP
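
For a version that can be run end to end, here is a minimal sketch of the same transpose-and-concat idea on made-up data. The numbers, the two countries, and the string year labels are assumptions for illustration only; the `keys` argument of `pandas.concat` is used here to name the columns instead of assigning `country_df.columns` afterwards.

```python
import pandas as pd

# Hypothetical sample data: rows are countries, columns are years.
countries = ['Austria', 'Belgium']
years = ['1998', '1999']
gdp_df = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], index=countries, columns=years)
cpi_df = pd.DataFrame([[10.0, 20.0], [30.0, 40.0]], index=countries, columns=years)
interest_df = pd.DataFrame([[0.1, 0.2], [0.3, 0.4]], index=countries, columns=years)

# Transpose so that rows are years and columns are countries.
gdp, cpi, interest = gdp_df.T, cpi_df.T, interest_df.T

by_country = {}
for country in gdp.columns:
    # Each selection is a Series indexed by year; `keys` names the new columns.
    by_country[country] = pd.concat(
        [gdp[country], cpi[country], interest[country]],
        axis=1,
        keys=['GDP', 'CPI', 'Interest rate'],
    )

austria = by_country['Austria']
print(austria)
```

Because the columns are selected by country name and `concat` aligns on the year index, the order of countries in the three source frames should not matter, as long as each country is present in all of them.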

edited Jul 16 '15 at 00:22

answered Jul 14 '15 at 20:16

Brett Patterson


Thanks, could you check out my follow up question? – JSC Jul 15 '15 at 16:55
by_country['Austria'].GDP doesn't work but by_country[0].GDP does. I think this is because by_country is a list so we can only call indices? Is there any way to name indices, for example index:0, name: Austria ? – JSC Jul 16 '15 at 16:14
I changed `by_country` to a dictionary in my edit with the country names as keys...sorry I didn't make it clear that that was changed as well. – Brett Patterson Jul 16 '15 at 19:46
It works, but something like by_country['Austria'].GDP returns a series instead of a float. Is there any way to tweak the code to return a float? – JSC Jul 17 '15 at 16:08
What would that float represent? You get back a series because there is a GDP entry for the country for each year. You could, for example, get the mean GDP with `by_country['Austria'].GDP.mean()` or get the value for a specific year with `by_country['Austria'].loc['1998', 'GDP']` (row label first, then column label) – Brett Patterson Jul 17 '15 at 17:36
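
The difference between the series and a single number can be sketched on a small made-up frame. The values and the string year labels below are assumptions; if the real index holds integer years, the label would be `1998` rather than `'1998'`.

```python
import pandas as pd

# Hypothetical per-country frame: rows are years, columns are measures.
austria = pd.DataFrame(
    {'GDP': [1.0, 3.0], 'CPI': [10.0, 20.0], 'Interest rate': [0.1, 0.2]},
    index=['1998', '1999'],
)

gdp_series = austria['GDP']            # Series: one GDP value per year
mean_gdp = austria['GDP'].mean()       # scalar: mean over all years
gdp_1998 = austria.loc['1998', 'GDP']  # scalar: row label first, then column
```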
I am trying to run a regression, so np.log(by_country['Denmark'].CPI) for example returns an error: 'str' object has no attribute 'log' – JSC Jul 17 '15 at 17:53
That error is saying that `np` is a string. Are you sure that you've imported numpy correctly as `np` and haven't shadowed it with another variable named `np`? – Brett Patterson Jul 17 '15 at 19:00
Yes, I have not shadowed it with anything. – JSC Jul 17 '15 at 21:05
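
The thread does not settle this, but NumPy can raise the same message when `np.log` is applied to a series whose values are Python strings rather than numbers, in which case `np` itself is fine. A minimal sketch, assuming that is what happened to the CPI column (the sample values are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical CPI column whose values were read in as strings.
cpi = pd.Series(['100.0', '102.5', '105.1'], index=['1998', '1999', '2000'])

# Convert to floats; errors='coerce' turns unparseable entries into NaN.
cpi_numeric = pd.to_numeric(cpi, errors='coerce')

# With a numeric dtype, np.log works element-wise as expected.
log_cpi = np.log(cpi_numeric)
```

Checking `by_country['Denmark'].dtypes` would show whether the columns are numeric before running the regression.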
