
I have three different dataframes of economic measures. The columns are years and the rows are countries. I want to take each country's rows and form a dataframe for each country such that the columns are the three economic measures and the rows are years.

For example, for Austria:

      | GDP         | CPI         | Interest rate
1998  | xxxxxxxxxxx | xxxxxxxxxxx | xxxxxxxxxxxxxx
1999  | xxxxxxxxxxx | xxxxxxxxxxx | xxxxxxxxxxxxxx

I'm having trouble doing this in Python because I'm not sure how to manipulate rows.
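A hypothetical toy version of the three input frames (made-up numbers, years stored as strings) makes the layout concrete. In pandas, swapping rows and columns is done with `.T`, shorthand for `.transpose()`:

```python
import pandas as pd

# Hypothetical toy inputs (made-up numbers): rows are countries, columns are years
years = ['1998', '1999']
names = ['Austria', 'Belgium']
gdp_df = pd.DataFrame([[100.0, 105.0], [80.0, 82.0]], index=names, columns=years)
cpi_df = pd.DataFrame([[1.5, 1.8], [1.2, 1.4]], index=names, columns=years)
interest_df = pd.DataFrame([[3.0, 3.2], [2.9, 3.1]], index=names, columns=years)

# .T is shorthand for .transpose(): years become the rows and each
# country's time series becomes a column
gdp_by_year = gdp_df.T
print(gdp_by_year)
print(gdp_by_year['Austria'])  # Austria's GDP, indexed by year
```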

Follow-up question:

I now have a dataframe that looks something like this:

by_country: [

             GDP        CPI        Interest rate
Country      Austria    Austria    Austria
1998         xxxx       xxxx       xxxx
1998         xxxx       xxxx       xxxx    ......

             GDP        CPI        Interest rate
Country      Belgium    Belgium    Belgium
1998         xxxx       xxxx       xxxx

]

I want to be able to access the data like this: Austria.GDP, Belgium.CPI, etc. I think the first step would be to define a function that retrieves the data for one country from the big data frame, such as by_country(Austria).

Essentially, I would like to be able to call country_df(Austria).GDP.

Any thoughts on how to do this?
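A minimal sketch of the `country_df` idea, assuming the per-country frames are stored in a dict keyed by country name rather than in a list (the names `country_df` and `countries`, and all numbers, are hypothetical). The country name is passed as the string `'Austria'`; a bare `Austria` would be an undefined variable. `types.SimpleNamespace` turns the dict keys into attributes, which gets close to the `Austria.GDP` style:

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical per-country frames (made-up numbers): rows are years, columns are measures
years = ['1998', '1999']
by_country = {
    'Austria': pd.DataFrame({'GDP': [100.0, 105.0], 'CPI': [1.5, 1.8],
                             'Interest rate': [3.0, 3.2]}, index=years),
    'Belgium': pd.DataFrame({'GDP': [80.0, 82.0], 'CPI': [1.2, 1.4],
                             'Interest rate': [2.9, 3.1]}, index=years),
}


def country_df(name):
    """Return the data frame for one country, looked up by its name."""
    return by_country[name]


# Attribute-style access: countries.Austria is the Austria frame.
# Names that are not valid identifiers (e.g. 'United Kingdom') still need
# getattr(countries, 'United Kingdom') or a plain dict lookup.
countries = SimpleNamespace(**by_country)

print(country_df('Austria').GDP)
print(countries.Belgium.CPI)
# 'Interest rate' contains a space, so use item access instead of attribute access
print(countries.Austria['Interest rate'])
```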

JSC
    Can you post representative raw input data, some code to reproduce your data frames, your desired output, and your attempt? – EdChum Jul 14 '15 at 20:00

1 Answer


First, you could transpose each data frame so that the rows are the years and the columns are the countries, then take the matching country column from each of the three data frames and join them side by side. Something like this would give you a data frame for each country:

import pandas

# Transpose so that rows are years and columns are countries
gdp = gdp_df.transpose()
cpi = cpi_df.transpose()
interest = interest_df.transpose()

by_country = {}

# Assumes every country appears as a column in all three data frames;
# concat aligns the three series on their shared year index
for country in gdp.columns:
    country_df = pandas.concat([gdp[country], cpi[country], interest[country]], axis=1)
    country_df.columns = ['GDP', 'CPI', 'Interest rate']
    by_country[country] = country_df

You can now do something like:

by_country['Austria'].GDP
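An alternative sketch that skips the loop: concatenate the three transposed frames side by side with `keys=`, which labels each column as (measure, country), then pull out one country with `DataFrame.xs`. The toy data below is hypothetical:

```python
import pandas as pd

# Hypothetical toy inputs (made-up numbers): rows are countries, columns are years
years = ['1998', '1999']
names = ['Austria', 'Belgium']
gdp_df = pd.DataFrame([[100.0, 105.0], [80.0, 82.0]], index=names, columns=years)
cpi_df = pd.DataFrame([[1.5, 1.8], [1.2, 1.4]], index=names, columns=years)
interest_df = pd.DataFrame([[3.0, 3.2], [2.9, 3.1]], index=names, columns=years)

# Put the three transposed frames side by side in one wide frame; keys= adds an
# outer column level, so each column is labelled (measure, country)
wide = pd.concat([gdp_df.T, cpi_df.T, interest_df.T], axis=1,
                 keys=['GDP', 'CPI', 'Interest rate'])

# Pick every column whose country level is 'Austria'; xs drops that level,
# leaving one column per measure with years as the rows
austria = wide.xs('Austria', axis=1, level=1)
print(austria)
print(austria.GDP)
```

This keeps all countries in one frame, which may be convenient for cross-country comparisons; the dict built by the loop above is arguably easier to read when you only work with one country at a time.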
Brett Patterson
  • Thanks, could you check out my follow-up question? – JSC Jul 15 '15 at 16:55
  • by_country['Austria'].GDP doesn't work, but by_country[0].GDP does. I think this is because by_country is a list, so we can only use integer indices? Is there any way to name the entries, for example index: 0, name: Austria? – JSC Jul 16 '15 at 16:14
  • I changed `by_country` to a dictionary in my edit, with the country names as keys...sorry, I didn't make it clear that that had changed as well. – Brett Patterson Jul 16 '15 at 19:46
  • It works, but something like by_country['Austria'].GDP returns a Series instead of a float. Is there any way to tweak the code to return a float? – JSC Jul 17 '15 at 16:08
  • What would that float represent? You get back a Series because there is a GDP entry for the country for each year. You could, for example, get the mean GDP with `by_country['Austria'].GDP.mean()` or get the value for a specific year with `by_country['Austria'].loc['1998', 'GDP']` (row label first, then column label). – Brett Patterson Jul 17 '15 at 17:36
  • I am trying to run a regression, but, for example, np.log(by_country['Denmark'].CPI) raises an error: 'str' object has no attribute 'log' – JSC Jul 17 '15 at 17:53
  • That error is saying that `np` is a string. Are you sure that you've imported numpy correctly as `np` and haven't shadowed it with another variable named `np`? – Brett Patterson Jul 17 '15 at 19:00
  • Yes, I have not shadowed it with anything. – JSC Jul 17 '15 at 21:05
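Another possible cause, consistent with the `Country` row visible in the follow-up question's printout: if each country frame contains a row holding the string 'Austria' (etc.), every column mixes strings and numbers and gets `object` dtype. NumPy ufuncs such as `np.log` applied to object arrays try to call a `.log()` method on each element, which fails with an error naming `str` (the exact wording varies between NumPy versions). A sketch of two fixes, assuming the original frames keep the country names in an ordinary column called `Country` (hypothetical data below):

```python
import numpy as np
import pandas as pd

# Hypothetical raw input (made-up numbers) with country names in an ordinary
# 'Country' column instead of the index
cpi_raw = pd.DataFrame({'Country': ['Austria', 'Belgium'],
                        '1998': [1.5, 1.2], '1999': [1.8, 1.4]})

# Transposing as-is turns 'Country' into a row of strings, so every resulting
# column has object dtype; np.log on such a column fails
bad = cpi_raw.T
print(bad.dtypes)

# Fix 1: move the names into the index first, so only numbers are transposed
cpi = cpi_raw.set_index('Country').T
print(cpi.dtypes)              # float64 columns named Austria and Belgium
print(np.log(cpi['Austria']))

# Fix 2: for a frame that already contains the stray row, take the names from
# that row, drop it, and convert what is left to floats
fixed = bad.drop('Country').astype(float)
fixed.columns = bad.loc['Country'].tolist()
print(np.log(fixed['Austria']))
```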