I have a dataframe with many columns. I am trying to filter on one of those columns ('Region') and create a separate dataframe for each of the 4 regions in the 'Region' column. I then want to run a large block of code containing a bunch of calculations on each of those 4 dataframes, without having to rewrite the block 4 separate times.

I know I can use the .isin function for the column filtering and do this for my 4 regions (US, EM, Europe, Asia):

# .isin expects a list-like, so each region is wrapped in a list
US = df[df['Region'].isin(['US'])]
EM = df[df['Region'].isin(['EM'])]
Europe = df[df['Region'].isin(['Europe'])]
Asia = df[df['Region'].isin(['Asia'])]

I could then run my block of code on the 4 new dataframes, but I would be executing the large block of calculation code 4 separate times, which is just too messy. How can I do this in a loop so that I only have to write the block once? If there is another function I can use besides a for loop, that would be awesome as well. Appreciate any help, I'm trying to learn.

Dummy Code:

import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'],
    'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia'],
})
ALollz
spacedinosaur10
  • Can you provide some clue on this "large block of code"? – LeandroHumb Dec 21 '18 at 15:33
  • Why do you need to split the `DataFrame`? If the calculations for each region are all similar, then you can either use `np.select` or a dictionary to map the regions to region-specific values for each calculation. But without knowing *what* you need to do, it's hard to provide more guidance. – ALollz Dec 21 '18 at 15:33

1 Answer

Just iterate a groupby object:

dfs = {}
for region, df_region in df.groupby('Region'):
    # do something to df_region
    # ...
    # then store in dictionary
    dfs[region] = df_region

Then access individual dataframes via `dfs['US']`, `dfs['Asia']`, etc.

You can, of course, tailor your operation to be dependent on region, but this is not necessary. Each `df_region` represents a dataframe filtered by `df[df['Region'] == region]`.
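As a minimal sketch of how the loop above might be used with the question's dummy data: the function `summarize` is a hypothetical stand-in for the "large block of code" (the real calculations are not shown in the question), and stacking the results with `pd.concat` is one possible way to combine them, not necessarily what the asker needs.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'],
    'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia'],
})


def summarize(df_region):
    """Hypothetical placeholder for the large block of calculations."""
    return pd.DataFrame({
        'rows': [len(df_region)],
        'a_total': [df_region['a'].sum()],
    })


# Write the calculation once, then run it once per region.
results = {region: summarize(df_region)
           for region, df_region in df.groupby('Region')}

# Stack the per-region results; the dict keys become the outer index level.
combined = pd.concat(results, names=['Region', 'row'])
print(combined)
```

Wrapping the calculations in a function keeps the loop body short, and the dictionary still gives access to each region's result by name, e.g. `results['US']`.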

jpp
  • `df.groupby('Region').apply(some_function)` ? – d_kennetz Dec 21 '18 at 15:35
  • @d_kennetz, Depends on the operations. But yes, potentially. – jpp Dec 21 '18 at 15:36
  • @d_kennetz I think just mentioning `groupby` is good enough, since we do not know the function he wants to apply – BENY Dec 21 '18 at 15:37
  • The large block of code I want to run is basically just a few pivot tables that I stack together to create one giant dataframe – spacedinosaur10 Dec 21 '18 at 15:45
  • @spacedinosaur10 then you can run the pivot table on your original df, since a pivot is just another layout of a groupby. See https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – BENY Dec 21 '18 at 15:46
  • Agree with @W-B, you are asking for a solution to the *wrong* problem. – jpp Dec 21 '18 at 15:57
  • Yeah, I figured I was looking at it wrong. Thanks for the help. I am kind of confused by the output though. Below is an example of my pivot (imagine Roll, Animal and Year are all columns in df). I want it to run through my code and produce 4 different pivots, one for each region, so that I can .concat them on top of each other. `dfs = {}` `for region, df_region in df.groupby('Region'):` `pivot = df[df['Roll'].isin(Analyst) & df['Animal'].isin(cat) & df['Year'].isin(2010)]` `dfs[region] = df_region` – spacedinosaur10 Dec 21 '18 at 16:44
  • @spacedinosaur10, I can't read code in comments. I suggest you write a [new question](https://stackoverflow.com/questions/ask) with a **[mcve]** (including input & desired output). – jpp Dec 21 '18 at 16:45
  • Created a separate question that is a bit easier to understand, thanks: https://stackoverflow.com/questions/53888636/create-separate-dataframes-by-iterate-a-groupby-object – spacedinosaur10 Dec 21 '18 at 17:22
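
The comments suggest pivoting the original frame instead of splitting it first. A rough sketch of that idea on the question's dummy data follows; the choice of `'b'` as a second index level and `'sum'` as the aggregation are assumptions made for illustration, since the asker's actual pivots are not shown.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'],
    'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia'],
})

# Putting 'Region' first in the index yields one block of rows per region,
# already stacked on top of each other, so no explicit loop or concat is needed.
# The index columns and aggregation here are illustrative assumptions.
pivot = df.pivot_table(index=['Region', 'b'], values='a', aggfunc='sum')
print(pivot)

# A single region's block can still be pulled out by its label.
asia = pivot.loc['Asia']
```

Whether this replaces the loop depends on the calculations: it works when every region goes through the same pivot, which is what the asker describes.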