Pandas Dataframe filter and For Loop

Question

I have a dataframe with many columns. I am trying to filter on one of those columns ('Region') and create a separate dataframe for each of the 4 regions in the 'Region' column. I then want to run a large block of code containing a bunch of calculations on each of those 4 dataframes, without having to rewrite that block 4 separate times.

I know I can use the .isin function for the column filtering and do this for my 4 regions (US, EM, Europe, Asia):

US = df[df['Region'].isin(['US'])]  # isin expects a list-like, not a bare string
EM = df[df['Region'].isin(['EM'])]
Europe = df[df['Region'].isin(['Europe'])]
Asia = df[df['Region'].isin(['Asia'])]

And then run my block of code on the 4 new dataframes. But I would be executing my large block of calculation code 4 separate times, which is just too messy. How can I do this in a loop so I only have to write the block once? If there is another function I can use besides a for loop, that would be awesome as well. Appreciate any help; I'm trying to learn.

Dummy Code:

import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6], 'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'], 'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia']})

Why do you need to split the `DataFrame`? If the calculations for each region are all similar, then you can either use `np.select` or a dictionary to map the regions to region-specific values for each calculation. But without knowing *what* you need to do, it's hard to provide more guidance. — ALollz, Dec 21 '18 at 15:33
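To make the dictionary idea from the comment above concrete, here is a minimal sketch. The `region_factor` values and the `a_scaled` column are invented purely for illustration; the question does not say what the real calculations are.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'],
    'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia'],
})

# Hypothetical region-specific multipliers (not from the question).
region_factor = {'US': 1.0, 'EM': 1.5, 'Europe': 0.5, 'Asia': 2.0}

# Look up each row's factor from its region, then apply it in one
# vectorized step; no splitting of the dataframe is needed.
df['a_scaled'] = df['a'] * df['Region'].map(region_factor)
print(df)
```

This only works when the calculation has the same shape for every region and differs in its parameters; if the logic itself differs per region, splitting may still be the simpler route.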

jpp · Answer 1 · 2018-12-21T16:46:11.167

6

Just iterate a groupby object:

dfs = {}
for region, df_region in df.groupby('Region'):
    # do something to df_region
    # ...
    # then store in dictionary
    dfs[region] = df_region

Then access individual dataframes via dfs['US'], dfs['Asia'], etc.

You can, of course, tailor your operation to be dependent on region, but this is not necessary. Each df_region represents a dataframe filtered by df[df['Region'] == region].
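As a rough end-to-end sketch using the dummy data from the question: the `summarize` function below is a hypothetical stand-in for the "large block of code", which is not shown in the question, so treat it as a placeholder rather than the actual calculation.

```python
import pandas as pd

df = pd.DataFrame({
    'a': [1, 2, 3, 4, 5, 6],
    'b': ['cats', 'dogs', 'birds', 'pianos', 'elephant', 'dinos'],
    'Region': ['EM', 'US', 'US', 'Europe', 'Asia', 'Asia'],
})


def summarize(df_region):
    """Hypothetical placeholder for the per-region calculations."""
    return pd.DataFrame({
        'rows': [len(df_region)],
        'a_total': [df_region['a'].sum()],
    })


# Write the calculation once, run it for every region.
results = {}
for region, df_region in df.groupby('Region'):
    results[region] = summarize(df_region)

# Stack the per-region results on top of each other; the dictionary
# keys become the outer index level, named 'Region' here.
combined = pd.concat(results, names=['Region'])
print(combined)
```

Keeping the per-region work in a function means the same function could also be tried with `df.groupby('Region').apply(...)`, depending on what the calculations return.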

edited Dec 21 '18 at 16:46

answered Dec 21 '18 at 15:34

jpp


`df.groupby('Region').apply(some_function)` ? – d_kennetz Dec 21 '18 at 15:35

@d_kennetz, Depends on the operations. But yes, potentially. – jpp Dec 21 '18 at 15:36

@d_kennetz I think just mentioning `groupby` is good enough, since we do not know which function he wants to apply – BENY Dec 21 '18 at 15:37
The large block of code I want to run is basically just a few pivot tables that I stack together to create one giant dataframe – spacedinosaur10 Dec 21 '18 at 15:45

@spacedinosaur10 then you can run the pivot table on your original df, since a pivot is just another layout of a groupby; check https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe – BENY Dec 21 '18 at 15:46
Agree with @W-B, you are asking for a solution to the *wrong* problem. – jpp Dec 21 '18 at 15:57
Yeah, I figured I was looking at it wrong. Thanks for the help. I am kind of confused by the output though. Below is an example of my pivot (imagine Roll, Animal and Year are all columns in df). I want it to run through my code and produce 4 different pivots, one for each region, so that I can .concat them on top of each other. dfs = {} for region, df_region in df.groupby('Region'): pivot = df[df['Roll'].isin(Analyst) & df['Animal'].isin(cat) & df['Year'].isin(2010)] dfs[region] = df_region – spacedinosaur10 Dec 21 '18 at 16:44
@spacedinosaur10, I can't read code in comments. I suggest you write a [new question](https://stackoverflow.com/questions/ask) with a **[mcve]** (including input & desired output). – jpp Dec 21 '18 at 16:45
Created a separate question that is a bit easier to understand, thanks: https://stackoverflow.com/questions/53888636/create-separate-dataframes-by-iterate-a-groupby-object – spacedinosaur10 Dec 21 '18 at 17:22
