Taking a list of data frames and grouping by a variable and using that variable as the key to a dictionary

Question

I am relatively new to Python programming. I have a list of pandas DataFrames that all have the column 'Year'. I am trying to group by that column and convert the result to a dictionary where each key is a value of 'Year' and each value is a list of the DataFrames for that year. Is this possible in Python?

I tried this:

grouped_dict = list_of_csv_files.groupby(by = 'Year').to_dict()

I believe I will have to loop through each dataframe? I did not provide any data because I am hoping it is a somewhat simple solution.

I also tried this:

grouped_dict = list_of_csv_files.groupby(by = 'Year').apply(lambda dfg: dfg.to_dict(orient='list')).to_dict()

Any guidance would be greatly appreciated!

Have you tried: `grouped_dict = {k: v for k, v in list_of_csv_files.groupby('Year')}` ? — Jon Clements, Apr 15 '19 at 15:50
I get this warning: AttributeError: 'list' object has no attribute 'groupby' — Jake, Apr 15 '19 at 15:57
You probably want to use `pd.concat` on that list to build a single dataframe before applying groupby to it. Possibly something like: `pd.concat(list_of_csv_files).groupby('Year')` — Jon Clements, Apr 15 '19 at 16:04

David · Answer 1 · 2019-04-15T16:39:53.520


Firstly, you should read the files into a single dataframe:

list_of_dfs = [pd.read_csv(filename, index_col=False) for filename in list_of_csv_files]
df = pd.concat(list_of_dfs, sort=True)

Then apply the groupby transformation on the dataframe and convert it into a dictionary:

grouped_dict = df.groupby('Year').apply(list).to_dict()
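For comparison, here is a minimal sketch of the concat-then-group route using a dict comprehension over the groupby object, as suggested in the comments. Calling `list` on a DataFrame returns its column labels, so this variant may be closer to what the question asks for, though it gives one DataFrame per year rather than a list of them. The sample frames and their `value` column are invented for illustration.

```python
import pandas as pd

# Hypothetical sample frames standing in for the asker's data.
list_of_dfs = [
    pd.DataFrame({"Year": [2017, 2018], "value": [1, 2]}),
    pd.DataFrame({"Year": [2018, 2019], "value": [3, 4]}),
]

# Stack everything into one frame, then split it by year.
df = pd.concat(list_of_dfs, ignore_index=True)
grouped_dict = {year: year_df for year, year_df in df.groupby("Year")}

# Each key is a year; each value is a single DataFrame with that year's rows.
print(sorted(grouped_dict))
print(grouped_dict[2018])
```

If a list of DataFrames per year is needed instead, the frames have to be grouped one at a time rather than concatenated first.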

This question is a duplicate of GroupBy results to dictionary of lists

edited Apr 15 '19 at 16:39

answered Apr 15 '19 at 15:51

I get this warning: AttributeError: 'list' object has no attribute 'groupby' – Jake Apr 15 '19 at 15:58

score 1 · Accepted Answer · answered Apr 15 '19 at 16:02

Other answers have missed the mark so far, so I'll give you an alternative. Assuming you have CSV files (since your variable is named that way):

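from collections import defaultdict

import pandas as pd

# Map each year to the list of per-file DataFrames for that year.
yearly_dfs = defaultdict(list)
for csv_path in list_of_csv_files:
    df = pd.read_csv(csv_path)
    # groupby yields (year, sub-DataFrame) pairs for each file
    for yr, yr_df in df.groupby("Year"):
        yearly_dfs[yr].append(yr_df)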

Assuming you have DataFrames already:

from collections import defaultdict

# Here list_of_csv_files already holds DataFrames, despite its name.
yearly_dfs = defaultdict(list)
for df in list_of_csv_files:
    for yr, yr_df in df.groupby("Year"):
        yearly_dfs[yr].append(yr_df)
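To see what the resulting structure looks like, here is a small sketch of the same loop run on made-up DataFrames. The frames, the `value` column, and the `counts` and `combined` names are assumptions for illustration only. The last step is optional and shows one way to merge each year's pieces back into a single DataFrame.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical stand-ins for the DataFrames in the question.
list_of_dfs = [
    pd.DataFrame({"Year": [2017, 2018, 2018], "value": [1, 2, 3]}),
    pd.DataFrame({"Year": [2018, 2019], "value": [4, 5]}),
]

yearly_dfs = defaultdict(list)
for df in list_of_dfs:
    for yr, yr_df in df.groupby("Year"):
        yearly_dfs[yr].append(yr_df)

# How many per-year pieces were collected across the input frames.
counts = {yr: len(dfs) for yr, dfs in yearly_dfs.items()}
print(counts)

# Optional: merge each year's pieces into one DataFrame per year.
combined = {
    yr: pd.concat(dfs, ignore_index=True) for yr, dfs in yearly_dfs.items()
}
print(combined[2018])
```

A year that appears in several input frames ends up with several entries in its list, which is why 2018 holds two pieces here.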
