I'm new to Python, so please forgive me if this is a stupid question.

I'm trying to split a larger dataset into smaller data frames based on a unique row value (station ID). I've done the following, which made a dict and did separate the data into smaller data frames, but they are all stored inside this dict?

dfs = dict(list(df.groupby('Station')))

When I open it in Jupyter, it only shows the station ID next to a number series (0-20).

Is there a way to name these smaller data frames after the station ID? I'm used to R/tidyverse, so there has to be an easy way to do this?

Thank you! S

I also tried the following:

dct = {}
for idx, v in enumerate(df['Station'].unique()):
    dct[f'df{idx}'] = df.loc[df['Station'] == v]

print(dct)

but that just names them df0, df1, df2, etc.

  • Can you provide a small reproducible example? – mozway Nov 16 '22 at 15:25
  • What do you need the `dict` for? You can iterate over the `groupby` directly with `for name, group in df.groupby('Station'): # logic` -- see e.g. [this answer](https://stackoverflow.com/questions/28844535/python-pandas-groupby-get-list-of-groups) if you just want the names. – Joshua Voskamp Nov 16 '22 at 15:25

1 Answer

If you need a dict specifically, you can use

dfs = {name: group for name, group in df.groupby('Station')}

but that keeps a separate copy of every group's data in memory at once. If you only need to process one station at a time, try iterating over the groups (and names) directly with

for name, group in df.groupby('Station'):
    # logic
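
To illustrate the dict approach above: its keys are already the station IDs, so each smaller data frame can be looked up by name. This is a minimal sketch on made-up data; the `Nitrate` column and its values are assumptions for illustration, and only the `Station` column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Station' column name comes from the question.
df = pd.DataFrame({
    "Station": ["STN002", "STN004", "STN002", "STN006", "STN004"],
    "Nitrate": [1.2, 0.8, 1.5, 2.1, 0.9],
})

# The dict keys are the station IDs themselves.
dfs = {name: group for name, group in df.groupby("Station")}

station_ids = sorted(dfs)   # every station ID present in the data
stn002 = dfs["STN002"]      # the rows for one station, looked up by its ID

# Without building a dict, get_group selects the same rows.
stn002_direct = df.groupby("Station").get_group("STN002")
```

In Jupyter, evaluating `dfs["STN002"]` in a cell displays that station's data frame rather than the whole dict.
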
Joshua Voskamp
  • I don't need a dict! What would the "name" be? When I try to use it, it says "IndentationError: expected an indented block" – skinkleton Nov 16 '22 at 16:03
  • The `# logic` is a comment; that's where you'd write what you want your code to do. What are you trying to do with the different `'Station'` groups? `name` and `group` are variables that help you access the "current iteration" while iterating over the groups; see [the documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#iterating-through-groups) – Joshua Voskamp Nov 16 '22 at 16:09
  • I'm trying to separate them so I can plot their unique nutrient data. Sadly, it's formatted in rows. I was able to do `station2 = df.loc[df["Station"] == "STN002"]`, which is what I want, but then I have to do it by hand for each station (STN004, STN006, ... STN184, etc.). Would there be a way to make a for loop that goes through the df and creates variables based on the number after STN, i.e. `stationX = df.loc[df["Station"] == "STN00X"]`? – skinkleton Nov 16 '22 at 16:14
  • Also, thank you for the documentation, I will play around with it a bit more and get back to you! So far I have `grouped = df.groupby('Station')`, followed by `for name, group in grouped:` with `print(name)` and `print(group)` in the loop body. – skinkleton Nov 16 '22 at 16:19
  • @skinkleton Can you provide an example dataframe to work with, and some example output that you're looking for? – Joshua Voskamp Nov 16 '22 at 16:38
  • https://ibb.co/tDMCzZK -- what I want it to do (create a data frame based on each unique value in the 'Station' column). https://ibb.co/R7TgZwq -- example of the base data frame; the station column is out of frame, but you can see the different stations (STN004, STN006, STN012, etc.). Does this convey it okay? @Joshua Voskamp – skinkleton Nov 16 '22 at 16:49
  • The first one is supposed to be `Station2 = df.loc[(df["Station"] == "STN002")]`, not the groupby code. Sorry! But the output looks the same. – skinkleton Nov 16 '22 at 16:58
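
One possible way to get the `stationX` naming asked about in the comments, without writing a line per station, is to key a dict by the number after `STN` and then process every entry in one loop. This is only a sketch under assumptions: the `Phosphate` column, the sample values, and the `station_key` helper are hypothetical, and the per-station mean stands in for whatever plotting or analysis is actually needed.

```python
import pandas as pd

# Hypothetical sample data; 'Phosphate' is an assumed column name.
df = pd.DataFrame({
    "Station": ["STN002", "STN004", "STN002", "STN184", "STN004"],
    "Phosphate": [0.10, 0.30, 0.20, 0.50, 0.40],
})


def station_key(station_id):
    """Hypothetical helper: turn 'STN002' into 'station2'."""
    number = int(station_id[len("STN"):])  # drop the prefix and leading zeros
    return f"station{number}"


# One entry per station, built in a single pass instead of by hand.
stations = {station_key(name): group for name, group in df.groupby("Station")}

# The same loop can do the per-station work; a plotting call could replace the mean.
means = {key: float(group["Phosphate"].mean()) for key, group in stations.items()}
```

A dict entry such as `stations["station2"]` then plays the role of the hand-written `station2` variable, and adding a new station to the data needs no code changes.
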