I have a dataset `new_products` that records the number of months since each product launched. I aggregated that data so that I have two fields, 'since_debut' and 'count', which give the number of products that debuted 1, 2, 3, ..., 60 months ago. I am having trouble creating a histogram from it with seaborn.

df = since_debut     count
          1           1784
          2           7345
          3           11111
          4           13255

sns.histplot(data=df, x="since_debut", y="count", bins=30, kde=True)

ValueError: Could not interpret value `since_debut` for parameter `x`

I'm unsure what is throwing this error and why seaborn can't interpret the aggregated data. Any help or advice is appreciated.

Mitchell.Laferla
  • `since_debut` cannot be an index – Plagon Jan 09 '23 at 20:48
  • Following up on @Plagon's comment: you can pass `data=df.reset_index()` to the plotting call to make sure `since_debut` is a column and not an index – mitoRibo Jan 09 '23 at 20:58
  • Can you clarify what you mean by since_debut being an index and not a real column? It's a calculated field that I used in the groupby. Am I missing something? @mitoRibo – Mitchell.Laferla Jan 09 '23 at 21:10
  • [groupby](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.groupby.html) sets the `by` variables as the index by default. You can either use the proposal from @mitoRibo or set `groupby(..., as_index=False)`. – Plagon Jan 09 '23 at 21:14
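
The two options from the comments can be sketched as follows. This is a minimal illustration, not the asker's actual code: the small `new_products` frame and the use of `size()` for the aggregation are assumptions, since the original groupby call is not shown.

```python
import pandas as pd

# Hypothetical raw data: one row per product (the real dataset is not shown).
new_products = pd.DataFrame({"since_debut": [1, 1, 2, 2, 2, 3]})

# Default groupby: the grouping key becomes the index, so the result
# has no column named "since_debut" for seaborn to look up.
indexed = new_products.groupby("since_debut").size().to_frame("count")
print(indexed.columns.tolist())  # ['count']
print(indexed.index.name)        # since_debut

# Option 1: move the index back into a regular column.
flat_a = indexed.reset_index()

# Option 2: keep the key as a column from the start.
flat_b = (
    new_products.groupby("since_debut", as_index=False)
    .size()
    .rename(columns={"size": "count"})
)

print(flat_a.columns.tolist())  # ['since_debut', 'count']
```

Either flat frame can then be passed as `data=` with `x="since_debut"`.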

1 Answer

Since you have already aggregated the dataset, shouldn't you use something like `barplot` instead:

sns.barplot(data=df, x="since_debut", y="count")

`countplot` should be used on the original, unaggregated data; it counts the observations along one of the axes itself.

Guru Stron