0

I'm new to Python, and working on this project looking at deals won by an IT company. For the data below I'd like to create a stacked bar chart that shows only the top 5 'VP-Manager' values per 'Client Category' i.e. the top 5 'VP-Manager' who have won the most deals per 'Client Category'.

[Data Sample][1]: https://i.stack.imgur.com/f9sdG.png

I took guidance from @Jon Clements response on the below question: Create a stacked bar chart of the N largest columns per row in a dataframe

This was my code:

top5 = (
    # bring the client category back as a column to use as a grouping var
    dataw.reset_index()
    # make a long DF of client category/column/name value
    .melt(id_vars='Client Category')
    # order DF by highest values first
    .sort_values('VP-Manager', ascending=False)
    # group by the index and take the first 5 rows of each
    .groupby('Client Category')
    .head(5)
    # pivot back so we've got an X & Y to chart...
    .pivot('Client Category', 'VP-Manager')
    # drop the value level as we don't need that
    .droplevel(level=0, axis=1)
)

top5.plot.bar(stacked=True)

But I keep getting an error KeyError: 'VP-Manager'

Tried various things (and a couple other methods) but can't seem to fix this.

Please can someone help? Thank you!

Update: I'm looking for a chart similar to this - [Desired Output][2]: https://i.stack.imgur.com/yb6Xl.jpg

Tanvikc
  • 1
  • 1
  • The 'VP-Manager' column doesn't exist in the sort, which needs to be fixed at the time of melt. – r-beginners Jun 25 '21 at 08:21
  • @r-beginners Thanks, please can you tell me how to do that? – Tanvikc Jun 25 '21 at 09:11
  • try this:`melt(id_vars='Client Category',var_vars='VP-Manager',...)` – r-beginners Jun 25 '21 at 09:47
  • Alternatively, stop sorting and the error will disappear. – r-beginners Jun 25 '21 at 09:50
  • @r-beginners Thank you, but the above didn't work. Adding 'VP-Manager' to the 'value' argument like below throws the same error. Removing the 'sort' gives a new error: "KeyError: "None of ['VP-Manager'] are in the columns"" `.melt(id_vars='Client Category', value_vars='VP-Manager')` – Tanvikc Jun 26 '21 at 09:54
  • If you assume that the client category is not unique, the following code will work. `(pd.melt(dataw, id_vars=['Client Category', 'VP-Manager'], value_vars=['Deal Cost']).sort_values('VP-Manager', ascending=False) .groupby('Client Category') .head(5).pivot(index='Client Category', columns='VP-Manager').droplevel(level=0, axis=1))` – r-beginners Jun 26 '21 at 13:50
  • @r-beginners Thank you so much! It's been almost a week trying to find a solution to this. While I'm still not getting the result/chart I wanted (need top 5 VP-Manager based on number of records/wins, but getting something else), I'm just glad that the problem on the melt is now sorted - thanks again! Note - I had to change `pivot` to `pivot_table` as the former was throwing an error: "Value error: index contains duplicate entries cannot reshape" – Tanvikc Jun 28 '21 at 18:01

0 Answers0