2

I want to plot a bar graph of sales per year, with 'year' on the x-axis and the sum of weekly sales per year on the y-axis. While plotting I get `KeyError: 'year'`. I guess it's because 'year' became the index during the group-by.

Below is the sample content from csv file:

Store   year    Weekly_Sales
1   2014    24924.5
1   2010    46039.49
1   2015    41595.55
1   2010    19403.54
1   2015    21827.9
1   2010    21043.39
1   2014    22136.64
1   2010    26229.21
1   2014    57258.43
1   2010    42960.91

Below is the code I used to group by

storeDetail_df = pd.read_csv('Details.csv')
result_group_year= storeDetail_df.groupby(['year'])
total_by_year = result_group_year['Weekly_Sales'].agg([np.sum])

total_by_year.plot(kind='bar' ,x='year',y='sum',rot=0)

I updated the code. Below is the DataFrame output:

   year          sum
0  2010  42843534.38
1  2011  45349314.40
2  2012  35445927.76
3  2013         0.00

Below is the graph I am getting (image not included).

Trenton McKinney
user1882624
  • Can you add `storeDetail_df = storeDetail_df.reset_index()` after `storeDetail_df = pd.read_csv('Details.csv')` and then try plotting? I would have to enter your data manually into a DataFrame, so I can't try it myself – Sheldore Sep 16 '18 at 15:58
  • When I printed and checked `storeDetail_df`, I saw a separate index column apart from the columns mentioned above – user1882624 Sep 16 '18 at 16:03
  • OK, check my answer below. `delim_whitespace=True` solves the problem – Sheldore Sep 16 '18 at 16:08

3 Answers

4

When reading your csv file, you need to use whitespace as the delimiter by passing `delim_whitespace=True`, and then reset the index after summing up Weekly_Sales. Below is the working code:

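import numpy as np
import pandas as pd

# Split columns on whitespace, sum Weekly_Sales per year, then turn 'year' back into a column
storeDetail_df = pd.read_csv('Details.csv', delim_whitespace=True)
result_group_year = storeDetail_df.groupby(['year'])
total_by_year = result_group_year['Weekly_Sales'].agg([np.sum]).reset_index()
total_by_year.plot(kind='bar', x='year', y='sum', rot=0, legend=False)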

Output (image not included)
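For readers who want to reproduce this without the original file, here is a minimal sketch that loads the sample rows from the question out of an in-memory string. It makes two substitutions that are assumptions rather than part of the answer above: `sep=r"\s+"` stands in for `delim_whitespace=True` (which newer pandas releases deprecate), and `groupby(..., as_index=False)` stands in for the `reset_index()` call. The helper name `plot_totals` is hypothetical.

```python
import io

import pandas as pd

# Sample rows copied from the question; the columns are separated by spaces, not commas.
SAMPLE = """Store   year    Weekly_Sales
1   2014    24924.5
1   2010    46039.49
1   2015    41595.55
1   2010    19403.54
1   2015    21827.9
1   2010    21043.39
1   2014    22136.64
1   2010    26229.21
1   2014    57258.43
1   2010    42960.91
"""

# sep=r"\s+" splits on any run of whitespace (assumed equivalent of delim_whitespace=True).
store_df = pd.read_csv(io.StringIO(SAMPLE), sep=r"\s+")

# as_index=False keeps 'year' as a regular column, so it can be passed as x= when plotting.
total_by_year = store_df.groupby("year", as_index=False)["Weekly_Sales"].sum()
print(total_by_year)


def plot_totals(df):
    """Hypothetical helper: draw the yearly totals as a bar chart (needs matplotlib)."""
    return df.plot(kind="bar", x="year", y="Weekly_Sales", rot=0, legend=False)
```

Calling `plot_totals(total_by_year)` should draw one bar per year. Note that the summed column keeps the name `Weekly_Sales` here, not `sum`, because `.sum()` is called directly instead of `.agg([np.sum])`.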

Sheldore
  • My sum values are actually (42843534.38, 45349314.40, 35445927.76), but the graph's y-axis shows (1, 2, 3, 4). I checked in Eclipse and in Jupyter Notebook. Is there something I need to do in the code to show the exact or approximate values? – user1882624 Sep 16 '18 at 17:38
  • That's strange, I just used the same values you provided. Can you print the DataFrame before plotting it to see if the sum values are correct? – Sheldore Sep 16 '18 at 21:17
  • I have added the DataFrame output and the output graph to the original post. Please check – user1882624 Sep 17 '18 at 03:23
  • Can you also tell me what version of pandas you are using? It's strange that you are getting different values – Sheldore Sep 17 '18 at 08:17
  • You posted a new question. I just saw it. Hope you got your answer – Sheldore Sep 17 '18 at 09:37
0

The group-by may have made `year` your index. If so, you need to turn it back into a regular column before plotting. Try:

total_by_year = total_by_year.reset_index(drop=False)
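A note on `reset_index`: when it is called with `inplace=True` it modifies the frame and returns `None`, so assigning its result back to the same name would replace the DataFrame with `None`. A small sketch with made-up numbers (the values and the name `totals` are illustrative, not from the question's data):

```python
import pandas as pd

# Made-up totals indexed by year, mimicking the shape of a group-by result.
totals = pd.DataFrame(
    {"sum": [10.0, 20.0]},
    index=pd.Index([2010, 2011], name="year"),
)

# With inplace=True the call returns None, so assigning it loses the frame.
returned = totals.copy().reset_index(inplace=True)
print(returned)  # None

# Without inplace, a new frame is returned and 'year' becomes a column.
fixed = totals.reset_index()
print(list(fixed.columns))  # ['year', 'sum']
```

Either assign the result of `reset_index()` or call it with `inplace=True` on its own line, but not both.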
SaKu.
0

You might want to try this:

storeDetail_df = pd.read_csv('Details.csv')
result_group_year= storeDetail_df.groupby(['year'])['Weekly_Sales'].sum()

result_group_year = result_group_year.reset_index(drop=False)
result_group_year.plot.bar(x='year', y='Weekly_Sales')
Will_E
  • Nah, doesn't work. `read_csv` needs whitespace as the delimiter here. You get a `KeyError: 'year'` in the second line – Sheldore Sep 16 '18 at 16:09