I am new to pandas and seaborn. I am trying to draw a trend line and a bar plot on the same graph. My data looks like this:

Year     Sample Size
 2000      500
 2001      3000
 2003      10000
 2004      20000
 2004      23000

I want to draw a line through the bar plot that shows an increasing or decreasing trend, but I am struggling to get both on the same graph. So far I have only the bar plot; the code is below.

sampleSizes['Sample Size'] is the column I am plotting. It has about 12 values, one for each of 12 years.

import matplotlib.pyplot as plt

plt.figure()
ax = sampleSizes['Sample Size'].plot(kind='bar', title="Trend of Sample Sizes", figsize=(15, 10), legend=True, color='grey', fontsize=8)
plt.show()

I am struggling to add a trend line to this. I would be grateful if someone could point me in the right direction.

UPDATE

FinancialYear  Sample Size
   2001         2338
   2002         3171
   2003         2597
   2004         2740
   2005         3447
   2006         3098
   2007         2610
   2008         2819
   2009         2057
   2010         2174
   2011         2709

paddy
  • Is a bar plot the right type of representation for this? Or would a line plot be more appropriate? There's also [`seaborn.regplot`](https://seaborn.pydata.org/generated/seaborn.regplot.html), but it sounds like your data is not linear at all. – Arya McCarthy Feb 17 '18 at 14:24
  • @AryaMcCarthy Yes, I agree a line plot would be better, but I am trying to show the bars and a line for the trend of the data on the same graph. I hope that makes clear what I'm trying to achieve. I have edited the data above a bit. I picked these numbers at random, so they are not necessarily representative of my data, but the variables are. – paddy Feb 17 '18 at 14:33

1 Answer

UPDATE2: using the updated data set (Ridge is imported as in the UPDATE section below)

In [250]: lr = Ridge()

In [251]: lr.fit(df[['FinancialYear']], df['Sample Size'])
Out[251]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [252]: plt.bar(df['FinancialYear'], df['Sample Size'])
Out[252]: <Container object of 11 artists>

In [253]: plt.plot(df['FinancialYear'], lr.coef_*df['FinancialYear']+lr.intercept_, color='orange')
Out[253]: [<matplotlib.lines.Line2D at 0x171def60>]

Result:

[image: bar chart of Sample Size by FinancialYear with the fitted line in orange]
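
For readers without scikit-learn, a rough equivalent of the session above can be sketched with `numpy.polyfit`. This is an ordinary (unregularised) least-squares line, not the `Ridge` fit used above, so the coefficients may differ slightly. The helper name `bar_with_trend` is made up for illustration, and the data is copied from the question's UPDATE table.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Data copied from the UPDATE table in the question.
df = pd.DataFrame({
    "FinancialYear": list(range(2001, 2012)),
    "Sample Size": [2338, 3171, 2597, 2740, 3447, 3098,
                    2610, 2819, 2057, 2174, 2709],
})


def bar_with_trend(data, x_col, y_col):
    """Draw bars for y_col and overlay a straight least-squares trend line.

    Hypothetical helper: plain degree-1 polyfit, no regularisation.
    """
    x = data[x_col].to_numpy(dtype=float)
    y = data[y_col].to_numpy(dtype=float)

    # Fit y = slope * x + intercept by ordinary least squares.
    slope, intercept = np.polyfit(x, y, deg=1)

    fig, ax = plt.subplots(figsize=(15, 10))
    ax.bar(x, y, color="grey", label=y_col)
    ax.plot(x, slope * x + intercept, color="orange", label="Linear trend")
    ax.set_xticks(x)  # one tick per year
    ax.set_title("Trend of Sample Sizes")
    ax.legend()
    return ax, slope, intercept
```

Calling `bar_with_trend(df, "FinancialYear", "Sample Size")` followed by `plt.show()` displays the chart; the returned `slope` gives the direction of the trend.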


UPDATE:

In [184]: from sklearn.linear_model import Ridge

In [185]: lr = Ridge()

In [186]: lr.fit(df[['Year']], df['Sample Size'])
Out[186]:
Ridge(alpha=1.0, copy_X=True, fit_intercept=True, max_iter=None,
   normalize=False, random_state=None, solver='auto', tol=0.001)

In [187]: plt.bar(df['Year'], df['Sample Size'])
Out[187]: <Container object of 5 artists>

In [188]: plt.plot(df['Year'], lr.coef_*df['Year']+lr.intercept_, color='orange')
Out[188]: [<matplotlib.lines.Line2D at 0x17062898>]

Result:

[image: bar chart of Sample Size by Year with the fitted line in orange]


Try using matplotlib methods directly for that:

plt.bar(df['Year'], df['Sample Size'])
plt.plot(df['Year'], df['Sample Size'], '-o', color='orange')

Result:

[image: bar chart of Sample Size by Year with an orange line connecting the bar tops]
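
If you would rather keep the pandas `.plot(kind='bar')` call from the question, note that pandas draws the bars at integer positions 0, 1, 2, ... and uses the index values only as tick labels. A line plotted against the year values would therefore land far off to the right. A minimal sketch of the overlay under that assumption follows; the helper name `pandas_bar_with_line` and the `sample_sizes` Series (built from the question's UPDATE table) are illustrative, not part of the original code.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative Series built from the UPDATE table in the question.
sample_sizes = pd.Series(
    [2338, 3171, 2597, 2740, 3447, 3098, 2610, 2819, 2057, 2174, 2709],
    index=pd.Index(range(2001, 2012), name="FinancialYear"),
    name="Sample Size",
)


def pandas_bar_with_line(series):
    """Draw a pandas bar plot and overlay a line on the same Axes."""
    fig, ax = plt.subplots(figsize=(15, 10))
    series.plot(kind="bar", ax=ax, color="grey",
                title="Trend of Sample Sizes", legend=True)

    # pandas places the bars at 0..n-1, so the line uses those positions
    # rather than the year values stored in the index.
    positions = np.arange(len(series))
    (line,) = ax.plot(positions, series.to_numpy(), "-o", color="orange")
    return ax, line
```

Call `pandas_bar_with_line(sample_sizes)` and then `plt.show()`. To draw a fitted trend instead of the raw values, pass the fitted values as the y-argument of `ax.plot` while keeping `positions` as the x-argument.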

MaxU - stand with Ukraine