I'm trying to fit a column of a pandas DataFrame with sklearn's LinearRegression(). I followed the example in "Linear Regression on Pandas DataFrame using Sklearn (IndexError: tuple index out of range)", but I get a ValueError about the array dimensions. This is what my DataFrame looks like:
I run:
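import matplotlib.pyplot as plt
from sklearn import datasets, linear_model
regr = linear_model.LinearRegression()
# x and y are the pandas Series defined below
regr.fit(x,y)
# plot it as in the example at http://scikit-learn.org/
plt.scatter(x , y , color='black')
plt.plot(x, regr.predict(x) , color='blue', linewidth=3)
# empty tuples hide the tick marks and labels on both axes
plt.xticks(())
plt.yticks(())
plt.show()
where x and y are defined as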
y=df['Share of youth not in education, employment or training, total (% of youth population)']
x=df['years']
x,y
but I get this error:
ValueError: Expected 2D array, got 1D array instead:
array=[1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973
1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
2016 2017 2018 2019 2020 2021].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
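For reference, the reshape that the message suggests would look roughly like the sketch below. It assumes a single feature (the year) and uses the 2008-2020 values listed at the end of this post. The names `years`, `share`, `X` and `predicted` are only illustrative and are not from my original code.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# 2008-2020 values copied from the data at the end of the post
years = np.arange(2008, 2021)
share = np.array([
    29.6599998474121, 29.8999996185303, 33.0800018310547,
    32.0999984741211, 31.5499992370605, 28.3999996185303,
    28.00500011444095, 27.6100006103516, 27.5699996948242,
    26.8700008392334, 27.0499992370605, 27.0499992370605,
    27.0499992370605,
])

# sklearn expects features with shape (n_samples, n_features);
# reshape(-1, 1) turns the 1D array of 13 years into a 13x1 column
X = years.reshape(-1, 1)

regr = LinearRegression()
regr.fit(X, share)          # the target can stay 1D
predicted = regr.predict(X)

print(X.shape, predicted.shape)
```

With a pandas Series, `x.to_numpy().reshape(-1, 1)` or the double-bracket selection `df[['years']]` should give the same 2D shape.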
Then I tried passing 2D lists directly:
x=[[2008], [2009], [2010], [2011], [2012], [2013], [2014], [2015], [2016], [2017], [2018], [2019], [2020]]
y=[[29.6599998474121], [29.8999996185303], [33.0800018310547], [32.0999984741211], [31.5499992370605], [28.3999996185303], [28.00500011444095], [27.6100006103516], [27.5699996948242], [26.8700008392334], [27.0499992370605], [27.0499992370605], [27.0499992370605]]
but it plots something meaningless like this:
Where am I going wrong? Please help. Also: I added a years column because I couldn't use the years in the index column. If there is a way to avoid adding the years manually by just using the leftmost index, please comment on that too.
The data I care about is for 2008-2020:
Share of youth not in education, employment or training, total (% of youth population)
2008 29.6599998474121
2009 29.8999996185303
2010 33.0800018310547
2011 32.0999984741211
2012 31.5499992370605
2013 28.3999996185303
2014 28.00500011444095
2015 27.6100006103516
2016 27.5699996948242
2017 26.8700008392334
2018 27.0499992370605
2019 27.0499992370605
2020 27.0499992370605
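On the index question, a minimal sketch of what I am hoping is possible: it assumes the leftmost index already holds the years as integers (if they are strings, something like `.astype(int)` would presumably be needed first). The DataFrame here is rebuilt by hand from the values above, and `col`, `subset`, `X` and `y` are hypothetical names.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

col = ("Share of youth not in education, employment or training, "
       "total (% of youth population)")

# Hand-built stand-in for the real DataFrame: years live in the index
df = pd.DataFrame(
    {col: [29.6599998474121, 29.8999996185303, 33.0800018310547,
           32.0999984741211, 31.5499992370605, 28.3999996185303,
           28.00500011444095, 27.6100006103516, 27.5699996948242,
           26.8700008392334, 27.0499992370605, 27.0499992370605,
           27.0499992370605]},
    index=range(2008, 2021),
)

# Label-based slicing with .loc includes both endpoints
subset = df.loc[2008:2020]

# Take the years straight from the index and make them a 2D column
X = subset.index.to_numpy().reshape(-1, 1)
y = subset[col].to_numpy()

regr = LinearRegression().fit(X, y)
print(regr.coef_[0], regr.intercept_)
```

This would avoid the extra years column, since the index values are passed to `fit` directly.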