
I'm trying to fit a column in a pandas dataframe with sklearn's LinearRegression(). I followed the example in "Linear Regression on Pandas DataFrame using Sklearn (IndexError: tuple index out of range)", but I get a ValueError about the array dimensions. This is what my dataframe looks like:

[image: screenshot of the dataframe]

I do

import matplotlib.pyplot as plt
from sklearn import datasets, linear_model

regr = linear_model.LinearRegression()
regr.fit(x,y)

# plot it as in the example at http://scikit-learn.org/
plt.scatter(x , y ,  color='black')
plt.plot(x, regr.predict(x) , color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()

where x and y are defined as

y=df['Share of youth not in education, employment or training, total (% of youth population)']
x=df['years']
x,y

but I get this error:

ValueError: Expected 2D array, got 1D array instead:
array=[1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973
 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987
 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015
 2016 2017 2018 2019 2020 2021].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
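For reference, here is a minimal sketch of what the hint in the error message seems to amount to. The three-row dataframe and the short column name `share` are hypothetical stand-ins for my real data; only `years` comes from my code. As far as I understand, the feature has to be 2D with shape (n_samples, 1), while the target can stay 1D.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for the real dataframe (first three values from the question).
df = pd.DataFrame({
    "years": [2008, 2009, 2010],
    "share": [29.6599998474121, 29.8999996185303, 33.0800018310547],
})

x_1d = df["years"]                              # Series, shape (3,): triggers the ValueError in fit()
x_2d = df[["years"]]                            # DataFrame, shape (3, 1): one feature column
x_alt = df["years"].to_numpy().reshape(-1, 1)   # same shape, via the reshape(-1, 1) hint
y = df["share"]                                 # the target may remain 1D

regr = LinearRegression()
regr.fit(x_2d, y)            # accepted because x_2d is 2D
pred = regr.predict(x_2d)    # one prediction per row
```

Selecting with double brackets (`df[["years"]]`) keeps the column as a one-column DataFrame, so no explicit reshape is needed in that variant.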

Then I tried using 2D arrays

x=[[2008], [2009], [2010], [2011], [2012], [2013], [2014], [2015], [2016], [2017], [2018], [2019], [2020]]
y=[[29.6599998474121], [29.8999996185303], [33.0800018310547], [32.0999984741211], [31.5499992370605], [28.3999996185303], [28.00500011444095], [27.6100006103516], [27.5699996948242], [26.8700008392334], [27.0499992370605], [27.0499992370605], [27.0499992370605]]

but it plots something meaningless like this

[image: the resulting plot]

What am I doing wrong? Please help. Also, I added the years column because I couldn't use the years in the index column. If there is a way to avoid adding the years manually and use the leftmost index directly, please comment on that too.

The data I care about is for 2008-2020:

    Share of youth not in education, employment or training, total (% of youth population)
2008    29.6599998474121
2009    29.8999996185303
2010    33.0800018310547
2011    32.0999984741211
2012    31.5499992370605
2013    28.3999996185303
2014    28.00500011444095
2015    27.6100006103516
2016    27.5699996948242
2017    26.8700008392334
2018    27.0499992370605
2019    27.0499992370605
2020    27.0499992370605
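To make the index part of the question concrete, below is a rough sketch of how the years might be taken straight from the index instead of a manually added column. It assumes the index holds integer years (if they are strings, they would need converting first), and the names `col`, `values`, and `sub` are only introduced for this illustration.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

col = "Share of youth not in education, employment or training, total (% of youth population)"
values = [
    29.6599998474121, 29.8999996185303, 33.0800018310547, 32.0999984741211,
    31.5499992370605, 28.3999996185303, 28.00500011444095, 27.6100006103516,
    27.5699996948242, 26.8700008392334, 27.0499992370605, 27.0499992370605,
    27.0499992370605,
]
# Assumption: the years are the integer index of the dataframe.
df = pd.DataFrame({col: values}, index=range(2008, 2021))

# Label-based slice (inclusive on both ends), dropping any missing values.
sub = df.loc[2008:2020, col].dropna()

x = sub.index.to_numpy().astype(float).reshape(-1, 1)  # years as a (13, 1) feature matrix
y = sub.to_numpy()                                     # target values, shape (13,)

regr = LinearRegression().fit(x, y)
slope = regr.coef_[0]   # fitted change in the share per year
```

Plotting `x` against `regr.predict(x)` would then work the same way as in the scikit-learn example above.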
mrq
