
First: there are questions on this forum very similar to this one, but trust me, none of them matches, so please don't mark this as a duplicate.

I have encountered two methods of doing linear regression with scikit-learn (sklearn), and I am failing to understand the difference between the two, especially since the first piece of code calls train_test_split() while the second one calls fit() directly on manually sliced data.

I am studying from multiple resources, and this single issue is very confusing to me.

The first one, which uses SVR:

import numpy as np
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation

# Features: every column except the label, scaled to zero mean and unit variance
X = np.array(df.drop(columns=['label']))
X = preprocessing.scale(X)
y = np.array(df['label'])

# Random 80/20 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

clf = svm.SVR(kernel='linear')
clf.fit(X_train, y_train)
confidence = clf.score(X_test, y_test)  # R^2 on the held-out test set

And the second is this one:

import numpy as np
from sklearn import datasets, linear_model

# Load the diabetes dataset and keep a single feature
# (this setup is from the scikit-learn docs example the snippet comes from)
diabetes = datasets.load_diabetes()
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets (last 20 samples held out)
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Make predictions using the testing set
diabetes_y_pred = regr.predict(diabetes_X_test)
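
(For a score comparable to clf.score(X_test, y_test) in the first snippet, the held-out slice can be evaluated explicitly; a minimal sketch using sklearn.metrics, where r2_score is the same R^2 metric that score() reports for regressors:)

from sklearn.metrics import mean_squared_error, r2_score

print("MSE: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print("R^2: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))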

So my main focus is the difference between using SVR(kernel='linear') and using LinearRegression().

Dev_Man
  • I would suggest you use a Kaggle dataset and run both of these. Change the number of rows used for training each time by a significant amount; you'll see the difference in speed as well. Many other parameters will differ, I believe. – shruti iyyer Nov 01 '17 at 11:35

3 Answers


train_test_split (formerly in sklearn.cross_validation, now in sklearn.model_selection): splits arrays or matrices into random train and test subsets.

In the second snippet, the split is not random: the last 20 samples are simply sliced off as the test set.
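
A minimal sketch of the contrast (toy arrays; the names are mine, not from either snippet):

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# First snippet's style: rows are shuffled before splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Second snippet's style: a deterministic slice, no shuffling
X_train2, X_test2 = X[:-2], X[-2:]
y_train2, y_test2 = y[:-2], y[-2:]

# The same deterministic split via train_test_split with shuffle=False
X_train3, X_test3, y_train3, y_test3 = train_test_split(X, y, test_size=2, shuffle=False)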

svm.SVR: Support Vector Regression (SVR) uses the same principles as the SVM for classification, with only a few minor differences. First of all, because the output is a real number with infinitely many possible values, it becomes hard to predict exactly. In the regression case, a margin of tolerance (epsilon) is set around the prediction, and errors within that tolerance are ignored. Beyond this, the algorithm itself is more involved and has more to take into consideration. However, the main idea is always the same: minimize error by individualizing the hyperplane that maximizes the margin, keeping in mind that part of the error is tolerated.
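
To make the epsilon tolerance concrete, here is a minimal sketch (toy data and parameter values are my own): widening epsilon lets more points sit inside the tube at no cost, so fewer of them end up as support vectors.

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.5, size=50)

# A wider epsilon tube tolerates more error -> fewer support vectors
for eps in (0.1, 0.5, 2.0):
    svr = SVR(kernel='linear', epsilon=eps).fit(X, y)
    print(eps, svr.coef_, len(svr.support_))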

Linear Regression: In statistics, linear regression is a linear approach for modeling the relationship between a scalar dependent variable y and one or more explanatory variables (or independent variables) denoted X. The case of one explanatory variable is called simple linear regression.
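
In equation form (standard textbook notation, added here for reference):

y = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p + \varepsilon

where simple linear regression is the p = 1 case.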

Reference: https://cs.adelaide.edu.au/~chhshen/teaching/ML_SVR.pdf

Tushar Gupta
  • Thanks for explaining in depth, but I still have a doubt about the difference between SVR(kernel='linear') and LinearRegression. Can you help me understand that? – Dev_Man Oct 27 '17 at 13:51
  • Are you trying to say that the LinearRegression() way works only when there's only one independent variable that the y value depends on? Correct me if I am wrong, please. – Dev_Man Oct 27 '17 at 13:58
  • The difference apart from the definition? – shruti iyyer Nov 01 '17 at 11:38

This is what I found:

Intuitively, like all regressors, it tries to fit a line to the data by minimising a cost function. However, the interesting part about SVR is that you can deploy a non-linear kernel. In that case you end up doing non-linear regression, i.e. fitting a curve rather than a line. This process is based on the kernel trick and the representation of the solution/model in the dual rather than in the primal. That is, the model is represented as a combination of the training points rather than as a function of the features and some weights. At the same time the basic algorithm remains the same: the only real change in the process of going non-linear is the kernel function, which changes from a simple inner product to some non-linear function.

So SVR handles non-linear fitting problems as well, while LinearRegression() fits only a straight line (more precisely a hyperplane; both can take any number of features).
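
A small sketch of that difference (synthetic sinusoidal data; the setup is mine): a non-linear kernel lets SVR follow the curve, while LinearRegression can only put a straight line through it.

import numpy as np
from sklearn.svm import SVR
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

lin = LinearRegression().fit(X, y)          # straight line only
svr = SVR(kernel='rbf', C=10.0).fit(X, y)   # kernel trick -> fits the curve

print("LinearRegression R^2:", lin.score(X, y))  # poor on a sinusoid
print("SVR (rbf kernel) R^2:", svr.score(X, y))  # much closer fit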

Dev_Man

  • @VivekKumar LinearRegression() is only for fitting straight lines, but SVM with a linear kernel can fit curves as well. That is what I asked about: the difference, and not anything else. – Dev_Man Oct 31 '17 at 13:57
  • So, you are saying SVM can fit curves even with a linear kernel? – Vivek Kumar Oct 31 '17 at 14:23
  • The answer I have quoted from the docs points out that the kernel facility gives the option to fit curves as well, while LinearRegression() is specific to the straight-line fit. Apart from that, it's a common fact that SVMs are good for datasets that are not large in number of rows. I appreciate the brainstorming; this is the thing about Stack Overflow that is really appreciable. – Dev_Man Nov 01 '17 at 06:22
  • @Dev_Man: the quote in your answer is saying that SVR is a more general method than linear regression, as it allows non-linear kernels; however, in your original question you ask specifically about SVR with a linear kernel, and this quote does not definitively explain whether the linear-kernel case is equivalent to linear regression. – Wassermann Dec 30 '19 at 00:49
  • @Wassermann Right, I think I need to check whether SVR (linear kernel) and linear regression calculate the line in the same way, i.e. give the same line equation. I doubt that SVR gives a line equation the way linear regression does. – Dev_Man Dec 31 '19 at 06:22

The main difference between these methods is in the mathematical background.

We have samples X and want to predict target Y.

The linear regression method just minimizes the least-squares error:

For a single object, the prediction is y = x^T * w, where w is the vector of model weights.

Loss(w) = Sum_{n=1..N} (x_n^T * w - y_n)^2 --> min over w

Since this loss is convex, the global minimum is always found. Taking the derivative of the loss with respect to w, setting it to zero, and rewriting the sums in matrix form, you get:

w = (X^T * X)^(-1) * (X^T * Y)

So in ML libraries the weights are computed in closed form from this formula (scikit-learn's LinearRegression solves the same least-squares problem with a numerically stabler least-squares solver rather than inverting the matrix explicitly). X here is the training samples you pass to fit; predict simply multiplies these weights by X_test. So the solution is explicit and faster than iterative methods such as SVM (except for very large problems, where solving the linear system becomes expensive).
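
A quick numerical check of this closed-form formula against scikit-learn (random data and variable names are my own; fit_intercept=False keeps the comparison with the bare formula exact):

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Normal equation: w = (X^T X)^(-1) X^T y, solved as a linear system
w_closed = np.linalg.solve(X.T @ X, X.T @ y)
w_sklearn = LinearRegression(fit_intercept=False).fit(X, y).coef_

print(np.allclose(w_closed, w_sklearn))  # True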

In addition: Lasso and Ridge solve the same task but add a regularization term on the weights to the loss. For Ridge the weights can still be computed in closed form; Lasso has no closed-form solution and needs an iterative solver.

SVM with a linear kernel does almost the same thing, except its optimization task additionally maximizes the margin (the formulation is sketched below). There is no closed-form solution, so it relies on iterative numerical optimization to find the optimum; scikit-learn's SVR (built on libsvm) even has a max_iter parameter that caps the number of iterations of the solver.
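
For reference, the standard epsilon-insensitive formulation of linear SVR (written here in LaTeX, since it is hard to typeset on this site) is:

\begin{aligned}
\min_{w,\;b,\;\xi,\;\xi^*} \quad & \tfrac{1}{2}\lVert w \rVert^2 + C \sum_{n=1}^{N} (\xi_n + \xi_n^*) \\
\text{subject to} \quad & y_n - (w^\top x_n + b) \le \varepsilon + \xi_n, \\
& (w^\top x_n + b) - y_n \le \varepsilon + \xi_n^*, \\
& \xi_n,\ \xi_n^* \ge 0,
\end{aligned}

where minimizing ||w||^2 is what maximizes the margin, and the slack variables xi tolerate errors outside the epsilon tube.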

To sum up: linear regression has an explicit (closed-form) solution, while SVM finds an approximation of the true solution through iterative numerical optimization.

Alex