shortening runtime for sklearn Logistic Regression

Question

I currently have a Flask server that takes in data and applies a logistic regression algorithm to it. However, I plan on turning it into an AWS Lambda function, and I would like the algorithm to run as fast as possible.

The input is something like this:

The algo part of the code is a few lines:

from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, y_train)
y_predict_test = classifier.predict(X)
oldlist = classifier.predict_proba(X)
problist = sortProb(oldlist)
return(problist)

This currently takes about 2.6 seconds to run. Is there any way to speed it up?

Thanks

Does this answer your question? [Speeding up sklearn logistic regression](https://stackoverflow.com/questions/20894671/speeding-up-sklearn-logistic-regression) — Sergey Bushmanov, Oct 15 '20 at 19:10

score 0 · Answer 1 · answered Oct 15 '20 at 19:15

This is not how Lambda is generally used in a machine learning pipeline. Usually, you would use Lambda to do some data formatting and pass the result to a SageMaker endpoint that hosts a trained model. But for a very small model, I guess you can try Lambda as your backend.

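First, whether you are using Lambda or any other backend, you most likely don't want to train the model each time your endpoint is invoked. You just want to run inference (unless we are specifically talking about online learning). In the code from the question, classifier.fit runs on every request, and that is most likely where a large share of the 2.6 seconds goes.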

So, train your model offline as you normally would and then use that trained model for inference. Logistic regression is quite a simple algorithm (at least the inference part of it), so you can just extract the relevant parameters and hard-code them into the Lambda function together with the inference logic.

Here is how you can extract the model's coefficients and intercept.

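# This is the offline part
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, y_train)

# For a binary problem, coef_ has shape (1, n_features) and intercept_ has shape (1,)
coef, intercept = classifier.coef_, classifier.intercept_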
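
One way to carry the extracted values over to the function is to serialize them to JSON first. This is only a minimal sketch: `params_to_json` is a hypothetical helper, not part of scikit-learn, and the numbers below are made-up stand-ins for your own `coef_` and `intercept_`.

```python
import json

def params_to_json(coef, intercept):
    """Serialize model parameters to a JSON string.

    Accepts NumPy arrays (anything with a .tolist() method) or plain lists.
    """
    def to_list(values):
        return values.tolist() if hasattr(values, "tolist") else list(values)

    return json.dumps({"coef": to_list(coef), "intercept": to_list(intercept)})

# Hypothetical values standing in for classifier.coef_ and classifier.intercept_
print(params_to_json([[0.8, -1.2]], [0.3]))
```

The resulting string can be pasted into the function source or shipped as a small file next to it.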

Then use these in your Lambda function. Here is how you might implement it (I am using NumPy here, but feel free to implement it however you like). I am also omitting the Lambda boilerplate code.

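# This goes into your lambda function
import numpy as np

# Placeholders: paste the hardcoded parameters here as NumPy arrays
coef, intercept = ..., ...

def sigmoid(X):
    return 1 / (1 + np.exp(-X))

def predict(X, intercept, coef):
    # For a binary model, this is the probability of the positive class
    return sigmoid(np.dot(X, coef.T) + intercept)

# compute prediction
predict(X[0], intercept, coef)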
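
For completeness, here is a rough sketch of how the same inference logic could sit inside a handler using only the standard library, so that NumPy does not have to be packaged with the function. Everything specific in it is an assumption for illustration: the `COEF` and `INTERCEPT` values are invented for a two-feature binary model, and the request is assumed to arrive as a JSON body of the form `{"rows": [[x1, x2], ...]}`.

```python
import json
import math

# Hypothetical parameters for a two-feature binary model; replace them with
# the values extracted from your own trained classifier.
COEF = [0.8, -1.2]
INTERCEPT = 0.3

def sigmoid(z):
    # Numerically stable logistic function for a single float
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(rows):
    """Return [P(class 0), P(class 1)] for each row of features."""
    probs = []
    for row in rows:
        z = INTERCEPT + sum(w * x for w, x in zip(COEF, row))
        p1 = sigmoid(z)
        probs.append([1.0 - p1, p1])
    return probs

def lambda_handler(event, context):
    # Assumes the request body is a JSON string like {"rows": [[x1, x2], ...]}
    rows = json.loads(event["body"])["rows"]
    return {"statusCode": 200, "body": json.dumps(predict_proba(rows))}
```

The two-column output mirrors the layout of `predict_proba` for a binary model, so any sorting applied to it in the original code should be able to run on this list unchanged.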

I don't quite understand how you can train offline in Lambda. I want to be able to pass a dataframe into the LogReg algo and get a list of probs. Not sure how offline works with that. If you don't mind, can I get your Reddit username so I can take this convo to a chat? — Juliette, Oct 17 '20 at 15:15
What I mean by training offline is that you perform the training step without using Lambda (locally on your computer or on some virtual server that you have access to). Once that is done, you just extract the parameters, which you then hard-code into your Lambda function. That assumes that your endpoint is responsible only for predictions, not for the actual training. If you plan to do training and predictions at the same time, then I don't think Lambda is suitable for such a task, and you will most likely need to design a different strategy. — Matus Dubrava, Oct 17 '20 at 16:21
