I want to do a linear regression.

My features are something like this:

Marketcap       EBIT Margin   Price to Book Ratio   EPS Growth

5.589918e+08    23.05          8.71                   7.16
5.572475e+08    65.00          9.68                 -18.44
8.639290e+09     7.8          12.74                 -55.00

I need to scale the features before fitting a linear regression, especially when their scales differ as much as Marketcap and the other features do, right?

What about the negative values of EPS Growth? What is the best way to scale the features in this example?

Russgo
  • Have you tried a standard way of scaling, e.g. [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html#sklearn.preprocessing.StandardScaler)? – jkhadka Nov 19 '20 at 09:54
  • I tried `preprocessing.StandardScaler`, but when I try to fit the linear regression I get an error: ValueError: Expected 2D array, got scalar array instead: – Russgo Nov 19 '20 at 10:10
  • It is hard to guess what went wrong without looking at the implementation. Can you post your code and some sample data along with your question? – jkhadka Nov 19 '20 at 10:12
  • It worked; I used `preprocessing.scale` instead. – Russgo Nov 19 '20 at 10:33

1 Answer

From the docs:

Standardize features by removing the mean and scaling to unit variance

This means that each input x is transformed to (x - mean) / std, where the mean and the standard deviation are computed separately for each feature (column).

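So even if your input values are all positive, removing the mean can make some of them negative. Negative inputs, such as the EPS Growth values, are therefore not a problem for this kind of scaling: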

>>> import numpy as np
>>> x = np.array([3, 5, 7])
>>> np.mean(x)
5.0
>>> x - np.mean(x)  # centering alone already produces negative values
array([-2.,  0.,  2.])
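Applied to the rows from the question, this could look like the sketch below. It assumes the features are held in a 2D NumPy array of shape (n_samples, n_features); the variable names `X`, `X_scaled` and `X_manual` are only illustrative. `StandardScaler` expects 2D input, which may be related to the error mentioned in the comments, although that cannot be confirmed without the original code.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# The three rows from the question, one column per feature:
# Marketcap, EBIT Margin, Price to Book Ratio, EPS Growth
X = np.array([
    [5.589918e+08, 23.05,  8.71,   7.16],
    [5.572475e+08, 65.00,  9.68, -18.44],
    [8.639290e+09,  7.80, 12.74, -55.00],
])

# fit_transform expects a 2D array of shape (n_samples, n_features)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# The same transformation by hand: (x - mean) / std for each column.
# StandardScaler uses the population standard deviation (ddof=0),
# which is also NumPy's default.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(np.allclose(X_scaled, X_manual))  # both approaches agree
print(X_scaled.mean(axis=0))            # approximately 0 for every column
print(X_scaled.std(axis=0))             # approximately 1 for every column
```

After scaling, every column has mean 0 and unit variance, so Marketcap no longer dwarfs the other features, and the negative EPS Growth values are handled like any other value.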

More details:

http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf (sec. 4.3)

http://scikit-learn.org/stable/modules/preprocessing.html#standardization-or-mean-removal-and-variance-scaling

http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-16.html

Ehtisham Ahmed