Questions tagged [standardized]

Shifting and rescaling data to ensure zero mean and unit variance.

Overview

Specifically, when $x_i$, $i = 1, \ldots, n$, is a batch of data, its mean is:

$$m = \frac{1}{n} \sum_{i=1}^{n} x_i$$

and its variance is:

$$s^2 = \frac{1}{\nu} \sum_{i=1}^{n} (x_i - m)^2$$

where $\nu$ is either $n$ or $n - 1$ (choices vary with application).

Standardization replaces each $x_i$ with $z_i = (x_i - m)/s$. Do not confuse standardization with normalization.
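
As a rough illustration of the definition above, here is a minimal Python sketch using NumPy. The function name `standardize` and the sample data are made up for this example; the `ddof` argument is NumPy's own way of choosing the divisor (`ddof=0` divides by n, `ddof=1` by n - 1).

```python
import numpy as np


def standardize(x, ddof=0):
    """Return z-scores (x - m) / s for a 1-D batch of data.

    ddof=0 uses the divisor n; ddof=1 uses n - 1.
    (Hypothetical helper for illustration, not a library function.)
    """
    x = np.asarray(x, dtype=float)
    m = x.mean()
    s = x.std(ddof=ddof)
    if s == 0:
        # Constant data has zero variance, so z-scores are undefined.
        raise ValueError("cannot standardize constant data (s = 0)")
    return (x - m) / s


# Illustrative data only: mean 5, population standard deviation 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

z = standardize(data)                 # divisor n
z_sample = standardize(data, ddof=1)  # divisor n - 1

print(z)  # [-1.5 -0.5 -0.5 -0.5  0.   0.   1.   2. ]
```

Note that the result contains values such as -1.5 and 2.0: standardization fixes the mean and the variance, not the range, so z-scores are not confined to the interval from -1 to 1.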


Tag usage

Questions on this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

145 questions
3
votes
1 answer

Why after standardization new values are greater than 1 and -1?

I want to normalize data to zero mean and 1 standard deviation, but my final result still has values greater than 1 and -1. Why? E2 = np.array([-2.51212507515, -2.19475817821, -1.46734920106, -1.21180880012, -1.00548224796, -0.659646985536,…
jerry
  • 385
  • 6
  • 18
3
votes
3 answers

Create r function that standardizes multiple variables and creates new columns

I have a dataset in which I work with mean-centered and standardized versions of many of the variables. In my r code I have a large list of the scale() functions that I run for all of the variables but I am wondering if there is a way to write a…
Addison
  • 143
  • 7
3
votes
1 answer

When using Standardize in H2O on New Data

I am curious to know how the Standardise feature of an H2O model in R works when scoring new data. I know that when it standardises on a training set it sets the mean to 0 and standard deviation to 1 based on the mean and…
3
votes
2 answers

Including standardized coefficients in a stargazer table

I have a series of linear models and I'd like to report the standardized coefficients for each. However, when I print the models in stargazer, it looks like stargazer automatically prints the significance stars for the standardized coefficients as…
spindoctor
  • 1,719
  • 1
  • 18
  • 42
3
votes
2 answers

Neural networks - Do training set and validation set need separate standardization?

I have this 5-5-2 backpropagation neural network I'm training, and after reading this awesome article by LeCun I started to put in practice some of the ideas he suggests. Currently I'm evaluating it with a 10-fold cross-validation algorithm I made…
mp85
  • 422
  • 3
  • 17
3
votes
2 answers

How to scale a variable by group

I would really appreciate your help in this question. I have the following dataset and I would like to create a new variable which would contain the standardized values (z distribution) per level of a given factor variable. x <- data.frame(gender =…
Pulse
  • 867
  • 5
  • 12
  • 19
2
votes
1 answer

How to avoid scaling dummy variables in dataframe in r?

I want to standardise all my variables before applying machine learning methods. However, to my understanding, dummy variables should never be standardised. After entering the following code, r standardized all my variables, even the ones which are…
TFT
  • 129
  • 10
2
votes
1 answer

Standardize or Normalize Categorical values

I am fairly new to data science (I'm using python) and found that it's better for us to standardize or normalize our data before we go further. My questions are : What if there are categorical values (binary and using one hot encoding, 0 or 1)…
2
votes
1 answer

R function for normalization based on one column?

Is it possible to normalize this table in R based on the last column(samples) samples = number of sequenced genomes. So I want to get a normalised distribution of all the genes in all the conditions. Simplified example of my data: I tried: dat1 <-…
Xela Vi
  • 113
  • 7
2
votes
2 answers

scale columns based on vector of column names

set.seed(123) dat <- data.frame(year_ref = 2000:2004, www_val1 = sample(5), www_val2 = sample(5), www_val3 = sample(5), sat_val1 = sample(5), sat_val2 = sample(5), …
89_Simple
  • 3,393
  • 3
  • 39
  • 94
2
votes
1 answer

StandardScaler giving non-uniform standard deviation

My problem setup is as follows: Python 3.7, Pandas version 1.0.3, and sklearn version 0.22.1. I am applying a StandardScaler (to every column of a float matrix) per usual. However, the columns that I get out do not have standard deviation =1, while…
Zhubarb
  • 11,432
  • 18
  • 75
  • 114
2
votes
1 answer

zero-inflated overdispersed count data glmmTMB error in R

I am working with count data (available here) that are zero-inflated and overdispersed and have random effects. The package best suited to work with this sort of data is glmmTMB (details here and troubleshooting here). Before working with the…
Blundering Ecologist
  • 1,199
  • 2
  • 14
  • 38
2
votes
1 answer

Z-standardization makes PC1 and PC2 exactly the same in this PCA analysis: Why?

I am trying to perform a PCA analysis using the psych package in R. I got two variables that I want to combine into one component displaying standard of living: slvpen: Standard of living of pensioners: 0 = Extremely bad, 10 = Extremely…
SnupSnurre
  • 363
  • 2
  • 12
2
votes
1 answer

Does this scale each column individually? R

If I wanted to standardize columns 2 and 3 (each column standardized separately), would this work? df[c(2:3)] <- scale(df[c(2:3)])
2
votes
1 answer

Shouldn't H2O standardize categorical predictors for regularized GLM models (lasso, ridge, elastic net)?

"The lasso method requires initial standardization of the regressors, so that the penalization scheme is fair to all regressors. For categorical regressors, one codes the regressor with dummy variables and then standardizes the dummy variables" (p.…
Elliot
  • 21
  • 3