Questions tagged [standardized]

Shifting and rescaling data to ensure zero mean and unit variance.

Overview

Specifically, when $x_i$, $i = 1, \ldots, n$, is a batch of data, its mean is:

$$m = \frac{1}{n} \sum_{i=1}^{n} x_i$$

and its variance is:

$$s^2 = \frac{1}{\nu} \sum_{i=1}^{n} (x_i - m)^2$$

where $\nu$ is either $n$ or $n-1$ (choices vary with application).

Standardization replaces each $x_i$ with $z_i = (x_i - m)/s$. Do not confuse standardization with normalization.
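
As a rough illustration of the formulas above, here is a minimal Python sketch using only the standard library. The function name `standardize` and the `ddof` parameter are assumptions made for this example (they are not part of the tag description): `ddof=0` corresponds to ν = n and `ddof=1` to ν = n − 1.

```python
import math


def standardize(xs, ddof=0):
    """Return z_i = (x_i - m) / s for each x_i in xs.

    ddof is a hypothetical parameter for this sketch:
    ddof=0 divides the sum of squares by n, ddof=1 by n - 1.
    """
    n = len(xs)
    m = sum(xs) / n                      # mean
    nu = n - ddof                        # divisor: n or n - 1
    s = math.sqrt(sum((x - m) ** 2 for x in xs) / nu)  # standard deviation
    if s == 0:
        raise ValueError("constant data cannot be standardized (s = 0)")
    return [(x - m) / s for x in xs]


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(data)  # mean 5, variance 4 (with nu = n), so s = 2
print(z)               # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The check for `s == 0` is a design choice of this sketch: a constant column has no spread to rescale, so raising an error is one reasonable way to surface that rather than dividing by zero.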


Tag usage

Questions on this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning, and data analysis.

145 questions
1
vote
3 answers

Clear and Concise Way to apply Standardization to both Train and Test Set in R

I am selecting a 90/10 Training/Test split with some data in R. After I have the Training set, I would like to standardize it. I would then like to use the same mean and standard deviation used in the training set and apply that standardization to…
Coldchain9
  • 1,373
  • 11
  • 31
1
vote
2 answers

Include scale.=F as argument for preProcess within caret train?

I am working on a classification problem. Within my data processing, I estimate the best transformation to normality using bestNormalize(). During this step, I standardize all predictors. I use PCA as a preprocessing step to decorrelate my data…
Kevin
  • 61
  • 4
1
vote
1 answer

Place the mean of the column equal to 100 and transform other values in the column proportionally (Pandas Python)

I have a pandas dataframe like this: City Variable1 c1 1234 c2 2222 c3 1111 c4 2224 I would like to apply a form of standardization where: the mean of the column is placed equal to 100. the values are…
coelidonum
  • 523
  • 5
  • 17
1
vote
1 answer

Standardized regression coefficients with dummy variables in R vs. SPSS

I came across a puzzling difference in standardized (beta) coefficients with linear regression model computed with R and SPSS using dummy coded variables. I have used the hsb2 data set and created a contrast (dummy coding), so that the third…
panman
  • 1,179
  • 1
  • 13
  • 33
1
vote
1 answer

Should I standardize and detrend before train/test split?

I'm new to Python and trying to perform a random forest regression task. I import my dataset that has 5 columns in total (including the date column). My data is time-dependent, so I cannot use the train/test split. So instead I do the following…
1
vote
1 answer

scale variables in dataframe using another dataframe

I have a dataframe with following variables dat <- data.frame(cell.ID = 1:10, cell.name = letters[1:10], groupID = rep(1:2, each = 5), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10), x4= rnorm(10), …
89_Simple
  • 3,393
  • 3
  • 39
  • 94
1
vote
2 answers

Are lmer-coefficients standardized or not?

I have a very basic question; maybe a bit too basic to find a helpful response by googling it. I am calculating multi-level models using the lmer function with this code: lmer(H1_rirs, data= df_long_cl, REML = T) Am I right in assuming that the…
1
vote
1 answer

Standardize Pixel Input Data with many Zeros

I wanna standardize my input data for a neural network. Data looks like this: data= np.array([[0,0,0,0,233,2,0,0,0],[0,0,0,23,50,2,0,0,0],[0,0,0,0,3,20,3,0,0]]) This is the function that I used. It doesn't work because of the zeros. def…
Steven
  • 105
  • 8
1
vote
0 answers

CDF to normalize data

I really need your help. I want to scale my data between 0 and 1 to cluster it afterwards. Does it make sense to use the cumulative distribution function (CDF) to normalize the data in advance? (My features have different value ranges.) Please with…
piku
  • 21
  • 1
1
vote
0 answers

Which distance measurement fits best for features that have very different value ranges?

I have a given record with different features. In total I have 8 features. Some are binary, but some have a value range from 0 to 10 million. My big goal is to cluster the data. At the moment I am still looking for a suitable distance measure for…
piku
  • 21
  • 1
1
vote
0 answers

Standardization and inclusion of intercept in sparse lasso GLM

I found some problems while practicing the sparse group lasso method using the cvSGL function from the SGL package. My questions are as follows: Looking at the code for SGL:::center_scale, it doesn't seem to consider the sample size of the…
1
vote
2 answers

How to Normalize or standardize specific or selected features of a data set in python

I have data in a data frame named Table. Table contains 15 features and I want to normalize only 3 features that are numeric data; the names of these features are 'rate', 'cost', and 'Total cost'. Please, how do I fix this? I tried to…
Loui
  • 97
  • 2
  • 11
1
vote
1 answer

How to Standardize a Column of Data in R and Get Bell Curve Histogram to find a percentage that falls within a range?

I have a data set and one of the columns contains random numbers ranging from 300 to 400. I'm trying to find what proportion of this column is between 320 and 350 using R. To my understanding, I need to standardize this data and create a bell curve…
1
vote
1 answer

Different RMSE when training/testing my polynomial regression before/after standardizing

I am in the process of building a regression model that will eventually be used by other users. This model serves to predict flower temperature by using multiple atmospheric variables such as air temperature, humidity, solar radiation, wind, etc.…
1
vote
1 answer

Reduce the range of time in sequence analysis with R

I have a sequence that happens over a very long period of time. I tried 8 different algorithms to classify my sequences (OM, CHi2,...). Time goes from 1 to 123. I have 110 individuals and 8 events. My results are not as expected. First, it's very…