Questions tagged [imputation]

Missing data imputation is the process of replacing missing values with substituted, 'best guess' values. Because missing data can complicate analysis and introduce bias, imputation is often used to avoid the problems associated with listwise deletion (discarding every observation that has any missing value). Several imputation methods exist: filling in a single value, such as the mean, the median, or a specific value chosen from domain expertise; distance-based heuristics such as kNN; multiple imputation, which draws several plausible values and pools the results; and model-based methods, including expectation-maximization (EM).
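
As a rough illustration of the difference between listwise deletion and single-value imputation, here is a minimal sketch in plain Python. The helper names `listwise_delete` and `impute_columns` and the toy `data` table are hypothetical, chosen only for this example; they are not part of any library mentioned in the questions below.

```python
import math
from statistics import mean, median


def listwise_delete(rows):
    """Drop every row that contains at least one missing (NaN) value."""
    return [row for row in rows if not any(math.isnan(v) for v in row)]


def impute_columns(rows, strategy=mean):
    """Replace NaNs in each column with a summary of that column's observed values."""
    columns = list(zip(*rows))
    fills = []
    for col in columns:
        observed = [v for v in col if not math.isnan(v)]
        # A column with no observed values has nothing to summarise; leave it as NaN.
        fills.append(strategy(observed) if observed else math.nan)
    return [
        [fills[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy data: two numeric columns, one missing value in each.
data = [
    [1.0, 10.0],
    [2.0, math.nan],
    [math.nan, 30.0],
    [4.0, 40.0],
]

complete_cases = listwise_delete(data)                  # keeps only fully observed rows
mean_imputed = impute_columns(data, strategy=mean)      # fills with column means
median_imputed = impute_columns(data, strategy=median)  # fills with column medians
```

Listwise deletion shrinks the table to the fully observed rows, while the imputed versions keep every row. A column with no observed values stays NaN in this sketch, since there is nothing to compute a summary from.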

Suggested tag synonym: "missing-data"

931 questions

0 answers

Scikit-learn mean imputation gives different mean after imputation

I am doing mean imputation for missing values in a large numpy array. >>> import numpy as np >>> from sklearn.preprocessing import Imputer ... >>> X_train_reshaped.shape (6794600, 19) >>> imp = Imputer() >>> X_train_reshaped_imputed =…

python scikit-learn imputation

asked Jun 30 '18 at 01:46

arun

1 answer

How to impute NaN values to a default value if strategy fails?

Problem I am using the sklearn.preprocessing.Imputer class to impute NaN values using a mean strategy over the columns, i.e. axis=0. My problem is that some data which needs to be imputed only has NaN values in its column, e.g. when there is only a…

python scikit-learn imputation

asked May 29 '18 at 12:59

Thijs van Ede

2 answers

Stripplot in MICE does not show categorical variables

I'm using the mice package in R to do multiple imputation. I've done several imputations with only numerical variables, the imputation method is predictive mean matching, and when I use stripplot(imp) I get to see the observed and imputed values of…

r imputation r-mice

asked May 28 '18 at 11:16

Synnøve Eikefet

3 answers

Simulate data and randomly add missing values to dataframe

How can I randomly add missing values to some or each column (say random ~5% missing in each) in a simulated dataframe, plus, is there a more efficient way of simulating a dataframe with both continuous and factor columns? #Simulate some data N <-…

r simulation missing-data data-manipulation imputation

asked May 25 '18 at 11:59

aelhak

1 answer

Why does fillna with median on dataframe still leave Na/NaN in pandas?

I've seen this and this thread here, but something else is wrong. I have a very large pandas DataFrame, with many Na/NaN values. I want to replace them with the median value for that feature. So, I first make a table that displays the Na values per…

python pandas dataframe series imputation

asked May 09 '18 at 18:19

GrundleMoof

2 answers

NA in time series handling?

I am dealing with a forecast of time series in R. I have several questions: I would like to ask how we can handle missing values in time series? I guess we can somehow interpolate them? Can you suggest some solution in R for this?

r time-series na imputation

asked Apr 27 '18 at 14:09

syeenn

2 answers

Impute missing values to 0, and create indicator columns in Pandas

I have a very simple dataframe in Pandas, testdf = [{'name' : 'id1', 'W': np.NaN, 'L': 0, 'D':0}, {'name' : 'id2', 'W': 0, 'L': np.NaN, 'D':0}, {'name' : 'id3', 'W': np.NaN, 'L': 10, 'D':0}, {'name' : 'id4', 'W':…

python pandas dataframe imputation

asked Jul 15 '17 at 19:16

Monica Heddneck

4 answers

Sklearn: Categorical Imputer?

Is there a way to impute categorical values using a sklearn.preprocessing object? I would like to ultimately create a preprocessing object which I can apply to new data and have it transformed the same way as old data. I am looking for a way to do…

machine-learning tensorflow scikit-learn sklearn-pandas imputation

asked Mar 16 '17 at 23:03

user1367204

2 answers

How to fill missing values using median imputation in R for all the columns based on a customer id for panel data?

Customer id Year a b 1 2000 10 2 1 2001 5 3 1 2002 NA 4 1 2003 NA 5 2 2000 2 NA 2 2001 NA 4 2 …

r panel median imputation

asked Feb 15 '17 at 19:28

user7570943

2 answers

svd imputation R

I'm trying to use the SVD imputation from the bcv package but all the imputed values are the same (by column). This is the dataset with missing data http://pastebin.com/YS9qaUPs #load data dataMiss = read.csv('dataMiss.csv') #impute…

r svd imputation

asked Feb 27 '16 at 18:57

Sojers

4 answers

Replacing NA's in each column of matrix with the median of that column

I am trying to replace the NA's in each column of a matrix with the median of that column, however when I try to use lapply or sapply I get an error; the code works when I use a for-loop and when I change one column at a time, what am I doing…

r matrix na median imputation

asked Jan 18 '16 at 23:08

Jonno Bourne

1 answer

How to find RMSE by using loop in R

If I have a data frame containing 3 variables: origdata <- data.frame( age <- c(22, 45, 50, 80, 55, 45, 60, 24, 18, 15), bmi <- c(22, 24, 26, 27, 28, 30, 27, 25.5, 18, 25), hyp <- c(1, 2, 4, 3, 1, 2, 1, 5, 4, 5) ) I created MCAR…

r statistics missing-data imputation r-mice

asked Dec 22 '15 at 20:14

zhyan

1 answer

How to impute NA values or create all possible combinations?

data.frame( group = c("a", "b", "c", "d", "e", "total"), count = c(NA, NA, 10, 21, 49, 85) ) > group count 1 a NA 2 b NA 3 c 10 4 d 21 5 e 49 6 total 85 Given the above data frame, how can I impute the…

r imputation

asked May 06 '23 at 05:33

electronix384128

1 answer

Error message with missForest package (imputation using Random Forest)

My dataframe is below. All variables are numeric, one of them (Total) has about 20 NAs. I would like the missForest package to create imputed values for the NAs in Total. I am running R version 4.2.1 (2022-06-23 ucrt) on Windows. imp <-…

r random-forest imputation

asked Jan 31 '23 at 21:07

lawyeR

2 answers

AUC of logistic and ordinal model following multiple imputation using MICE (with R)

I am asking a question concerning the additive predictive benefit of the inclusion of a variable to a logistic and an ordinal model. I am using mice to impute missing covariates and am having difficulty finding ways to calculate the AUC and R…

r imputation auc r-mice

asked Nov 22 '22 at 16:37

DW1310
