Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted, "best guess" values. Because missing data complicate analysis and can introduce bias, imputation is often used to avoid the problems associated with listwise deletion (dropping every observation that has any missing value). Several imputation methods exist, including: imputing a single value, such as the mean, the median, or a specific value chosen from domain expertise; distance-based heuristics such as kNN; multiple imputation, which pools estimates across several stochastically imputed datasets; and model-based methods such as Expectation-Maximization (EM).
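
As a rough illustration of the difference between listwise deletion and single-value imputation described above, here is a minimal sketch in plain Python. The helper names `listwise_delete` and `impute_column` and the toy data are hypothetical, chosen only for this example; real analyses would more likely rely on a dedicated library.

```python
import math
from statistics import mean, median

def listwise_delete(rows):
    """Drop every row that contains at least one missing value (NaN)."""
    return [r for r in rows if not any(math.isnan(v) for v in r)]

def impute_column(values, strategy="mean"):
    """Replace NaNs in one column with the mean or median of the observed values."""
    observed = [v for v in values if not math.isnan(v)]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if math.isnan(v) else v for v in values]

nan = float("nan")
# Toy data (hypothetical): two numeric columns, each with one missing entry.
rows = [(1.0, 10.0), (2.0, nan), (nan, 30.0), (4.0, 40.0)]

# Listwise deletion keeps only the fully observed rows.
complete_cases = listwise_delete(rows)

# Single-value imputation keeps every row and fills the gaps column by column.
col0 = impute_column([r[0] for r in rows], "mean")
col1 = impute_column([r[1] for r in rows], "median")
```

In this toy case listwise deletion discards half of the rows, while single-value imputation keeps all four at the cost of understating the variability of the filled columns.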

Suggested tag synonym: "missing-data"

931 questions
3
votes
2 answers

Creating imputation list for use with svyglm

Using the survey package, I am having issues creating an imputationList that svydesign will accept. Here is a reproducible example: library(tibble) library(survey) library(mitools) # Data set 1 # Note that I am excluding the "income" variable…
scottsmith
  • 371
  • 2
  • 11
3
votes
1 answer

Prevent Imputer from losing values

Currently I am trying to impute a dependent variable with pandas. (Don't ask why.) This is the dataset y.head(15) Out[138]: 0 13495.0 1 16500.0 2 16500.0 3 13950.0 4 17450.0 5 15250.0 6 17710.0 7 18920.0 8 …
Bestname
  • 173
  • 2
  • 10
3
votes
1 answer

Marginalize over missing discrete response data in Stan

I have some ordinal data with missingness, which I am trying to model in Stan. Since Stan cannot handle discrete parameters directly, I am attempting to marginalize over the different possible values of the response variable for those cases which…
user_15
  • 151
  • 9
3
votes
2 answers

Can I replace Nans with the mode of a column in a grouped data frame?

I have some data that looks like... Year Make Model Trim 2007 Acura TL Base 2010 Dodge Avenger SXT 2009 Dodge Caliber SXT 2008 Dodge Caliber SXT 2008 Dodge Avenger SXT Trim has some missing values. What I would…
Demetri Pananos
  • 6,770
  • 9
  • 42
  • 73
3
votes
1 answer

Calculating predicted means (or predicted probabilities) and SE after multiple imputation in R

I want to calculate predicted values and standard errors, but I can't simply use predict(), as I’m using 15 multiply imputed datasets (generated with the Amelia package). I run regression models on each dataset. Afterwards, results are combined into a single…
eva_utrecht
  • 95
  • 1
  • 2
  • 6
3
votes
2 answers

knn imputation of categorical variables in python

I am trying to implement kNN from the fancyimpute module on a dataset. I was able to implement the code for continuous variables of the datasets using the code below: knn_impute2=KNN(k=3).complete(train[['LotArea','LotFrontage']]) It yields the…
KINNI
  • 51
  • 1
  • 1
  • 3
3
votes
2 answers

SAS Proc MI SAS output

Proc MI is used to impute missing values in a SAS dataset. Is there a way to obtain SAS code from the Proc MI procedure, so that we can score datasets with missing values without having to use the Proc MI procedure? This is needed so that dataset in…
Zenvega
  • 1,974
  • 9
  • 28
  • 45
3
votes
3 answers

How to replace consecutive NAs with zero given a max gap parameter (in R)

I would like to replace all consecutive NA values per row with zero, but only if the number of consecutive NAs is less than a parameter maxgap. This is very similar to the function zoo::na.locf x = c(NA,1,2,3,NA,NA,5,6,7,NA,NA,NA) zoo::na.locf(x, …
Richi W
  • 3,534
  • 4
  • 20
  • 39
3
votes
1 answer

Does fancyimpute's SoftImpute require normalized data?

The page https://pypi.python.org/pypi/fancyimpute has the line # Instead of solving the nuclear norm objective directly, instead # induce sparsity using singular value thresholding X_filled_softimpute =…
Make42
  • 12,236
  • 24
  • 79
  • 155
3
votes
1 answer

R - Getting Imputed Missing Values back into dataframe

I'm using aregImpute to impute missing values on an R dataframe (bn_df). The code is this: library(Hmisc) impute_arg <- aregImpute(~ TI_Perc + AS_Perc + CD_Perc + CA_Perc + FP_Perc, data = bn_df,…
BrunoPT
  • 37
  • 1
  • 5
3
votes
1 answer

how to impute a column in pandas dataframe within each group

I have a dataframe with four columns ('key1', 'key2', 'data1', 'data2'). I inserted some NaN values into data1. Now I want to fill each NaN with the most frequently occurring value within its group after I do groupby(['key1', 'key2']). dt = …
zesla
  • 11,155
  • 16
  • 82
  • 147
3
votes
1 answer

Pandas per group imputation of missing values

How can I achieve such a per-country imputation for each indicator in pandas? I want to impute the missing values per group: no-A-state should get np.min per indicatorKPI; no-ISO-state should get the np.mean per indicatorKPI; for states with missing…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
3
votes
2 answers

pandas fill N.A. for specific column

I want to fill N.A. values in a specific column if a condition is met in another column to only replace this single class of N.A. values with an imputed / replacement value. E.g. I want to perform: if column1 = 'value1' AND column2 = N.A…
Georg Heiler
  • 16,916
  • 36
  • 162
  • 292
3
votes
2 answers

Is Last Observation Carried Forward (LOCF) implemented in PostgreSQL?

Is the data imputation method Last Observation Carried Forward (LOCF) implemented in PostgreSQL? If not, how could I implement this method?
Hello lad
  • 17,344
  • 46
  • 127
  • 200
3
votes
4 answers

mean-before-after imputation in R

I'm new to R. My question is how to impute a missing value using the mean of the values before and after the missing data point. For example, use the mean of the values directly above and below each NA as the imputed value. -mean for row number 3 is 38.5 -mean for row number 7…
NoraNorad
  • 27
  • 5