Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted "best guess" values. Because missing data can create problems for analysis and can lead to missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (dropping every observation that has any missing value). Several imputation methods exist, including: imputing missing values with a single value, such as the mean, the median, or a specific value chosen from domain expertise; distance-based heuristics such as kNN; stochastic averaging via multiple imputation; and model-based methods such as Expectation Maximization (EM).
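
As a minimal sketch of the simplest method named above, single-value imputation, the following replaces missing entries with the mean or median of the observed values. It assumes missing values are represented as `None`; the helper name `impute_single_value` is hypothetical and does not come from any library.

```python
from statistics import mean, median

def impute_single_value(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed entries."""
    # Only the non-missing entries contribute to the fill value.
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

data = [1.0, None, 3.0, 8.0, None]
print(impute_single_value(data, "mean"))    # fill value is the observed mean, 4.0
print(impute_single_value(data, "median"))  # fill value is the observed median, 3.0
```

Because every gap receives the same value, this approach shrinks the variance of the imputed variable, which is one reason the stochastic and model-based methods in the list are often preferred.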

Suggested tag synonym: "missing-data"

931 questions
3
votes
1 answer

R - Combine two mice mids objects when data frames have different columns

I'm using the mice package on two different but related data frames. While the large majority of the variables are the same for both data frames, a small number of variables are unique to each data frame and the imputation happens for both data…
Rasul89
  • 588
  • 2
  • 5
  • 14
3
votes
2 answers

Creating a loop for multiple dependent variables in logistic regression using a multiple imputation dataset?

The previous question "How to repeatedly perform glm over multiple dependent variables after mice?" did not work for me. I don't understand how it incorporates the pooling of the mids. I need to repeat logistic regression…
19056530
  • 109
  • 5
3
votes
1 answer

How to use mice for multiple imputation of missing values in longitudinal data?

I have a dataset with a repeatedly measured continuous outcome and some covariates of different classes, like in the example below. Id y Date Soda Team 1 -0.4521 1999-02-07 Coke Eagles 1 0.2863 1999-04-15 …
3
votes
3 answers

How to fill `missings` in a vector using next valid observation

Is there any specific function for filling missings in an array? MWE: x = [missing,missing,2,3,missing,4] desired = [2,2,2,3,4,4]
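
One possible way to get the behaviour described in this question (each gap takes the next valid observation) is a single right-to-left pass. The question's example uses Julia's `missing`; this Python sketch assumes `None` marks a gap instead, and the function name `fill_with_next_valid` is hypothetical.

```python
def fill_with_next_valid(values):
    """Replace each None with the nearest non-None value to its right."""
    filled = list(values)
    next_valid = None
    # Walk backwards so the most recently seen valid value is always
    # the next valid observation for the position being visited.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_valid
        else:
            next_valid = filled[i]
    return filled

x = [None, None, 2, 3, None, 4]
print(fill_with_next_valid(x))  # [2, 2, 2, 3, 4, 4]
```

Trailing gaps have no later observation to borrow from, so in this sketch they stay `None`.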
3
votes
5 answers

Is there an R function for imputing missing year values, consecutively, by group?

My dataframe looks like: df <- data.frame(ID=c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "D", "D", "D", "D"), grade=c("KG", "01", "02", "03", …
mkpcr
  • 431
  • 1
  • 3
  • 13
3
votes
0 answers

How to get the dataset imputed by rpart using surrogate splits

rpart can deal with NA values by imputing them from surrogate splits; setting usesurrogate = 2 in rpart.control enables this. Is there a way to get the imputed version of the dataset from the rpart object? num <- c(5,…
Mine
  • 831
  • 1
  • 8
  • 27
3
votes
1 answer

SimpleImputer with groupby

Let's suppose the following…
Samir Hinojosa
  • 825
  • 7
  • 24
3
votes
1 answer

How to impute entire missing values in pandas dataframe with mode/mean?

I know the code for filling each column separately, as below: data['Native Country'].fillna(data['Native Country'].mode(), inplace=True) But I am working on a dataset with 50 rows and there are 20 categorical values which need to be imputed. Is…
Antony Joy
  • 301
  • 3
  • 15
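
The column-by-column logic this question asks about can be sketched without any library: fill numeric columns with their mean and all other columns with their mode. This is only an illustration under stated assumptions: the table is a plain dict of lists, `None` marks a missing value, and the name `impute_columns` and the sample data are hypothetical.

```python
from statistics import mean, mode

def impute_columns(table):
    """Fill None per column: mean for numeric columns, mode otherwise."""
    result = {}
    for name, column in table.items():
        observed = [v for v in column if v is not None]
        if not observed:
            # A fully missing column offers nothing to derive a fill value from.
            result[name] = list(column)
            continue
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in observed
        )
        fill = mean(observed) if numeric else mode(observed)
        result[name] = [fill if v is None else v for v in column]
    return result

table = {
    "Native Country": ["US", None, "US", "India"],
    "Age": [30, None, 50, 40],
}
print(impute_columns(table))
```

Looping over the columns avoids writing one fill statement per column, which is the repetition the question wants to remove.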
3
votes
2 answers

Pandas: Impute a given number of missing values before/after a series of available values

Let's say I have a time series where I usually have data available for a certain continuous span of years, but missing values before and after that span, like this: df = pd.DataFrame({'year': ["2000","2001","2002", "2003","2004", "2005","2006",…
Christian O.
  • 468
  • 2
  • 12
3
votes
1 answer

Using tbl_regression with imputed data/pooled regression models

I've had great success using the gtsummary::tbl_regression function to display regression model results. I can't see how to use tbl_regression with pooled regression models from imputed data sets, however, and I'd really like to. I don't have a…
3
votes
3 answers

replacing NA with next available number within a group

I have a relatively large dataset and I want to replace NA value for the price in a specific year and for a specific ID number with an available value in next year within a group for the same ID number. Here is a reproducible example: ID <-…
Ross_you
  • 881
  • 5
  • 22
3
votes
1 answer

Error with bagImpute and predict from caret package

I get the following error when trying to use the preProcess function from the caret package. The predict function works for knn and median imputation, but gives an error for bagging. How should I edit my call to the predict…
Aveshen Pillay
  • 431
  • 3
  • 13
3
votes
2 answers

Pyspark forward and backward fill within column level

I am trying to fill missing data in a PySpark dataframe. The dataframe looks like this: +---------+---------+-------------------+----+ | latitude|longitude| timestamplast|name| +---------+---------+-------------------+----+ | |…
Jeroen
  • 801
  • 6
  • 20
3
votes
2 answers

Handling missing categorical values ML

I have gone through "replace missing values in categorical data" regarding handling missing values in categorical data. The dataset has about 6 categorical columns with missing values. This would be for a binary classification problem. I see different…
3
votes
0 answers

autoImpute throws dimension mismatch error on fit_transform for MultipleImputer

I'm trying to impute the missing values in the Titanic test data set using the Python autoimpute package. However, the module is throwing a dimension mismatch error on the test data set. kaggle titanic test data import…
agarg
  • 318
  • 3
  • 11