Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted, 'best guess' values. Because missing data can complicate analysis and introduce missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (discarding every observation that has any missing value). Several imputation methods exist, including: imputing a single value, such as the mean, the median, or a specific value based on domain expertise; distance-based heuristics such as kNN; multiple imputation, which averages results over several stochastically imputed datasets; and model-based methods, including Expectation-Maximization (EM).
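
As a rough illustration of the first two families of methods named above, here is a minimal sketch in plain Python. The function names `impute_mean` and `impute_knn` are hypothetical and not taken from any library, and the distance measure used for kNN (mean squared difference over the features both rows have observed) is just one assumed choice among several. In practice a library implementation, such as scikit-learn's imputers, would usually be preferable.

```python
from statistics import mean


def impute_mean(column):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def impute_knn(rows, k=2):
    """Fill each None with the mean of that feature over the k nearest rows.

    Assumption: distance is the mean squared difference over the features
    observed in both rows. Only rows with the target feature observed can
    act as donors; if there are none, statistics.mean raises an error.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return sum(shared) / len(shared) if shared else float("inf")

    completed = []
    for row in rows:
        new_row = list(row)
        for j, value in enumerate(row):
            if value is None:
                # Candidate donors: every other row that has feature j.
                donors = [other for other in rows
                          if other is not row and other[j] is not None]
                donors.sort(key=lambda other: distance(row, other))
                new_row[j] = mean(other[j] for other in donors[:k])
        completed.append(new_row)
    return completed


column = [1.0, None, 3.0]
rows = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]

mean_filled = impute_mean(column)
knn_filled = impute_knn(rows, k=2)
```

Here the single-value approach fills the gap in `column` with the mean of the two observed entries, while the kNN approach fills the gap in the second row using its two closest neighbours and ignores the distant fourth row.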

Suggested tag synonym: "missing-data"

931 questions
7
votes
1 answer

Pyspark Dataframe Imputations -- Replace Unknown & Missing Values with Column Mean based on specified condition

Given a Spark dataframe, I would like to compute a column mean based on the non-missing and non-unknown values for that column. I would then like to take this mean and use it to replace the column's missing & unknown values. For example, assuming…
midnightfalcon
  • 103
  • 1
  • 1
  • 5
6
votes
4 answers

Replace NAs with missing values in sequence (R)

I have a DF like Now I want to replace The Col B = NA with 15 since that is the missing value. Col C first NA with 14 and second NA with 15. Col D first NA with 13, second NA with 14 and third NA with 15. So the numbers follow a sequence up to down…
6
votes
2 answers

Is there a way to impute missing values in machine learning?

For personal knowledge, I've been trying out different imputation methods other than the mean/median/mode. I was able to try out KNN, MICE, and median imputation methods so far. I was told that imputation by clustering method can also be done and my…
uharsha33
  • 225
  • 2
  • 12
6
votes
1 answer

Get p-values from results of svyglm when using multiple imputations in R

I would like to get p-values from the results of a svyglm model when using multiple imputations. A reproducible example is below. Create data sets library(tibble) library(survey) library(mitools) # Data set 1 # Note that I am excluding the "income"…
scottsmith
  • 371
  • 2
  • 11
6
votes
5 answers

Impute missing values with ROLLING mean in R

I am new to R and struggling with a problem. I need a function to impute the missing values in a vector according to the mean value of the elements within a window of a given size. However, this window will move because, say my NA is in position 30,…
s1368647
  • 61
  • 1
  • 3
6
votes
4 answers

Python - SkLearn Imputer usage

I have the following question: I have a pandas dataframe, in which missing values are marked by the string na. I want to run an Imputer on it to replace the missing values with the mean in the column. According to the sklearn documentation, the…
lte__
  • 7,175
  • 25
  • 74
  • 131
6
votes
2 answers

Imputation using mice with clustered data

So I am using the mice package to impute missing data. I'm new to imputation so I've got to a point but have run into a steep learning curve. To give a toy example: library(mice) # Using nhanes dataset as example df1 <- mice(nhanes, m=10) So as you…
user2498193
  • 1,072
  • 2
  • 13
  • 32
6
votes
1 answer

Plot Multiple Imputation Results

I have successfully completed a multiple imputation on the missing data of my questionnaire research using the MICE package in R and performed a linear regression on the pooled imputed variables. I can't seem to work out how to extract single pooled…
Frank Zafka
  • 829
  • 9
  • 30
5
votes
0 answers

Multilevel Multiple Imputation (MICE) with categorical/factor variable?

I have a dataset where I am trying to use multiple imputation with the packages mice, miceadds and micemd for a categorical/factor variable in a multilevel setting. I am able to use the method 2l.2stage.pois for a continuous variable, which works…
Marco Pastor Mayo
  • 803
  • 11
  • 25
5
votes
1 answer

Fill nan with zero python pandas

This is my code: for col in df: if col.startswith('event'): df[col].fillna(0, inplace=True) df[col] = df[col].map(lambda x: re.sub("\D","",str(x))) I have event columns 0 to 10 ("event_0, event_1, ..."). When I fill nan with this…
NilZ
  • 71
  • 1
  • 1
  • 4
5
votes
2 answers

What exactly does complete in mice do?

I am researching how to use multiple imputation results. The following is my understanding; please let me know if there are mistakes. Suppose you have a data set with missing values, and you want to conduct a regression analysis. You may perform…
RyanKao
  • 321
  • 1
  • 5
  • 14
5
votes
3 answers

mice package in R, mipo object does not return variance covariance matrix anymore after updating to mice 3.0

My code stopped working after updating the mice (Multivariate Imputation by Chained Equations) package to version >3. I wish to retrieve the estimated variance-covariance matrix from linear regressions on multiply imputed datasets. This quantity (which…
user3679030
  • 153
  • 1
  • 6
5
votes
3 answers

Scikit-learn - Impute values in a specific column

Is it possible to impute values for a specific column? For example, if I have 3 columns: A (categorical): does not contain any missing values B (numeric): does not contain any missing values C: suppose this column contains numeric data and some of…
Glorian
  • 127
  • 1
  • 1
  • 10
5
votes
1 answer

scikit-learn impute mean of feature within groups of nominal value in another feature

I want to impute the mean of a feature but only calculate the mean based off other examples that have the same category/nominal value in another column and I was wondering if this was possible using scikit-learn's Imputer class? It would just make…
5
votes
2 answers

Testing for missing values in R

I have a time series data set which has some missing values in it. I wish to impute the missing values but I am unsure as to which method is most appropriate, e.g. linear, spline or stine from the imputeTS package. For the sake of completeness I wish…
TheGoat
  • 2,587
  • 3
  • 25
  • 58