Questions tagged [imputation]

Missing data imputation is the process of replacing missing values with substituted, "best guess" values. Because missing data complicates analysis and can introduce missing-data bias, imputation is often used as an alternative to listwise deletion (discarding every observation that has any missing value). Many imputation methods exist, including: filling missing values with a single value, such as the mean, the median, or a specific value chosen from domain expertise; distance-based heuristics such as kNN; multiple imputation, which draws several plausible values and pools the results; and model-based methods such as Expectation-Maximization (EM).
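As a rough illustration of the difference between listwise deletion and single-value (mean) imputation described above, here is a minimal sketch in plain Python. The toy records and the helper names `listwise_delete` and `impute_mean` are made up for this example and do not come from any library.

```python
from statistics import mean

# Toy records; None marks a missing value (made-up data for illustration).
rows = [
    {"group": "A", "value": 10.0},
    {"group": "A", "value": None},
    {"group": "B", "value": 4.0},
    {"group": "B", "value": None},
    {"group": "A", "value": 2.0},
]

def listwise_delete(records):
    """Drop every record that has any missing field (hypothetical helper)."""
    return [r for r in records if all(v is not None for v in r.values())]

def impute_mean(records, key="value"):
    """Replace missing `key` entries with the mean of the observed ones."""
    fill = mean(r[key] for r in records if r[key] is not None)
    return [{**r, key: fill if r[key] is None else r[key]} for r in records]

complete_cases = listwise_delete(rows)  # keeps only fully observed records
imputed = impute_mean(rows)             # keeps all records, fills the gaps
```

Listwise deletion shrinks the sample, while mean imputation keeps every record but understates the variability of the filled column; that trade-off is one reason multiple imputation and model-based methods exist.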

Suggested tag synonym: "missing-data"

931 questions

2 answers

Pandas: How to fill null values with mean of a groupby?

I have a dataset with some missing data that looks like this: id category value 1 A NaN 2 B NaN 3 A 10.5 4 C NaN 5 A 2.0 6 B 1.0 I need to fill in the…

asked Oct 28 '16 at 06:12

sfactor

2 answers

Impute missing data with mean by group

I have a categorical variable with three levels (A, B, and C). I also have a continuous variable with some missing values in it. I would like to replace the NA values with the mean of their group. That is, missing observations from group A have to be…

r loops missing-data imputation

asked Mar 25 '19 at 20:03

JOG

4 answers

Imputer on some Dataframe columns in Python

I am learning how to use Imputer in Python. This is my code: df=pd.DataFrame([["XXL", 8, "black", "class 1", 22], ["L", np.nan, "gray", "class 2", 20], ["XL", 10, "blue", "class 2", 19], ["M", np.nan, "orange", "class 1", 17], ["M", 11, "green",…

python scikit-learn missing-data imputation

asked Jul 26 '16 at 07:59

Mauro Gentile

4 answers

R: replace NA with item from vector

I am trying to replace some missing values in my data with the average values from a similar group. My data looks like this: X Y 1 x y 2 x y 3 NA y 4 x y And I want it to look like this: X Y 1 x y 2 x y 3 y y 4 x …

r replace missing-data imputation

asked Jul 13 '11 at 19:47

gregmacfarlane

1 answer

Multiple Imputation of missing and censored data in R

I have a dataset with both missing-at-random (MAR) and censored data. The variables are correlated and I am trying to impute the missing data conditionally so that I can estimate the distribution parameters for a correlated multivariate normal…

r missing-data imputation

asked May 07 '17 at 03:02

chelsea

2 answers

Implementing KNN imputation on categorical variables in an sklearn pipeline

I am implementing a pre-processing pipeline using sklearn's pipeline transformers. My pipeline includes sklearn's KNNImputer estimator that I want to use to impute categorical features in my dataset. (My question is similar to this thread but it…

python encoding scikit-learn pipeline imputation

asked Nov 18 '20 at 20:24

LazyEval

0 answers

Use of statsmodels.imputation.mice

I am exploring the statsmodels.imputation.mice package for imputing missing values. I haven't seen any example of its usage, though, outside of http://www.statsmodels.org. From what I gather, one would create an instance of mice.MICEData and use…

statsmodels imputation

asked Sep 13 '17 at 22:45

David Makovoz

1 answer

Using imputed datasets from library mice() to fit a multi-level model in R

I'm new to package mice in R. But I'm trying to impute 5 datasets from popmis and then fit an lmer() model with() each and finally pool() across them. I think the pool() function in mice() doesn't work with the lmer() call from lme4 package,…

r missing-data lme4 imputation r-mice

asked Nov 08 '20 at 06:36

rnorouzian

4 answers

MCAR Little's test in Python

How can I execute Little's Test, to find MCAR in Python? I have looked at the R package for the same test, but I want to do it in Python. Is there an alternate approach to test MCAR?

python-3.x statistics missing-data imputation hypothesis-test

asked Sep 28 '19 at 08:44

Saurabh Verma

3 answers

Implementation of sklearn.impute.IterativeImputer

Consider data which contains some nan below: Column-1 Column-2 Column-3 Column-4 Column-5 0 NaN 15.0 63.0 8.0 40.0 1 60.0 51.0 NaN 54.0 31.0 2 15.0 17.0 55.0 80.0 NaN 3 54.0 43.0 70.0 16.0 …

python dataframe scikit-learn missing-data imputation

asked Jul 22 '19 at 21:52

k.ko3n

1 answer

Differences between sklearn's SimpleImputer and Imputer

In Python's sklearn library there exist two classes that do approximately the same thing: sklearn.preprocessing.Imputer and sklearn.impute.SimpleImputer. The only difference that I found is a "constant" strategy type in SimpleImputer. Is…

python machine-learning scikit-learn imputation

asked Dec 24 '18 at 11:15

MefAldemisov

1 answer

Do imputation in R when mice returns error that "system is computationally singular"

I am trying to impute a medium-sized dataframe (~100,000 rows) where 5 columns out of 30 have NAs (a large proportion, around 60%). I tried mice with the following code: library(mice) data_3 = complete(mice(data_2)) After the first…

r imputation r-mice

asked Jan 20 '18 at 10:55

user8270077

3 answers

Generate larger synthetic dataset based on a smaller dataset in Python

I have a dataset with 21000 rows (data samples) and 102 columns (features). I would like to generate a larger synthetic dataset based on the current one, say with 100000 rows, so I can use it for machine learning purposes. I've…

python machine-learning scikit-learn imputation

asked Mar 06 '19 at 16:04

JChat

3 answers

Can I use Train AND Test data for Imputation?

Interestingly, I see a lot of different answers about this both on Stack Overflow and other sites: While working on my training data set, I imputed missing values of a certain column using a decision tree model. So here's my question. Is it fair to…

python-2.7 data-science imputation

asked Oct 14 '17 at 20:28

Analysa

3 answers

Error in "missforest" in R

I need help getting around the error below while performing data imputation in R using the "missforest" package. > imputed<- missForest(dummy, maxiter = 10, ntree = 100, variablewise = TRUE, + decreasing = TRUE, verbose = TRUE, + …

r imputation

asked Sep 08 '17 at 22:33

Sandeep
