Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted "best guess" values. Because missing data can complicate analysis and lead to missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (dropping every observation that has any missing value). Several imputation methods exist, including: imputing a single value, such as the mean, the median, or a specific value chosen through domain expertise; distance-based heuristics such as kNN; stochastic averaging via multiple imputation; and model-based methods such as Expectation Maximization (EM).
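As a rough illustration of two of the method families named above, here is a minimal standard-library Python sketch of single-value imputation (mean or median) and a simple kNN-style imputation. The function names `impute_single_value` and `impute_knn`, the use of `None` as the missing marker, and the sample data are assumptions made for this example. The kNN variant also assumes that all columns other than the target are fully observed.

```python
import math
from statistics import mean, median


def impute_single_value(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed) if strategy == "mean" else median(observed)
    return [fill if v is None else v for v in values]


def impute_knn(rows, target, k=2):
    """Fill None in column `target` with the mean of that column over the
    k nearest complete rows (Euclidean distance on the other columns)."""
    complete = [r for r in rows if r[target] is not None]
    others = [j for j in range(len(rows[0])) if j != target]

    def distance(a, b):
        return math.dist([a[j] for j in others], [b[j] for j in others])

    result = []
    for r in rows:
        filled = list(r)
        if r[target] is None:
            nearest = sorted(complete, key=lambda c: distance(r, c))[:k]
            filled[target] = mean(c[target] for c in nearest)
        result.append(filled)
    return result


# Hypothetical sample data, for illustration only.
ages = [22, None, 30, 26, None]
mean_filled = impute_single_value(ages, "mean")
median_filled = impute_single_value(ages, "median")

rows = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]]
knn_filled = impute_knn(rows, target=1, k=2)
print(mean_filled, median_filled, knn_filled)
```

Single-value imputation is simple but shrinks the variance of the imputed column, whereas the kNN variant borrows information from similar rows. Multiple imputation and EM are more involved and are not sketched here.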

Suggested tag synonym: "missing-data"

931 questions
-1
votes
1 answer

Need an interpretation on a statistical expression on missing values

I was reading a paper about missing values on the Internet and am having trouble interpreting the meaning of the first sentence highlighted in bold below: Missing data present various problems. First, the absence of data reduces…
-1
votes
1 answer

replicate the result of `mice()` using library(Hmisc) in R

Below, I've used library(mice) to multiply impute 5 datasets from my data.frame popmis. Then, I performed my desired analysis with() all those 5 imputed datasets and finally pool() across those analyses. Question: Is it possible to replicate the…
rnorouzian
  • 7,397
  • 5
  • 27
  • 72
-1
votes
1 answer

Pooling frequencies after multiple imputation in SAS

I have imputed data with multiple imputation using PROC MI in SAS, generating n imputed datasets. Now, I would like to report a baseline table with imputed values. However, I cannot find the right SAS code to do so. I've used PROC FREQ using a BY _…
Lisa
  • 1
-1
votes
1 answer

How to Use SimpleImputer on a 1D Array?

I've got a dataset with multiple NaN values in my Dependent Variable column. I've split the set into Dependent and Independent Variables, and I'm currently trying to replace all NaN values in my Dependent Variable column with 0's. However, I'm…
-1
votes
1 answer

sklearn impute rows satisfying condition

I'm trying to use sklearn SimpleImputer to impute missing ages from a particular column in a pandas DataFrame containing Titanic data. However, I want to separately impute the missing values for passengers whose names contain the word "Master" using…
Timothy Smith
  • 800
  • 1
  • 5
  • 15
-1
votes
1 answer

Imputation of missing numeric values while preserving the absence of it

Before I dive into the question itself, I'll give a brief explanation of the data set and the problem. The data set: I have a data set of roughly 20000 records and I intend to use it to train a classifier which classifies a given record as…
Ranika Nisal
  • 910
  • 6
  • 20
-1
votes
1 answer

R MICE Imputation

data=data.frame("student"=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5), "time"=c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4), "v1"=c(16,12,14,12,17,16,12,12,13,12,16,16,10,10,14,17,17,12,10,11), "v2"=c(1,1,3,2,2,2,3,1,2,1,2,1,3,1,1,2,3,3,1,2), "v3"=c(4,1,…
bvowe
  • 3,004
  • 3
  • 16
  • 33
-1
votes
1 answer

How to replace the value for one timestamp with the mean value of all the observations for this timestamp

I am working with time series in R and have multiple observations for one timestamp. How can I replace the value for that timestamp with the mean value of all its observations and delete all the duplicated timestamp rows? For…
Cherry
  • 73
  • 6
-1
votes
1 answer

Different test_set and train_set dimensionalities after removing columns with a high percentage of missing values

I'm currently having a problem with a fraud detection project. The dataset is already split into train and test sets, so initially I had a 0.7 split with the test set containing 393 columns and the train set containing 394 as expected, but when I…
-1
votes
1 answer

Replacing NA values with a specific average

I have a data.frame with columns and rows. How can I replace each NA value with the average of the first value before and the first value after that cell in the same column? For example: 1. 1 2 3 2. 4 NA 7 3. 9 NA 8 4. 1 5 6 I need the first NA…
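The excerpt above is truncated, so the exact rule the asker wants is not known. As a sketch of one plausible reading, the standard-library Python below (Python rather than R, purely for illustration) replaces each missing cell with the average of the nearest observed value above it and the nearest observed value below it in the same column. The function name `fill_with_neighbor_average`, the use of `None` for NA, and the handling of consecutive gaps are assumptions.

```python
def fill_with_neighbor_average(column):
    """Replace each None with the average of the nearest observed value
    before it and the nearest observed value after it (assumed rule)."""
    filled = list(column)
    for i, value in enumerate(column):
        if value is not None:
            continue
        before = next((v for v in reversed(column[:i]) if v is not None), None)
        after = next((v for v in column[i + 1:] if v is not None), None)
        neighbors = [v for v in (before, after) if v is not None]
        if neighbors:  # leave the gap if the whole column is missing
            filled[i] = sum(neighbors) / len(neighbors)
    return filled


# The 4 x 3 table from the excerpt, with None standing in for NA.
table = [[1, 2, 3], [4, None, 7], [9, None, 8], [1, 5, 6]]

# Work column by column, then transpose back to rows.
columns = [fill_with_neighbor_average(list(col)) for col in zip(*table)]
result = [list(row) for row in zip(*columns)]
print(result)
```

Under this reading both NA cells in the second column become 3.5, the average of 2 and 5. If the intended rule uses already-imputed neighbors instead, the loop would need to read from `filled` rather than `column`.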
-1
votes
1 answer

Fine and Gray model in R with imputed datasets

I have a long (vertically stacked) dataset containing 10 imputations (variable "imputation" identifies imputation number). The imputation was done in SAS but I would like to calculate some c-statistics using R. I know how to calculate c-stats…
-1
votes
1 answer

Using imputation models created from amelia or mice in R for new data

Suppose I run one of the missing variable imputation R packages, amelia or mice (or similar), on a large data frame -- let's say 100000 rows and 50 columns -- to get imputations for one particular column with some (let's say 200) NAs in it. Is there…
bioniclime
  • 47
  • 5
-1
votes
1 answer

Pandas: Missing value imputation based on date

I have a pandas data-frame which is as follows: df_first = pd.DataFrame({"id": [102, 102, 102, 102, 103, 103], "val1": [np.nan, 4, np.nan, np.nan, 1, np.nan], "val2": [5, np.nan, np.nan, np.nan, np.nan, 5], "rand": [np.nan, 3, 7, 8, np.nan, 4],…
gorjan
  • 5,405
  • 2
  • 20
  • 40
-1
votes
1 answer

R program dealing with missing values (Similar to apply function in Python)

I am new to R and currently want to deal with missing values. Basically, I have a dataset with a few columns and there are missing values in the 'Purchase' column. I want to impute the mean of Purchase values based on 'Master_Category'…
Nagesh
  • 3
  • 3
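The question above describes group-wise mean imputation. A minimal standard-library Python sketch of that idea follows. The column names `Master_Category` and `Purchase` come from the excerpt, while the function name `impute_group_mean`, the record layout, and the sample values are hypothetical. Groups with no observed value are left missing.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(records, group_key, value_key):
    """Fill missing `value_key` entries with the mean of the observed
    values that share the same `group_key`."""
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            observed[rec[group_key]].append(rec[value_key])
    group_means = {group: mean(vals) for group, vals in observed.items()}

    filled = []
    for rec in records:
        new_rec = dict(rec)
        if new_rec[value_key] is None:
            # .get() returns None when the group has no observed values.
            new_rec[value_key] = group_means.get(new_rec[group_key])
        filled.append(new_rec)
    return filled


# Hypothetical records, for illustration only.
records = [
    {"Master_Category": "A", "Purchase": 10.0},
    {"Master_Category": "A", "Purchase": None},
    {"Master_Category": "A", "Purchase": 20.0},
    {"Master_Category": "B", "Purchase": 5.0},
    {"Master_Category": "B", "Purchase": None},
]
imputed = impute_group_mean(records, "Master_Category", "Purchase")
print([rec["Purchase"] for rec in imputed])
```

The same pattern applies in any data-frame library: compute per-group means from the observed rows, then fill only the missing cells.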
-1
votes
1 answer

Random Forest missing values in cases where the variables do not apply

Some background: I am working on training a Random Forest regressor for predicting crop yield. Some of my predictor variables apply only to some cases, e.g. I have a variable denoting the number of rows, which only applies to crops grown in a…