Questions tagged [imputation]

Missing data imputation is the process of replacing missing data with substituted 'best guess' values. Because missing data can complicate analysis and lead to missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (dropping every observation that has any missing value). Several imputation methods exist, including: imputing a single value, such as the mean, the median, or a specific value chosen from domain expertise; distance-based heuristics such as kNN; stochastic averaging via multiple imputation; and model-based methods, including expectation-maximization (EM).
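As a rough illustration of the simplest method named in the description, single-value imputation, here is a minimal sketch in plain Python. The helper name `impute_constant` and the sample data are made up for the example, and `None` stands in for a missing value.

```python
from statistics import mean, median


def impute_constant(values, fill):
    """Return a copy of values with every None replaced by fill."""
    return [fill if v is None else v for v in values]


# Hypothetical sample with two missing entries and one large outlier.
weights = [4.0, None, 6.0, 100.0, None]
observed = [v for v in weights if v is not None]

# Mean imputation is pulled towards the outlier; median imputation is not.
mean_filled = impute_constant(weights, mean(observed))
median_filled = impute_constant(weights, median(observed))
```

Both variants shrink the variance of the column, which is one reason the description also lists multiple imputation and model-based methods.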

Suggested tag synonym: "missing-data"

931 questions
-1
votes
1 answer

Impute missing values using apply and lambda functions

I am trying to impute the missing values in the "Item_Weight" variable by taking the average of the variable for each of the different "Item_Types", as per the code below. But when I run it, I get a KeyError, shown below. Is it the pandas version…
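For readers skimming the listing, the idea in this question (fill each gap with the mean of its own group) can be sketched without any library. This is not the asker's code: the function `impute_group_mean`, the `Item_Type` key spelling, and the sample rows are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean


def impute_group_mean(rows, group_key, value_key):
    """Fill None in value_key with the mean of observed values in the same group."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    group_means = {group: mean(vals) for group, vals in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left untouched
        if new_row[value_key] is None:
            # Stays None when no value was observed anywhere in the group.
            new_row[value_key] = group_means.get(new_row[group_key])
        filled.append(new_row)
    return filled


# Hypothetical rows; the column names loosely follow the question.
items = [
    {"Item_Type": "Dairy", "Item_Weight": 9.0},
    {"Item_Type": "Dairy", "Item_Weight": None},
    {"Item_Type": "Dairy", "Item_Weight": 11.0},
    {"Item_Type": "Snacks", "Item_Weight": 5.0},
    {"Item_Type": "Snacks", "Item_Weight": None},
]
filled_items = impute_group_mean(items, "Item_Type", "Item_Weight")
```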
-1
votes
1 answer

Missing value imputation using the fancyimpute KNN package in python

I am trying to use the KNN package for imputing the missing values I have in my dataframe. My dataframe columns have different ranges, i.e., some of them are much greater in value than others. My understanding is that the KNN algorithm uses the…
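The concern about column ranges can be illustrated with a small, library-free sketch: distances are computed on min-max scaled columns so that no single wide-ranging column dominates, while the imputed value is taken from the original scale. This is not how fancyimpute is implemented; the functions `min_max_scale` and `knn_impute`, the averaged distance, and the data are all assumptions made for the example.

```python
import math


def min_max_scale(rows):
    """Scale each column to [0, 1] from its observed min and max; None stays None."""
    scaled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        col = [r[j] for r in rows if r[j] is not None]
        if not col:
            continue  # nothing observed in this column
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid division by zero for constant columns
        for r in scaled:
            if r[j] is not None:
                r[j] = (r[j] - lo) / span
    return scaled


def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Distance is a root-mean-square difference over the scaled columns that
    both rows have observed. Imputed values are on the original scale.
    """
    scaled = min_max_scale(rows)
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                shared = [c for c in range(len(row))
                          if scaled[i][c] is not None and scaled[m][c] is not None]
                if not shared:
                    continue
                sq = sum((scaled[i][c] - scaled[m][c]) ** 2 for c in shared)
                candidates.append((math.sqrt(sq / len(shared)), other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Hypothetical data: the second column is 1000 times larger than the first.
data = [
    [1.0, 1000.0, 10.0],
    [2.0, 2000.0, 20.0],
    [9.0, 9000.0, 90.0],
    [1.5, 1500.0, None],
]
imputed = knn_impute(data, k=2)
```

Here the last row sits between the first two rows after scaling, so its missing value is filled from those two neighbours rather than from the distant third row.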
-1
votes
2 answers

How should I impute NaN values in a categorical column?

Should I label-encode a categorical column and then impute NaN values with the most frequent value, or are there other ways? As encoding requires converting the dataframe to an array, imputing would then require converting the array back to a dataframe…
Ekam Dz Singh
  • 27
  • 1
  • 7
-1
votes
1 answer

Convert .gprobs files from Impute2 to PLINK format

I have some imputed .gprobs files (one per chromosome), imputed by Impute2 downloaded from dbGaP, and I need to convert this file into .bed format of PLINK in order to do some analysis. My .gprobs files look like: --- rs371609562:61395:CTT:C 61395…
-1
votes
3 answers

R: replace each missing value with the mean of the two previous values

I have a data frame with some NAs in column 'myvalues': x <- data.frame(mydates = as.Date(c("2018/04/01","2018/04/02","2018/04/03","2018/04/04", …
user3245256
  • 1,842
  • 4
  • 24
  • 51
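The rule in this question's title can be sketched in a few lines. The question itself is about R; the sketch below uses Python only to show the logic. It assumes that an already-imputed value counts as a "previous" value for the next gap and that gaps with fewer than two predecessors are left alone; the excerpt does not say either way. The function name and data are hypothetical.

```python
def fill_with_mean_of_previous_two(values):
    """Replace each None with the mean of the two values just before it.

    Gaps are filled left to right, so an imputed value can feed the next gap.
    A gap without two usable predecessors stays None.
    """
    filled = list(values)
    for i in range(2, len(filled)):
        if (filled[i] is None
                and filled[i - 1] is not None
                and filled[i - 2] is not None):
            filled[i] = (filled[i - 1] + filled[i - 2]) / 2
    return filled


# Hypothetical series with two consecutive gaps.
myvalues = [10.0, 20.0, None, None, 40.0]
result = fill_with_mean_of_previous_two(myvalues)
```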
-1
votes
1 answer

Forward-fill missing data in PySpark not working

I have a simple dataset as shown below. | id| name| country| languages| |1 | Bob| USA| Spanish| |2 | Angelina| France| …
user8907896
  • 71
  • 1
  • 2
  • 9
-1
votes
1 answer

Python: automatic data imputing based on machine learning

Besides filling the missing data with the mean, one could actually use machine learning (even regression) to fill in the missing values. As more and more automatic machine learning code becomes available, I wonder if there is any Python code that…
user40780
  • 1,828
  • 7
  • 29
  • 50
-1
votes
1 answer

How to impute the missing data using EM Bootstrap method in Amelia in R package

I'm going to compare my model with the EMB method in the Amelia package. I read the article, but it does not specifically mention how to call the EMB method from Amelia. I have two questions: how do I call EMB from Amelia? Is it correct if I want to…
amjay
  • 11
  • 6
-1
votes
1 answer

Training data has columns with all missing values, but the same columns in the test data have some values. How do I handle this situation?

I have been given separate training and test datasets. Both datasets have exactly the same structure (same columns/features). There are some columns in the training dataset that have missing values in all the rows. If I wanted to build a…
-1
votes
1 answer

Impute values of a vector using Cosine similarity in Python

The Scenario: I have a dataset whose last column has NaN values in it, which need to be imputed using only Vector Cosine & Pearson Correlation, after which the data will be taken further for clustering. The Problem: It is mandatory for my case to use…
T3J45
  • 717
  • 3
  • 12
  • 32
-1
votes
1 answer

I have a classification project where some of the columns/features have more than 90% null values. How do I handle them?

In my classification problem, some (~5) of the 85 features have mostly null values (>90%). How do I handle these values? Do I 1) ignore these columns/features altogether, 2) try to impute these values, and if so, how? 3) Any other…
-1
votes
1 answer

How can I use rowSums() after multiple imputation with MICE package in R

I have a short question: I imputed item data using multiple imputation with the MICE package. After imputation, I would like to sum items to a total score. However, my data is now in a mids object, and I can't figure out how to do this simple…
L. Bakker
  • 147
  • 1
  • 13
-1
votes
3 answers

NA replacement using the mean or median value? Which will be better for my data?

I have the following dataset: 5 3 3 5 10 10 3 8 2 12 8 6 2 5 6 5 10 4 3 5 4 3 3 5 8 3 5 6 6 1 10 3 6 6 5 8 3 4 3 4 4 3 2.5 1 4 2 2 3 5 10 4 4 6 3 2 …
Madhu Sareen
  • 549
  • 1
  • 8
  • 20
-1
votes
1 answer

Approximate missing value for input given a data set

I have a data set with x attributes and y records. Given an input record which has up to x-1 missing values, how would I reasonably approximate one of the remaining missing values? So in the example below, the input record has two values (for…
Z-Mehn
  • 268
  • 1
  • 2
  • 12
-1
votes
1 answer

Data Cleaning for Survival Analysis Using a Participant's Own Data to Impute Values

I’m in the process of cleaning some data for a survival analysis and I am trying to make it so that missing data gets imputed based on the surrounding values within a given subject. I'd like to use the mean of the closest previous and closest…
Jonah M.
  • 5
  • 1
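The approach described in this last question, filling a gap from the closest earlier and later observations of the same subject, might look roughly like the sketch below. It assumes measurements are already in time order per subject and that a gap with a neighbour on only one side simply copies that neighbour; the excerpt is cut off before it says how such edge cases should be handled. All names and data are hypothetical.

```python
def fill_from_neighbours(values):
    """Replace None with the mean of the nearest observed values before and after.

    With an observed value on one side only, that value is copied.
    A series with no observed values is returned unchanged.
    """
    n = len(values)

    # Nearest observed value strictly before each position.
    prev_obs, last = [None] * n, None
    for i, v in enumerate(values):
        prev_obs[i] = last
        if v is not None:
            last = v

    # Nearest observed value strictly after each position.
    next_obs, nxt = [None] * n, None
    for i in range(n - 1, -1, -1):
        next_obs[i] = nxt
        if values[i] is not None:
            nxt = values[i]

    filled = []
    for v, before, after in zip(values, prev_obs, next_obs):
        if v is not None:
            filled.append(v)
        elif before is not None and after is not None:
            filled.append((before + after) / 2)
        else:
            filled.append(before if before is not None else after)
    return filled


def fill_by_subject(records):
    """Apply fill_from_neighbours separately to each subject's series."""
    return {sid: fill_from_neighbours(vals) for sid, vals in records.items()}


# Hypothetical time-ordered measurements for two subjects.
visits = {"s1": [2.0, None, None, 6.0], "s2": [None, 3.0, None]}
filled_visits = fill_by_subject(visits)
```

Because each subject is processed on its own, values never leak from one participant to another.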