Questions tagged [imputation]

Missing data imputation is the process of replacing missing values with substituted "best guess" values. Because missing data can complicate analysis and introduce missing-data bias, imputation is seen as a way to avoid the problems associated with listwise deletion (dropping every observation that has any missing value).

Multiple methods for imputation exist, including: imputing missing values with a single value, such as the mean, the median, or a specific value based on domain expertise; distance-based heuristics such as kNN; stochastic averaging via multiple imputation; and model-based methods such as expectation-maximization (EM).
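
As a rough illustration of the single-value and distance-based approaches named in the tag description, here is a minimal sketch in plain Python. The function names `impute_mean` and `impute_knn` and the toy `rows` data are hypothetical and do not come from any library. The kNN variant is deliberately simplified: it measures Euclidean distance over the columns two rows both have observed and averages the k nearest donors.

```python
from statistics import mean

def impute_mean(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    col_means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        # Assumption: every column has at least one observed value.
        col_means.append(mean(observed))
    return [[col_means[j] if v is None else v for j, v in enumerate(r)]
            for r in rows]

def impute_knn(rows, k=2):
    """Fill each None with the column average over the k nearest donor rows."""
    def distance(a, b):
        # Compare only the columns that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return float("inf")
        return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

    result = [list(r) for r in rows]  # copy so the input is not mutated
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have column j observed.
            donors = [(distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != float("inf")]
            donors.sort(key=lambda d: d[0])
            nearest = [val for _, val in donors[:k]]
            if nearest:  # leave the gap as None if no usable donor exists
                result[i][j] = mean(nearest)
    return result

# Hypothetical toy data: the second row is missing its second value.
rows = [
    [1.0, 2.0],
    [2.0, None],
    [3.0, 6.0],
    [10.0, 22.0],
]
mean_filled = impute_mean(rows)
knn_filled = impute_knn(rows, k=2)
```

On this toy data the mean fill is pulled toward the outlying last row, while the kNN fill uses only the two closest rows, which is the usual argument for distance-based heuristics when the data has local structure.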

Suggested tag synonym: "missing-data"

931 questions
-1
votes
1 answer

'replacement has length zero' error in R

I'm trying to impute the NAs in temperature data in R. It is spatiotemporal data covering 487 observatories and 60 time units (60 months). What I want to do here is replace each NA with the value that has the smallest distance (not zero) from the NA's…
Donna
  • 11
  • 1
  • 5
-1
votes
2 answers

Input missing days in a day sequence in R

I have a problem with inputting missing observations in a data frame with R. Below is a snapshot of the data frame (image: sample of the data frame). I actually have 66 different districts and 21 days, and each day and each district should have 144…
Felix Zhao
  • 459
  • 5
  • 9
-1
votes
1 answer

Select important features then impute or first impute then select important features?

I have a dataset with lots of features (mostly categorical Yes/No features) and lots of missing values. One of the techniques for dimensionality reduction is to generate a large and carefully constructed set of trees against a target attribute and…
Karup
  • 2,024
  • 3
  • 22
  • 48
-2
votes
0 answers

Why is the imputation not working even though the array is larger than the number of nulls?

# Num has too many unique values, and we can't do simple imputation because each unique value has such a low count that setting all 72 nulls to that value will skew the results, so we just randomly impute the nulls using the existing values num_values =…
-2
votes
1 answer

Imputing NAs with millions of rows of data in R

I have an orders dataset that contains sales order and sales order line information. Below is a screenshot of the first few columns of data. The sales order is unique but can have multiple sales order lines per sales order. 20% of the data…
Josh Ortega
  • 179
  • 8
-2
votes
1 answer

How, and with which library, can I use kNN imputation for missing value analysis?

I am trying to impute the missing values using kNN, but I wasn't able to use the code: from fancyimpute import KNN. Is there any other library for kNN imputation?
-2
votes
1 answer

ValueError while fitting a model even after imputation

I am using the Melbourne Housing Dataset from Kaggle to fit a regression model on it, with Price being the target value. You can find the dataset here import numpy as np import pandas as pd from sklearn.ensemble import GradientBoostingRegressor from…
Aditya Mishra
  • 1,687
  • 2
  • 15
  • 24
-2
votes
1 answer

Missing value imputation in Python

By doing df.groupby('acc_count', as_index=False)['avg_spd'].median() I got acc_count avg_spd 0 20.94 1 24.42 2 26.035 3 33.27 4 33.46 5 …
-2
votes
1 answer

Error in data frame undefined columns after imputing in R

I'm working on imputation with some data in R. I found some code online that performs imputation and then models the imputed data and the original data. The code is this: # Using airquality dataset data <- airquality data[4:10,3] <-…
Colonel G
  • 29
  • 6
-2
votes
1 answer

How to impute inhomogeneously missing data

I have a dataframe of shape 2701x128. It has a lot of missing values. The thing is that some rows can have 95% of their data filled in and some only 5%. Let me try to visualize it: the x-axis is the row number (after sorting), the y-axis is the number of non-zero values…
Ladenkov Vladislav
  • 1,247
  • 2
  • 21
  • 45
-2
votes
1 answer

How to find missing values?

What are the techniques (such as kNN or maximum likelihood) that I can use to find the missing values? I want to use R and am trying to find a suitable technique to impute the missing values. The sample data is shown below: F1 F2 F3 F4 F5 Class Good …
user7812478
  • 1
  • 1
  • 2
-2
votes
1 answer

How do I identify the type of variable in a dataframe in R?

I am trying to create comprehensive automated code for my team for missing value imputation using several different methods. I know the logic, but I am having trouble identifying the data class, which is important in deciding which method to…
Ranjan Pandey
  • 85
  • 2
  • 11
-3
votes
1 answer

Replacing missing data with interpolation values

I was searching for information about filling missing values with interpolation and found the three most important methods: (1) linear interpolation, (2) spline interpolation, and (3) Stineman interpolation. Can you please share with me algorithms…
John
  • 1,849
  • 2
  • 13
  • 23
-3
votes
1 answer

scikit-learn: how to change a categorical value with missing data to a numerical one

I am using sklearn for a machine learning project, and one of the columns is in categorical form. I would like to convert it into numerical form with an ordinal encoder, and then impute the missing data. Sklearn's OrdinalEncoder throws an…
plotka
  • 63
  • 1
  • 8
-4
votes
1 answer

Replacement of NAs in the variable?

Good evening, I have a dataset with one variable, Gender, that has missing data. Could anyone please help me replace these NAs using R packages? I have tried the "Mice" package; however, it does not replace the NAs and its…
P Kumar
  • 1
  • 2