Questions tagged [missing-data]

For questions about missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may be unobserved, corrupted, or not yet available. Handling such data requires annotating which values are missing, as well as methodological decisions about whether and how to impute them or otherwise use the data in standard analyses. These issues become especially pressing in data-intensive settings, such as large statistical analyses of databases.
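As a small illustration of the extra annotation described above, NumPy's masked arrays are one data structure that records which entries are missing and excludes them from reductions. A minimal sketch, assuming a hypothetical source that encodes unobserved readings either as NaN or as the sentinel -999:

```python
import numpy as np

# Hypothetical sentinel a source system might use for "not observed".
SENTINEL = -999.0


def annotate_missing(raw):
    """Return a masked array in which sentinel and NaN entries are masked."""
    values = np.asarray(raw, dtype=float)
    return np.ma.masked_where((values == SENTINEL) | np.isnan(values), values)


data = annotate_missing([12.0, -999.0, 15.0, np.nan, 9.0])
print(data.count())  # 3 observed values
print(data.mean())   # 12.0: (12 + 15 + 9) / 3, masked entries are skipped
```

The mask travels with the data, so later steps can decide explicitly whether to impute, drop, or model the masked entries instead of silently treating -999 as a real measurement.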

Missing data occur in many fields, from survey data to industrial data. There are many possible missing-data mechanisms (the reasons why data are missing). In survey data, for example, values may be missing because of drop-out: respondents may run out of time before finishing the survey.

Rubin classified missing data into three types:

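  1. missing completely at random (MCAR): the probability that a value is missing does not depend on any data, observed or unobserved;
  2. missing at random (MAR): the probability of missingness may depend on observed data, but not on the missing values themselves;
  3. missing not at random (MNAR): the probability of missingness depends on the unobserved values themselves.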

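Note that some statistical analyses are valid only under certain classes of missingness. For example, complete-case analysis (dropping incomplete rows) generally gives unbiased estimates under MCAR but can be biased under MAR or MNAR, while standard multiple imputation assumes MAR.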
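To make the distinction concrete, the following simulation (an illustrative sketch with made-up parameters, not taken from any particular study) deletes values of an outcome y under each mechanism and then runs two complete-case analyses: the mean of y, and a linear regression of y on a fully observed covariate x.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
x = rng.normal(size=n)                # covariate, always observed
y = 2.0 * x + rng.normal(size=n)      # outcome; true mean 0, true slope 2

# Boolean masks marking which y values go missing (True = missing).
mechanisms = {
    "MCAR": rng.random(n) < 0.3,                        # unrelated to x or y
    "MAR": rng.random(n) < np.where(x > 0, 0.6, 0.1),   # depends on observed x
    "MNAR": rng.random(n) < np.where(y > 0, 0.6, 0.1),  # depends on y itself
}

results = {}
for name, missing in mechanisms.items():
    observed = ~missing
    cc_mean = y[observed].mean()  # complete-case mean of y
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    results[name] = (cc_mean, intercept, slope)
    print(f"{name:>4}: mean={cc_mean:+.3f}  intercept={intercept:+.3f}  slope={slope:.3f}")
```

With these settings the complete-case mean is close to 0 only under MCAR. Under MAR it is biased, yet the regression of y on x stays close to the true line because missingness depends only on x. Under MNAR the regression is biased as well.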

2809 questions
8 votes, 1 answer

Using imputed datasets from library mice() to fit a multi-level model in R

I'm new to package mice in R. But I'm trying to impute 5 datasets from popmis and then fit an lmer() model with() each and finally pool() across them. I think the pool() function in mice() doesn't work with the lmer() call from lme4 package,…
rnorouzian
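For context on what the pooling step does: results from the imputed datasets are usually combined with Rubin's rules, which average the per-imputation estimates and add the between-imputation spread to the average within-imputation variance. A minimal Python sketch of those rules for a single coefficient follows. It is not mice's own code, it uses Rubin's original degrees-of-freedom formula (implementations such as mice may apply a small-sample adjustment instead), and it says nothing about whether pool() can extract estimates from an lmer() fit.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed datasets with Rubin's rules.

    estimates: the coefficient estimated on each imputed dataset
    variances: its squared standard error on each imputed dataset
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    if m < 2:
        raise ValueError("Rubin's rules need at least two imputations")
    q_bar = q.mean()                  # pooled point estimate
    u_bar = u.mean()                  # average within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    total = u_bar + (1 + 1 / m) * b   # total variance T
    if b == 0:
        dof = np.inf                  # no between-imputation spread
    else:
        r = (1 + 1 / m) * b / u_bar   # relative increase in variance
        dof = (m - 1) * (1 + 1 / r) ** 2
    return q_bar, total, dof


# Hypothetical estimates and variances from three imputed datasets.
est, total_var, dof = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```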
8 votes, 4 answers

MCAR Little's test in Python

How can I execute Little's Test, to find MCAR in Python? I have looked at the R package for the same test, but I want to do it in Python. Is there an alternate approach to test MCAR?
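Little's test itself needs maximum-likelihood (EM) estimates of the means and covariances, which is more than a short snippet can cover. A simpler alternative check, swapped in here and not Little's test, compares another variable's mean between rows where a given column is missing and rows where it is observed: under MCAR the two groups should not differ systematically. A minimal NumPy/pandas permutation-test sketch (the function name is hypothetical):

```python
import numpy as np
import pandas as pd


def missingness_permutation_test(df, missing_col, other_col, n_perm=2000, seed=0):
    """Permutation p-value for "the mean of other_col is the same whether or
    not missing_col is missing", one consequence of MCAR (not Little's test)."""
    sub = df[[missing_col, other_col]].dropna(subset=[other_col])
    is_missing = sub[missing_col].isna().to_numpy()
    values = sub[other_col].to_numpy(dtype=float)
    if is_missing.all() or not is_missing.any():
        raise ValueError("need rows with and without missing values in missing_col")
    observed_diff = abs(values[is_missing].mean() - values[~is_missing].mean())
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(is_missing)  # break any link to other_col
        diff = abs(values[shuffled].mean() - values[~shuffled].mean())
        hits += int(diff >= observed_diff)
    # add-one correction keeps the p-value strictly positive
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: "b" is missing exactly when "a" is large, so not MCAR.
a = np.arange(200, dtype=float)
df = pd.DataFrame({"a": a, "b": np.where(a >= 150, np.nan, a)})
p_value = missingness_permutation_test(df, "b", "a")
```

A small p-value is evidence against MCAR; a large one does not prove MCAR, since this checks only one pair of variables and one summary (the mean).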
8 votes, 3 answers

Implementation of sklearn.impute.IterativeImputer

Consider data which contains some NaN values below:

       Column-1  Column-2  Column-3  Column-4  Column-5
    0       NaN      15.0      63.0       8.0      40.0
    1      60.0      51.0       NaN      54.0      31.0
    2      15.0      17.0      55.0      80.0       NaN
    3      54.0      43.0      70.0      16.0 …
k.ko3n
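The idea behind IterativeImputer is round-robin regression: fill the missing cells with an initial guess, then repeatedly model each incomplete column from the other columns and overwrite its missing cells with the predictions. The sketch below is a simplified illustration of that idea, not scikit-learn's implementation: it uses ordinary least squares, whereas scikit-learn's default estimator is BayesianRidge, and it omits options such as imputation order and a convergence tolerance.

```python
import numpy as np


def iterative_impute(X, n_iter=10):
    """Simplified round-robin imputation (illustrative, OLS-based).

    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initial guess: replace each NaN with its column mean.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue  # nothing to impute in this column
            # Design matrix: intercept plus all other (currently filled) columns.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            # Fit on rows where column j was observed, predict where it was missing.
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            filled[rows, j] = design[rows] @ coef
    return filled


# Hypothetical data in which column 2 equals 2 * column 0 + column 1.
X = [[1, 2, 4], [2, 1, 5], [3, 5, 11], [4, 3, 11], [5, 0, np.nan]]
result = iterative_impute(X)
```

Because column 2 here is an exact linear function of the others, the missing cell is recovered as 10; with real data the imputations are only estimates, and a column that is entirely NaN would need separate handling.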
8 votes, 4 answers

Filling nulls with a list in Pandas using fillna

Given a pd.Series, I would like to replace null values with a list. That is, given:

    import numpy as np
    import pandas as pd
    ser = pd.Series([0,1,np.nan])

I want a function that would return

    0        0
    1        1
    2    [nan]

But if I try using the…
splinter
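Series.fillna does not accept a list as the fill value, so one workaround is to map over the values directly. A minimal sketch with a hypothetical helper name, assuming the Series holds scalars (pd.isna applied to a list element would return an array rather than a single boolean):

```python
import numpy as np
import pandas as pd


def fill_nulls_with_list(ser, fill):
    """Replace each null in `ser` with its own copy of the list `fill`."""
    # object dtype lets a list sit in a single cell
    return ser.astype(object).map(lambda v: list(fill) if pd.isna(v) else v)


ser = pd.Series([0, 1, np.nan])
result = fill_nulls_with_list(ser, [np.nan])
print(result)
```

Copying the list for each null avoids having every filled cell share one mutable object.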
8 votes, 5 answers

Woocommerce - Cart page not displaying

After using the ADD TO CART option, I can see that the items are added to the CART, but when I move to the cart page it redirects back to the homepage. The cart page shortcode is also provided. Please help out! I'm new to WooCommerce.
Chinou
8 votes, 6 answers

Pandas-Add missing years in time series data with duplicate years

I have a dataset like this where data for some years are missing.

    County  Year  Pop
    12      1999  1.1
    12      2001  1.2
    13      1999  1.0
    13      2000  1.1

I want something like

    County  Year  Pop
    12      1999  1.1
    12      2000  NaN
    12      2001  1.2
    13      1999…
ks2882
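One way to do this in pandas is to reindex each county on a complete range of years. A sketch with a hypothetical helper name, assuming the gaps should be filled between each county's own first and last year, and that each County/Year pair appears at most once (reindex raises on duplicate labels). The input rows are the ones shown in the question:

```python
import pandas as pd


def fill_missing_years(df):
    """Insert NaN rows for years absent between each county's first and last year."""
    pieces = []
    for county, grp in df.groupby("County"):
        years = range(grp["Year"].min(), grp["Year"].max() + 1)
        piece = grp.set_index("Year").reindex(years)  # missing years become NaN rows
        piece["County"] = county                       # restore the county id on new rows
        pieces.append(piece.rename_axis("Year").reset_index())
    return pd.concat(pieces, ignore_index=True)[["County", "Year", "Pop"]]


df = pd.DataFrame({"County": [12, 12, 13, 13],
                   "Year": [1999, 2001, 1999, 2000],
                   "Pop": [1.1, 1.2, 1.0, 1.1]})
out = fill_missing_years(df)
```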
8 votes, 3 answers

Treat nan as zero in numpy array summation except for nan in all arrays

I have two numpy arrays NS, EW to sum up. Each of them has missing values at different positions, like

    NS = array([[ 1.,  2., nan],
                [ 4.,  5., nan],
                [ 6., nan, nan]])
    EW = array([[ 1.,  2., nan],
                [ 4., nan, nan],
                …
Superstar
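A minimal NumPy sketch of that rule: stack the arrays, add them with np.nansum (which treats NaN as 0), then put NaN back wherever every input was NaN. The helper name is hypothetical, it assumes float arrays of equal shape, and the example arrays are made up because the question's second array is cut off:

```python
import numpy as np


def sum_ignoring_nan(*arrays):
    """Element-wise sum treating NaN as 0, but NaN where every input is NaN."""
    stacked = np.stack(arrays)               # shape: (n_arrays, *array_shape)
    total = np.nansum(stacked, axis=0)       # NaN counts as 0 here
    total[np.isnan(stacked).all(axis=0)] = np.nan
    return total


NS = np.array([[1.0, 2.0, np.nan], [4.0, 5.0, np.nan]])
EW = np.array([[1.0, np.nan, np.nan], [np.nan, 1.0, 2.0]])
result = sum_ignoring_nan(NS, EW)
```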
8 votes, 2 answers

How to convert Date or Datetime field when some parts are blank; na.omit fails

I have a data set that has dates and times for in and out. Each line is an in and out set, but some are blank. I can remove the blanks with na.omit and a nice read in (it was a csv, and na.strings=c("") works on the read.csv). Of course, because…
Rufus Shinra
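The question concerns R, but the general approach of parsing blanks as missing timestamps instead of dropping whole rows can be sketched as a pandas analogue (not an R solution): pd.to_datetime with errors="coerce" turns blank or unparseable entries into NaT. The column names and timestamp format below are assumptions for illustration:

```python
import pandas as pd


def parse_in_out(df, in_col="time_in", out_col="time_out", fmt="%Y-%m-%d %H:%M"):
    """Parse in/out columns; blank strings become NaT instead of raising."""
    parsed = df.copy()
    for col in (in_col, out_col):
        parsed[col] = pd.to_datetime(parsed[col], format=fmt, errors="coerce")
    # NaT propagates, so a duration is missing whenever either end is missing.
    parsed["duration"] = parsed[out_col] - parsed[in_col]
    return parsed


raw = pd.DataFrame({"time_in": ["2020-01-01 08:00", ""],
                    "time_out": ["2020-01-01 17:30", "2020-01-02 09:00"]})
parsed = parse_in_out(raw)
```

Rows with a blank stay in the table, so incomplete in/out pairs can still be counted or handled explicitly later.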
8 votes, 1 answer

imputing data with median by date in R

I need to replace the missing values in the field "steps" by the median of "steps" calculated over that particular day (group by "date") with NA values removed. I have already referred to this thread, but my NA values aren't replaced. Can somebody help…
Meeshu
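The question is about R; as a language-neutral illustration, the same group-wise median imputation can be sketched in pandas. The column names follow the question, while the helper name and the data are made up:

```python
import numpy as np
import pandas as pd


def impute_median_by_date(df, value_col="steps", group_col="date"):
    """Replace NaNs in value_col with the median of value_col for the same date."""
    out = df.copy()
    # transform("median") skips NaN and returns one median per original row
    daily_median = out.groupby(group_col)[value_col].transform("median")
    out[value_col] = out[value_col].fillna(daily_median)
    return out


activity = pd.DataFrame({"date": ["d1", "d1", "d1", "d2", "d2"],
                         "steps": [10, np.nan, 30, np.nan, np.nan]})
filled = impute_median_by_date(activity)
```

A day whose values are all missing has a NaN median, so its rows stay NaN (d2 here) and need a separate rule.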
8 votes, 2 answers

How to handle missing NaNs for machine learning in python

How do I handle missing values in datasets before applying a machine learning algorithm? I noticed that simply dropping rows with NaN values is not a smart thing to do. I usually interpolate (compute the mean) using pandas and fill in the data, which is kind of…
pbu
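One common pattern, sketched here in pandas, is to learn the imputation statistics (here, column means) on the training split only, apply them to both splits, and keep a 0/1 indicator of which values were imputed so a model can still use the fact that a value was missing. Function and column names are hypothetical:

```python
import numpy as np
import pandas as pd


def fit_means(train):
    """Learn per-column means from the training data only (avoids test leakage)."""
    return train.mean(numeric_only=True)


def impute_with_indicators(df, means):
    """Fill NaNs with the learned means and add a <col>_missing indicator column."""
    out = df.copy()
    for col, mean in means.items():
        out[f"{col}_missing"] = out[col].isna().astype(int)
        out[col] = out[col].fillna(mean)
    return out


train = pd.DataFrame({"a": [1.0, 3.0, np.nan]})
test = pd.DataFrame({"a": [np.nan, 5.0]})
means = fit_means(train)
train_ready = impute_with_indicators(train, means)
test_ready = impute_with_indicators(test, means)
```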
8 votes, 2 answers

In gnuplot, with "set datafile missing", how to ignore both "nan" and "-nan"?

The gnuplot command set datafile missing "nan" tells gnuplot to ignore nan data values in the data file. How to ignore both nan and -nan? I tried the following in gnuplot, but then the effect of the first statement is overwritten by the…
user1069609
8 votes, 1 answer

R-generate a "missing values variable"

I am using R to generate examples of how to deal with missing data for the statistics class I am teaching. One method requires generating a "missing values binary variable", with 0 for cases containing missing values, and 1 with no missing values.…
jeramy townsley
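In R, complete.cases() returns the corresponding logical vector for a data frame. A pandas analogue of the requested indicator (1 for rows with no missing values, 0 otherwise) is a one-liner; the data frame below is hypothetical:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [34, np.nan, 51], "income": [40_000, 52_000, np.nan]})

# 1 if the row has no missing values, 0 if it contains at least one NaN.
df["complete"] = df.notna().all(axis=1).astype(int)
```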
8 votes, 3 answers

Predict.glm not predicting missing values in response

For some reason, when I specify glms (and lm's too, it turns out), R is not predicting missing values of the data. Here is an example:

    y = round(runif(50))
    y = c(y,rep(NA,50))
    x = rnorm(100)
    m = glm(y~x, family=binomial(link="logit"))
    p =…
generic_user
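The underlying idea, independent of R, is that a model fitted on the rows with an observed response can still predict for rows whose response is missing, as long as their predictors are observed; those rows just have to be passed to the prediction step explicitly (in R, through predict()'s newdata argument). A minimal NumPy sketch with a straight-line fit, a hypothetical helper name, and made-up data:

```python
import numpy as np


def fit_and_predict_all(x, y):
    """Fit y ~ x on rows where y is observed, then predict for every row."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ~np.isnan(y)
    slope, intercept = np.polyfit(x[observed], y[observed], deg=1)
    return intercept + slope * x  # includes rows whose y was NaN


pred = fit_and_predict_all([0, 1, 2, 3], [1, 3, np.nan, 7])
```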
8 votes, 2 answers

Automatically join missing data gaps in Highcharts JS

I'm currently looking to implement Highcharts JS into my application, using months as the x-axis categories. However, I have gaps in my data, and wish for the chart to automatically connect the gaps. For example, if I don't have any data for March,…
Curtis
7 votes, 1 answer

Discarding a single attribute in R

In R, the na.omit() function can be used to discard entries in a data.frame that contain NA values. As a side effect, if lines are indeed discarded, the function adds an attribute 'omit' to the result that contains a vector of the row.names that…
reddish