Questions tagged [missing-data]

For questions relating to missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may not be observed, may be corrupted, or may not yet be observed. Handling such data requires additional annotation to mark which values are missing, as well as methodological care when deciding how to impute or otherwise use the data in standard contexts. This is especially problematic in data-intensive contexts, such as large statistical analyses of databases.
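
As a minimal sketch of what such annotation can look like in practice, the following Python snippet marks unobserved entries with NaN, keeps an explicit indicator of which values were missing, and then applies a naive mean imputation. It assumes pandas and NumPy; the table and column names are made up for illustration, and mean imputation is used only because it is short, not as a recommendation.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table; np.nan marks values that were not observed.
df = pd.DataFrame({
    "height_cm": [170.0, np.nan, 182.0, 165.0],
    "weight_kg": [65.0, 80.0, np.nan, np.nan],
})

# Boolean mask annotating which cells are missing.
missing_mask = df.isna()

# Number of missing values per column.
missing_counts = missing_mask.sum()

# Naive imputation: replace each missing value with its column mean
# (column means are computed over the observed values only).
imputed = df.fillna(df.mean())

# Keep the annotation next to the imputed data so that later analyses
# can still tell observed values from filled-in ones.
imputed["weight_kg_was_missing"] = missing_mask["weight_kg"]
```

Keeping the mask alongside the imputed values is a design choice: once a value has been filled in, nothing in the value itself records that it was never observed.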

Missing data occur in many fields, from survey data to industrial data. There are many underlying missing data mechanisms (reasons why the data are missing). In survey data, for example, values might be missing due to drop-out, or because people answering the survey run out of time.

Rubin classified missing data into three types:

  1. missing completely at random;
  2. missing at random;
  3. missing not at random.

Note that some statistical analyses are valid only under certain of these classes.
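
The three classes can be illustrated with a small simulation. The sketch below is only an illustration under made-up assumptions (a normal covariate `x`, an outcome `y = x + noise`, and logistic missingness probabilities chosen for convenience); it is not taken from Rubin's work. It shows that the mean of the observed `y` values stays close to the true mean of 0 when values are missing completely at random, is biased under the other two mechanisms, and can be recovered by adjusting for `x` when missingness depends only on the observed `x`.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# x is always observed; y depends on x and has true mean 0.
x = rng.normal(size=n)
y = x + rng.normal(size=n)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Missing completely at random: the same 30% chance for every y.
mcar = rng.random(n) < 0.3
# Missing at random: the chance depends only on the observed x.
mar = rng.random(n) < sigmoid(2 * x)
# Missing not at random: the chance depends on the unobserved y itself.
mnar = rng.random(n) < sigmoid(2 * y)

# Complete-case means: use only the rows where y is observed.
mean_mcar = y[~mcar].mean()   # close to the true mean
mean_mar = y[~mar].mean()     # biased downwards
mean_mnar = y[~mnar].mean()   # biased downwards

# Under MAR, a regression of y on x fitted to the observed rows can be
# used to predict y for every row, which removes the bias in this setup.
slope, intercept = np.polyfit(x[~mar], y[~mar], 1)
mean_mar_adjusted = (slope * x + intercept).mean()
```

The same adjustment would not repair the third case, because there the missingness depends on values that are never observed.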

2809 questions
26
votes
5 answers

check for any missing dates in the index

Is there any way to check for missing dates in a dataframe directly? I want to check whether there are any missing dates between 2013-01-19 and 2018-01-29. GWA_BTC GWA_ETH GWA_LTC GWA_XLM GWA_XRP Date 2013-01-19 …
Jeeth
24
votes
3 answers

How to subset a data frame by taking only the Non NA values of 2 columns in this data frame

I am trying to subset a data frame by taking the integer values of 2 columns of my data frame Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])]) but it gives me an error: longer object length is not a multiple of shorter…
EnginO
24
votes
4 answers

pandas - merging with missing values

There appears to be a quirk with the pandas merge function. It considers NaN values to be equal, and will merge NaNs with other NaNs: >>> foo = DataFrame([ ['a',1,2], ['b',4,5], ['c',7,8], [np.NaN,10,11] ],…
aensm
24
votes
2 answers

Select NA in a data.table in R

How do I select all the rows that have a missing value in the primary key in a data table? DT = data.table(x=rep(c("a","b",NA),each=3), y=c(1,3,6), v=1:9) setkey(DT,x) Selecting for a particular value is easy: DT["a",] Selecting for the…
Farrel
23
votes
6 answers

Replace all NA with FALSE in selected columns in R

I have a question similar to this one, but my dataset is a bit bigger: 50 columns, with 1 column as UID and the other columns carrying either TRUE or NA. I want to change all the NA to FALSE, but I don't want to use an explicit loop. Can plyr do the trick?…
lokheart
21
votes
8 answers

How to fill NAs with LOCF by factors in data frame, split by country

I have the following data frame (simplified) with the country variable as a factor and the value variable has missing values: country value AUT NA AUT 5 AUT NA AUT NA GER NA GER NA GER 7 GER NA GER NA The…
rp1
19
votes
6 answers

How do I handle multiple kinds of missingness in R?

Many surveys have codes for different kinds of missingness. For instance, a codebook might indicate: 0-99 Data -1 Question not asked -5 Do not know -7 Refused to respond -9 Module not asked Stata has a beautiful facility for handling these…
Ari B. Friedman
19
votes
4 answers

Can't drop NAN with dropna in pandas

I import pandas as pd and run the code below and get the following result Code: traindataset = pd.read_csv('/Users/train.csv') print traindataset.dtypes print traindataset.shape print traindataset.iloc[25,3] traindataset.dropna(how='any') print…
fangh
18
votes
3 answers

Fill missing dates by group

In my data, there exist observations for some IDs in some months and not for others, e.g. dat <- data.frame(c(1, 1, 1, 2, 3, 3, 3, 4, 4, 4), c(rep(30, 2), rep(25, 5), rep(20, 3)), c('2017-01-01', '2017-02-01', '2017-04-01', '2017-02-01',…
kathystehl
18
votes
2 answers

Filling in missing (blanks) in a data table, per category - backwards and forwards

I am working with a large data set of billing records for my clinical practice over 11 years. Quite a few of the rows are missing the referring physician. However, using some rules I can quite easily fill them in but do not know how to implement it…
Farrel
18
votes
3 answers

How to replace NA (missing values) in a data frame with neighbouring values

862 2006-05-19 6.241603 5.774208 863 2006-05-20 NA NA 864 2006-05-21 NA NA 865 2006-05-22 6.383929 5.906426 866 2006-05-23 6.782068 6.268758 867 2006-05-24 6.534616 6.013767 868 2006-05-25 6.370312…
Arun
17
votes
5 answers

Why does max() sometimes return nan and sometimes ignores it?

This question is motivated by an answer I gave a while ago. Let's say I have a dataframe like this import numpy as np import pandas as pd df = pd.DataFrame({'a': [1, 2, np.nan], 'b': [3, np.nan, 10], 'c':[np.nan, 5, 34]}) a b c 0 1.0…
Cleb
17
votes
2 answers

Replace NaN or missing values with rolling mean or other interpolation

I have a pandas dataframe with monthly data that I want to compute a 12-month moving average for. Data for every January is missing, however (NaN), so I am using pd.rolling_mean(data["variable"], 12, center=True) but it just gives me…
Alexis Eggermont
17
votes
5 answers

java.lang.NoClassDefFoundError: android.support.v7.appcompat.R$styleable

I am using the terminal [not Eclipse]. I got the following exception while using the emulator. It debugged successfully and installed successfully, but the emulator shows Unfortunately app has stopped. Then I ran $ adb logcat, which displays the following.…
Balakrishnan
17
votes
2 answers

missing value in highcharts line graph results in no line, just points

please take a look at this: http://jsfiddle.net/2rNzr/ var chart = new Highcharts.Chart({ chart: { renderTo: 'container' }, xAxis: { categories: ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct',…
Tony