Questions tagged [missing-data]

For questions relating to missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may be unobserved, corrupted, or not yet available. Handling such values requires additional annotation to mark them as missing, as well as methodological decisions about whether to impute them and how to use them in standard analyses. The problem is especially common in data-intensive contexts, such as large statistical analyses of databases.

Missing data occur in many fields, from survey data to industrial data. There are many underlying missing data mechanisms (reasons why the data are missing). In survey data, for example, values might be missing due to drop-out, or because people answering the survey ran out of time.

Rubin classified missing data into three types:

  1. missing completely at random;
  2. missing at random;
  3. missing not at random.

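Note that some statistical analyses are valid only under certain of these classes. For example, complete-case analysis (dropping every row with a missing value) is generally unbiased only when the data are missing completely at random.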
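
The three classes above can be illustrated with a small simulation. This is a minimal sketch, not a reference implementation: the `age`/`income` schema, the `make_missing` helper, and the thresholds and probabilities are all assumptions made up for the example. The only thing that differs between the mechanisms is what the probability of a value going missing is allowed to depend on.

```python
import random


def make_missing(rows, mechanism, seed=0):
    """Return a copy of rows with some 'income' values replaced by None.

    rows: list of dicts with numeric 'age' and 'income' keys (hypothetical schema).
    mechanism: 'MCAR', 'MAR', or 'MNAR'.
    """
    rng = random.Random(seed)
    out = []
    for row in rows:
        if mechanism == "MCAR":
            # Missing completely at random: the probability ignores the data.
            p = 0.3
        elif mechanism == "MAR":
            # Missing at random: the probability depends only on an
            # observed variable (age), not on the income value itself.
            p = 0.6 if row["age"] >= 50 else 0.1
        elif mechanism == "MNAR":
            # Missing not at random: the probability depends on the very
            # value that ends up unobserved (income).
            p = 0.6 if row["income"] >= 60000 else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        new_row = dict(row)  # copy so the input rows are left untouched
        if rng.random() < p:
            new_row["income"] = None
        out.append(new_row)
    return out


# Illustrative synthetic data (assumed ranges, not real survey data).
data_rng = random.Random(42)
data = [
    {"age": data_rng.randint(20, 80), "income": data_rng.randint(20000, 100000)}
    for _ in range(1000)
]

for mech in ("MCAR", "MAR", "MNAR"):
    masked = make_missing(data, mech)
    n_missing = sum(r["income"] is None for r in masked)
    print(mech, "missing incomes:", n_missing)
```

Under the MNAR setting, the mean of the remaining incomes is pulled downward because high incomes are dropped more often, which is why an analysis that silently assumes MCAR can be misleading there.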

2809 questions
11
votes
2 answers

Flexslider 2 direction nav pointers missing from download

Where can I get Flexslider2's image file for the direction nav pointers: bg_direction_nav.png? Couldn't work out why I kept getting strange text like 'Fl' or 'Fi' in place of the arrows on the slider. Checking through everything, I have found that…
lizz
10
votes
4 answers

R: replace NA with item from vector

I am trying to replace some missing values in my data with the average values from a similar group. My data looks like this: X Y 1 x y 2 x y 3 NA y 4 x y And I want it to look like this: X Y 1 x y 2 x y 3 y y 4 x …
gregmacfarlane
10
votes
4 answers

How to create "NA" for missing data in a time series

I have several files of data that look like this: X code year month day pp 1 4515 1953 6 1 0 2 4515 1953 6 2 0 3 4515 1953 6 3 0 4 4515 1953 6 4 0 5 4515 1953 6 5 3.5 Sometimes there is data missing,…
sbg
10
votes
1 answer

Filling missing dates in BigQuery (SQL) without creating a new calendar

I am trying to create a SQL so I can make a time series chart in Google Data Studio with connection of BigQuery. You can see my SQL below. WITH CTE_1 AS (SELECT ID, Date, Min_Predict, Max_Predict, Interval ,ROW_NUMBER() OVER (PARTITION BY ID ORDER…
beginner
10
votes
3 answers

Creating a ts time series with missing values from a data frame

I have a data frame containing a time series of monthly data, with some missing values. dates <- seq( as.Date("2010-01-01"), as.Date("2017-12-01"), "1 month" ) n_dates <- length(dates) dates <- dates[runif(n_dates) < 0.5] time_data <- data.frame( …
Richie Cotton
10
votes
4 answers

Replace empty values of a dictionary with NaN

I have a dictionary with missing values (the key is there, but the associated value is empty). For example I want the dictionary below: dct = {'ID':'', 'gender':'male', 'age':'20', 'weight':'', 'height':'5.7'} to be changed to this form: dct =…
user9439906
10
votes
7 answers

generate random sequences of NA of random lengths in a vector

I want to generate missing values in a vector so that the missing value are grouped in sequences, to simulate periods of missing data of different length. Let's say I have a vector of 10 000 values and I want to generate 12 sequences of NA at random…
agenis
10
votes
1 answer

Multiple Imputation of missing and censored data in R

I have a dataset with both missing-at-random (MAR) and censored data. The variables are correlated and I am trying to impute the missing data conditionally so that I can estimate the distribution parameters for a correlated multivariate normal…
chelsea
10
votes
1 answer

Insert missing time rows into a dataframe

Let's say I have a dataframe: df <- data.frame(group = c('A','A','A','B','B','B'), time = c(1,2,4,1,2,3), data = c(5,6,7,8,9,10)) What I want to do is insert data into the data frame where it was missing in the…
10
votes
1 answer

R factor NA vs

I have the following data frame: df1 <- data.frame(id = 1:20, fact1 = factor(rep(c('abc','def','NA',''),5))) df1 id fact1 1 1 abc 2 2 def 3 3 NA 4 4 5 5 abc 6 6 def 7 7 NA 8 8 9 9 abc 10 10 def 11…
screechOwl
9
votes
2 answers

Subset a factor by NA levels

I have a factor in R, with an NA level. set.seed(1) x <- sample(c(1, 2, NA), 25, replace=TRUE) x <- factor(x, exclude = NULL) > x [1] 1 2 2 1 2 2 1 1 [12] 1 2 2 2 1 …
Zach
9
votes
2 answers

Multidimensional scaling with missing values in dissimilarity matrix

I have a dissimilarity matrix on which I would like to perform multidimensional scaling (MDS) using the sklearn.manifold.MDS function. The dissimilarity between some elements in this matrix is not meaningful and I am thus wondering if there is a way…
9
votes
3 answers

Clustering algorithm in R for missing categorical and numerical values

I want to perform marketing segmentation clustering on a dataset with missing categorical and numerical values in R. I cannot perform k-means clustering because of the missing values. R version 3.1.0 (2014-04-10) Platform: x86_64-apple-darwin13.1.0…
Scott Davis
9
votes
3 answers

Switched Branch After .gitignore and lost .gitinored files

I am new to git, so sorry if this question has already been answered. I'm having trouble finding the answer to this. I wanted to ignore a set of files that had never been committed before for a commit and used the github app to select them and…
Betsy Dupuis
9
votes
5 answers

R gbm handling of missing values

Does anyone know how gbm in R handles missing values? I can't seem to find any explanation using google.
screechOwl