Questions tagged [missing-data]

For questions relating to missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may be unobserved, corrupted, or not yet observed. Handling such data requires additional annotation to mark which entries are missing, as well as methodological care when deciding how to impute or use the data in standard analyses. The problem is especially acute in data-intensive contexts, such as large statistical analyses of databases.
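As a rough illustration of the annotation and imputation ideas above, here is a minimal Python sketch. The `readings` values and the `is_missing` helper are hypothetical, and mean imputation is shown only because it is the simplest option, not because it is recommended in general.

```python
import math

# Hypothetical column of readings; None marks an unobserved entry.
readings = [4.3, 5.1, None, None, 7.4]


def is_missing(value):
    """Treat None and float NaN as missing (an assumption of this sketch)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


# Annotation: a boolean mask kept alongside the data.
mask = [is_missing(v) for v in readings]

# Complete-case summary: use only the observed entries.
observed = [v for v in readings if not is_missing(v)]
observed_mean = sum(observed) / len(observed)

# Mean imputation: fill each gap with the observed mean.
# Simple, but it understates the variability of the column.
imputed = [observed_mean if m else v for v, m in zip(readings, mask)]
```

Keeping the mask separate from the imputed column means later analyses can still tell which values were actually observed.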

Missing data occur in many fields, from survey data to industrial data. There are many possible missing-data mechanisms (the reasons why data are missing). In survey data, for example, values might be missing because of drop-out, or because respondents ran out of time.

Rubin classified missing data into three types:

  1. missing completely at random (MCAR);
  2. missing at random (MAR);
  3. missing not at random (MNAR).

Note that some statistical analyses are valid only under certain of these classes.
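The three classes can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the survey variables (`ages`, `incomes`), the missingness probabilities, and the `mean_observed` helper are all made up for this example. It compares the mean of the observed incomes with the mean of the full data under each mechanism.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical survey: age is always observed, income may be missing.
n = 10_000
ages = [random.uniform(20, 70) for _ in range(n)]
incomes = [20_000 + 800 * a + random.gauss(0, 5_000) for a in ages]


def mean_observed(values, missing):
    """Mean over the entries whose missing flag is False."""
    kept = [v for v, m in zip(values, missing) if not m]
    return sum(kept) / len(kept)


# Missing completely at random: every income has the same 30% chance
# of being missing, unrelated to any variable.
mcar = [random.random() < 0.3 for _ in range(n)]

# Missing at random: missingness depends only on the observed age
# (older respondents skip the income question more often).
mar = [random.random() < 0.6 * (a - 20) / 50 for a in ages]

# Missing not at random: missingness depends on the unobserved
# income itself (high earners skip the question more often).
mnar = [random.random() < (0.6 if y > 60_000 else 0.1) for y in incomes]

full_mean = sum(incomes) / n
bias = {
    name: mean_observed(incomes, missing) - full_mean
    for name, missing in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]
}
for name, value in bias.items():
    print(name, round(value))
```

In this setup the complete-case mean stays close to the full-data mean only when data are missing completely at random. When data are missing at random, the bias can in principle be corrected by conditioning on age, because age is observed. When data are missing not at random, the observed data alone are not enough to correct it.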

2809 questions
1 vote, 8 answers

Reading multiple files and calculating mean based on user input

I am trying to write a function in R which takes 3 inputs: directory, pollutant, and id. I have a directory on my computer full of CSV files (over 300). What this function should do is shown in the prototype below: pollutantmean <-…
Shery
1 vote, 1 answer

T-SQL query to find missing IDs for a date range

Given these tables table Channel -------------- ChannelID int IDENTITY table Program -------------- ProgramID int IDENTITY ChannelID int AiringDate datetime and this query SELECT …
J F
1 vote, 2 answers

R : add a column with missing values to a dataframe

I am using financial data and the row names of my main dataframe are dates. > assets[1:3,1:5] ALD SFN TCO KIM CTX 2003-01-03 48.1 23.98 23.5 23 22.34 2003-01-06 48.1 23.98 23.5 23 22.34 2003-01-07 48.1 23.98 23.5 23 22.34 I…
Lemko
1 vote, 0 answers

Merged a branch with gitignored files, can't find those files anymore

I had large log files in a project, so I added the log directory to .gitignore after I had already made a few logs (so there's a log file directory with only a few old logs in my git, all new ones are ignored). I had more logs in a branch and merged…
1 vote, 1 answer

Dealing with Zero Values in Principal Component Analysis

I've really been struggling to get my PCA working and I think it is because there are zero values in my data set. But I don't know how to resolve the issue. The first problem is, the zero values are not missing values (they are areas with no…
Thirst for Knowledge
1 vote, 1 answer

Missing records from one table in SQL Server 2008R2

Table 1: Date PlacementID CampaignID Impressions 04/01/2014 100 10 1000 04/01/2014 101 10 1500 04/01/2014 100 11 500 Table 2: Date …
ABD
1 vote, 3 answers

R Missing Value Replacement Function

I have a table with missing values and I'm trying to write a function that will replace the missing values with a calculation based on the nearest two non-zero values. Example: X Tom 1 4.3 2 5.1 3 NA 4 NA 5 7.4 For X =…
user3476463
1 vote, 1 answer

Constraining data imputation in R

I have a data frame (df) with missing values and want to impute interpolated values with restriction. My data frame…
Filly
1 vote, 1 answer

R - sort() output missing a row

I have A and B as follows: //edit// I was sleepy and confused. These are NOT data frames. > length(A) [1] 490 > length(B) [1] 17730 > str(A) num [1:490] 0.0113 -0.0106 0.2308 0.0435 0.2814 ... > str(B) num [1:17730] 0.0118 0.0196 0.0344 0.0207…
biohazard
1 vote, 2 answers

Read file with missing data with loadtxt (numpy)

When I tried to read the data below with: loadtxt('RSTN') I got an error. Then I tried to complete this missing data using: genfromtxt('RSTN',delimiter=' ') But I got this error: Line #31112 (got 7 columns instead of 8) I'd like to fill the…
nandhos
1 vote, 1 answer

finding no of rows with missing data in R

I have a data frame Id Name Affiliation 9 Ernest Jordan 14 K. MORIBE 15 D. Jakominich 25 William H. Nailon 37 P. B. Littlewood Cavendish Laboratory|Cambridge University 44 A. Kuroiwa …
user3171906
1 vote, 3 answers

Average data in multiple excel file using MATLAB

I have multiple Excel files like the one shown below (hourly data). I want to obtain the daily average (e.g. from 17:00 to 16:00 of the next day). I only know a little MATLAB. My current solution is below, but it has some problems. Read each excel file…
user2230101
1 vote, 1 answer

Recoding race variable with 9 categories to dummy

Allow me to preface this by saying that I am new to R. I cleaned some income and rent variables and now I am trying to recode my race variable from 9 categories to 2. The original variable is coded as follows: 1=White 2=Black 3=Native 4=Asian 5=A…
monarque13
1 vote, 1 answer

How to generate continuous record from incomplete data in Pandas Dataframe

Ok I have a dataset regarding game outcomes that is incomplete and I want to generate a plot with either the data present or zero values for the players that have no data in that game. Furthermore I want to add the data present via a list: some…
2705114-john
1 vote, 1 answer

'ignoring' missing data in condition for index

I am trying to create an index which increases by 1 if the condition is fulfilled. The code seems to work if there are no missing data. However, if there are missing data, the index also becomes "NA". How can I avoid this (basically ignoring the…
zoowalk