Questions tagged [missing-data]

For questions relating to missing-data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may never have been observed, may be corrupted, or may not have been observed yet. Treating such data requires additional annotation (marking which entries are missing) as well as methodological care when deciding whether and how to impute the missing values or otherwise use the data in standard contexts. This becomes especially pressing in data-intensive settings, such as large-scale statistical analyses of databases.
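As a minimal sketch of what "annotation" plus a naive imputation can look like, the hypothetical helpers below (plain Python, standard library only) assume missing entries in a column are encoded as None or float NaN. They build a boolean mask of missing positions and then fill those positions with the mean of the observed values. Mean imputation is used here only because it is the simplest option; it shrinks the spread of the data and ignores why values are missing, so it is not a recommended default.

```python
import math
from statistics import mean


def missing_mask(column):
    """Return a list of booleans: True where the entry is missing (None or NaN)."""
    return [v is None or (isinstance(v, float) and math.isnan(v)) for v in column]


def mean_impute(column):
    """Fill missing entries with the mean of the observed ones.

    The mask is returned alongside the filled column so that the
    imputed positions stay annotated for later analysis.
    """
    mask = missing_mask(column)
    observed = [v for v, is_missing in zip(column, mask) if not is_missing]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = mean(observed)
    filled = [fill if is_missing else v for v, is_missing in zip(column, mask)]
    return filled, mask


if __name__ == "__main__":
    filled, mask = mean_impute([1.0, None, 3.0, float("nan")])
    print(filled)  # [1.0, 2.0, 3.0, 2.0]
    print(mask)    # [False, True, False, True]
```

Keeping the mask instead of overwriting values silently lets a later step report, weight, or re-impute exactly the entries that were never observed.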

Missing data occur in many fields, from surveys to industrial processes. There are many underlying missing-data mechanisms (the reasons why the data are missing). In survey data, for example, responses may be missing because of drop-out: people answering the survey may run out of time before finishing.

Rubin classified missing data into three types:

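  1. missing completely at random (MCAR): whether a value is missing is unrelated to any of the data, observed or unobserved;
  2. missing at random (MAR): missingness may depend on the observed data, but not on the unobserved values themselves;
  3. missing not at random (MNAR): missingness depends on the unobserved values themselves.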

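Note that some statistical analyses are valid only under certain classes. For example, simply discarding incomplete records (complete-case analysis) is generally guaranteed to be unbiased only when the data are missing completely at random.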
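The small simulation below is an illustrative sketch (Python standard library only; the function names, the 0.0 cutoff and the probabilities are arbitrary choices for the example, not part of Rubin's definitions). Each function hides values of `y` under one of the three mechanisms, differing only in what the probability of going missing is allowed to depend on: nothing (MCAR), the always-observed covariate `x` (MAR), or `y` itself (MNAR). With these settings, the complete-case mean should stay close to the true mean under MCAR and drift downward under the MAR and MNAR settings, because larger values are hidden more often.

```python
import random
from statistics import mean


def mcar(y, p, rng):
    """MCAR: every value is dropped with the same probability p, unrelated to any data."""
    return [None if rng.random() < p else v for v in y]


def mar(x, y, p_low, p_high, rng, cutoff=0.0):
    """MAR: the chance that y[i] is missing depends only on the always-observed x[i]."""
    return [None if rng.random() < (p_high if xi > cutoff else p_low) else yi
            for xi, yi in zip(x, y)]


def mnar(y, p_low, p_high, rng, cutoff=0.0):
    """MNAR: the chance that y[i] is missing depends on the value y[i] itself."""
    return [None if rng.random() < (p_high if yi > cutoff else p_low) else yi
            for yi in y]


def complete_case_mean(values):
    """Mean of the observed values only, i.e. what you get by dropping missing entries."""
    return mean(v for v in values if v is not None)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]  # y is correlated with x
    print("true mean of y:", round(mean(y), 3))
    print("MCAR:", round(complete_case_mean(mcar(y, 0.3, rng)), 3))
    print("MAR: ", round(complete_case_mean(mar(x, y, 0.1, 0.6, rng)), 3))
    print("MNAR:", round(complete_case_mean(mnar(y, 0.1, 0.6, rng)), 3))
```

Under MAR the bias of this naive mean can in principle be corrected using the fully observed `x` (for example by weighting or by imputing from `x`), whereas under MNAR the observed data alone do not reveal how the missing values differ.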

2809 questions
1 vote, 1 answer

PHPExcel - All values are not shown in chart

The library in question is PHPExcel 1.7.7. I've used sample code found in a thread at CodePlex to create charts with PHPExcel. However, the sample code presented in the forums only deals with two columns of data, and I'm looking to expand the number…
Duniyadnd
1 vote, 1 answer

Omitting missing data in a dataframe

I have the following dataframe: DF <- data.frame(x = c(1, 2, 3, NA), y = c(1, 0, 10, NA), z = c(43, NA, 33, NA)). I want to omit only the rows where x = NA and z = NA, but complete.cases deletes all rows that contain an NA in the desired columns. Therefore, I am not sure how to…
user1489975
1 vote, 1 answer

Where is my AWS EMR reducer output for my completed job (should be on S3, but nothing there)?

I'm having an issue where my Hadoop job on AWS's EMR is not being saved to S3. When I run the job on a smaller sample, the job stores the output just fine. When I run the same command but on my full dataset, the job completes again, but there is…
1 vote, 1 answer

Removing dates with less than Full observations

I have an xts object that covers 169 days of high-frequency, regular 5-minute observations, but on some of the days there are missing observations, i.e. fewer than 288 data points. How do I remove these so that only days with the full set of data points remain? find…
number8
1 vote, 1 answer

Numeric Filter and missing values (Weka)

I'm using SMOTE to oversample my dataset (affected by class imbalance). Some of my attributes have integer values, others have only two decimals, but SMOTE creates new instances with many decimals. So, to solve this problem, I thought of using…
Titus Pullo
0 votes, 2 answers

R: remove multiple rows based on missing values in fewer rows

I have an R data frame with data from multiple subjects, each tested several times. To perform statistics on the set, there is a factor for subject ("id") and a row for each observation (given by factor "session"). I.e. print(allData) id session…
Jonas Lindeløv
0 votes, 2 answers

Why could database changes disappear?

I have a MongoDB server running on a 64-bit Amazon EC2 instance (journaling enabled). Yesterday I updated some documents and refreshed the webpage to make sure it reflected the changes. It did. But today I see that not only yesterday's changes are…
Vitaly
0 votes, 2 answers

MySQL >, <, and missing by group

I have two tables in MySQL that I'm comparing with the following attributes:
tbl_fac:
facility_id, chemical_id, criteria
10, 25, 50
10, 26, 60
10, 27, 60
…
Josh
0 votes, 6 answers

Substituting missing values in Python

I want to substitute missing values (None) with the last previous known value. This is my code. But it doesn't work. Any suggestions for a better algorithm? t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]] def…
Randomtheories
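The asker's own code is not shown in this preview. As an illustrative sketch only, a "last observation carried forward" fill in plain Python could look like the hypothetical helper `fill_forward` below. It assumes the fill runs along each inner list of `t`; a None with no earlier known value is left unchanged. The second hypothetical helper, `fill_forward_columns`, covers the other possible reading and carries values down each column instead.

```python
def fill_forward(row):
    """Replace each None with the most recent non-None value before it in the row.

    A None with no earlier known value (e.g. at position 0) stays None.
    """
    filled = []
    last_known = None
    for value in row:
        if value is None:
            filled.append(last_known)
        else:
            filled.append(value)
            last_known = value
    return filled


def fill_forward_columns(table):
    """Same idea, but carrying values down each column instead of along each row."""
    columns = [fill_forward(list(col)) for col in zip(*table)]
    return [list(row) for row in zip(*columns)]


t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print([fill_forward(row) for row in t])
# [[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
print(fill_forward_columns(t))
# [[1, 3, None, 5, None], [2, 3, None, 3, 1], [4, 3, 2, 1, 1]]
```

Testing `value is None` rather than the truthiness of `value` matters here: a legitimate 0 is kept as data instead of being treated as missing.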
0 votes, 0 answers

How to fill nan values by comparing with other columns?

import pandas as pd import numpy as np df = pd.DataFrame({'SKU': [700, 701, 702, 702, 703, 704, 705, 705], 'CATEGORY': ['T', 'F', 'F', np.nan, 'W', 'W', 'T', np.nan]}) print(df) I sorted the original data according to 'SKU', and tried using .ffill to…
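In general, a forward fill over the whole sorted column can carry a category across from a different SKU when a SKU's only row is missing. A key-based fill avoids that by borrowing a value only from another row with the same SKU. The sketch below is plain Python with a hypothetical helper `fill_from_same_key`, and it assumes missing categories are represented as None rather than NaN.

```python
def fill_from_same_key(keys, values):
    """Fill each None in `values` with a known value from another row sharing the same key.

    If a key has several known values, the first one encountered is used.
    Keys with no known value at all keep None.
    """
    known = {}
    for key, value in zip(keys, values):
        if value is not None and key not in known:
            known[key] = value
    return [known.get(key) if value is None else value
            for key, value in zip(keys, values)]


sku = [700, 701, 702, 702, 703, 704, 705, 705]
category = ['T', 'F', 'F', None, 'W', 'W', 'T', None]
print(fill_from_same_key(sku, category))
# ['T', 'F', 'F', 'F', 'W', 'W', 'T', 'T']
```

In pandas, a comparable approach is likely `df['CATEGORY'] = df['CATEGORY'].fillna(df.groupby('SKU')['CATEGORY'].transform('first'))`, since `GroupBy.first` returns the first non-missing value in each group; treat that line as untested here.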
0 votes, 0 answers

Missing element to complete a palindrome by python

I don't know how to do it using Python, but the program must print the missing letter in the given palindrome. INPUT should be: input_string="abcdeedba" OUTPUT should be: c. I guess we can solve this using set difference or split it into two halves…
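A sketch of one way to do this in Python (the function names are hypothetical). It assumes exactly one character was removed from an otherwise palindromic string. Walk inward from both ends; at the first mismatch, one of the two unmatched characters has lost its partner, and it is the one whose removal leaves a palindrome in between. This is the same two-pointer check used to test whether a string becomes a palindrome after deleting at most one character.

```python
def is_palindrome(s):
    return s == s[::-1]


def missing_letter(s):
    """Return the character that must be inserted into s to make it a palindrome.

    Returns None if s is already a palindrome or no single insertion works.
    """
    i, j = 0, len(s) - 1
    # Skip the outer characters that already mirror each other.
    while i < j and s[i] == s[j]:
        i += 1
        j -= 1
    if i >= j:
        return None  # already a palindrome: no missing letter can be located
    # Case 1: s[i] lost its partner; inserting a copy right after s[j]
    # works if the part between them, s[i+1..j], is a palindrome.
    if is_palindrome(s[i + 1:j + 1]):
        return s[i]
    # Case 2: s[j] lost its partner; insert a copy right before s[i].
    if is_palindrome(s[i:j]):
        return s[j]
    return None


input_string = "abcdeedba"
print(missing_letter(input_string))  # c
```

A plain set difference does not work for this input, because every letter of "abcdeedba" (a, b, c, d, e) still occurs at least once; what is missing is one occurrence, not a distinct letter.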
0 votes, 0 answers

Missing data from collection when looping

I am facing an issue with missing data when looping through the data in a collection. Let me explain in detail. Process: 1. First, a data report has to be downloaded from SAP. 2. Using the get collection action, all the data is loaded into a collection (named: data…
0 votes, 0 answers

is it possible to account for repeated measurements in zero inflated or negative binomial hurdle model

My study includes a repeatedly measured maternal smoking (secondhand smoking) variable (binary 0/1) at time points of 2 months, 6 months, 5 years, 9 years and 13 years, with childhood dental caries (dmft index, count data) as the outcome, measured…
0 votes, 1 answer

Filling DF's NaN/Missing data from another DF

I have two data frames: df1 = pd.DataFrame({'Group': ['xx', 'yy', 'zz', 'x', 'x', 'x','z','y','y','y','y'], 'Name': ['A', 'B', 'C', None, None, None, None, None, None, None, None], 'Value': [5, 3, 4, 7, 1, 3,…
0 votes, 1 answer

Fill in the missing values if rows in other columns are the same

I have a table which looks the following way:
Name   Region  Id
Name1  US      123
Name1  US
Name2  US      122
Name3  US      124
Name1  UK
Name1  UK      135
Name2  UK      140
Name3  US
As you can see there are empty values in the ID column which I want…
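Assuming the rows are available as (Name, Region, Id) tuples with an empty Id represented as None, and that rows sharing a Name and Region should share an Id, a plain-Python sketch (hypothetical helper `fill_ids`) could first collect the known Id for each (Name, Region) pair and then fill the gaps:

```python
def fill_ids(rows):
    """Fill a missing Id (None) from another row with the same (Name, Region).

    If no row with that (Name, Region) has an Id, the Id stays None.
    """
    known = {}
    for name, region, row_id in rows:
        if row_id is not None:
            known.setdefault((name, region), row_id)
    return [(name, region, row_id if row_id is not None else known.get((name, region)))
            for name, region, row_id in rows]


table = [
    ("Name1", "US", 123),
    ("Name1", "US", None),
    ("Name2", "US", 122),
    ("Name3", "US", 124),
    ("Name1", "UK", None),
    ("Name1", "UK", 135),
    ("Name2", "UK", 140),
    ("Name3", "US", None),
]
for row in fill_ids(table):
    print(row)
```

Because the lookup is built in a separate first pass, the fill does not depend on row order: the empty "Name1, UK" row gets 135 even though the row carrying that Id comes after it.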