Questions tagged [missing-data]

For questions relating to missing data problems, which can involve special data structures, algorithms, statistical methods, modeling techniques, and visualization, among other considerations.

When working with data in regular data structures (e.g. tables, matrices, arrays, tensors), some values may be unobserved, corrupted, or not yet observed. Treating such data requires additional annotation, as well as methodological care when deciding how to impute the missing values or use the data in standard contexts. This is a particular problem in data-intensive contexts, such as large statistical analyses of databases.
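As a minimal sketch of the "additional annotation" and imputation mentioned above, the following Python example marks missing entries with an explicit mask and fills them with the mean of the observed values. The data and the function names `missing_mask` and `impute_mean` are hypothetical, and mean imputation is only one simple option chosen for illustration, not a recommended default.

```python
from statistics import fmean

def missing_mask(values):
    """Return True for every position whose value is missing (None)."""
    return [v is None for v in values]

def impute_mean(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    fill = fmean(observed)
    return [fill if v is None else v for v in values]

# Hypothetical scores; None marks a value that was not observed.
scores = [60.0, 50.0, None, 30.0, 33.0]
mask = missing_mask(scores)
filled = impute_mean(scores)
print(mask)
print(filled)
```

Keeping the mask alongside the filled values preserves the information about which entries were imputed, so later steps can still treat them differently.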

Missing data occur in many fields, from survey data to industrial data. There are many underlying missing data mechanisms (reasons why the data are missing). In survey data, for example, values might be missing due to drop-out, or because people answering the survey ran out of time.

Rubin classified missing data into three types:

  1. missing completely at random;
  2. missing at random;
  3. missing not at random.

Note that some statistical analyses are valid only under certain of these classes.
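As a rough illustration of the three classes listed above, here is a minimal Python simulation. Everything in it is an assumption made for the example: the variables `x` (always observed) and `y` (sometimes missing), the drop probabilities, and the helper names `simulate` and `observed_mean`. It is a sketch of the idea, not a standard procedure.

```python
import random
from statistics import fmean

def simulate(n=20000, seed=0):
    """Return the full y plus three copies with different missingness mechanisms."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]   # always observed
    y = [xi + rng.gauss(0, 1) for xi in x]    # may go missing

    # MCAR: each y is dropped with the same probability, whatever x or y is.
    mcar = [None if rng.random() < 0.3 else yi for yi in y]
    # MAR: the drop probability depends only on the observed x.
    mar = [None if xi > 0 and rng.random() < 0.6 else yi
           for xi, yi in zip(x, y)]
    # MNAR: the drop probability depends on the unobserved y itself.
    mnar = [None if yi > 0 and rng.random() < 0.6 else yi for yi in y]
    return y, mcar, mar, mnar

def observed_mean(values):
    """Mean of the non-missing entries (a complete-case estimate)."""
    return fmean([v for v in values if v is not None])

y, mcar, mar, mnar = simulate()
print("full data:", round(fmean(y), 3))
for name, data in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(name, round(observed_mean(data), 3))
```

In this setup the complete-case mean of `y` stays close to the full-data mean under MCAR, but shifts downward under MAR and MNAR, because larger values are dropped more often. Under MAR the shift can in principle be corrected using the observed `x`; under MNAR it cannot be corrected from the observed data alone.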

2809 questions
1 vote, 0 answers

PHP accessing MySQL database correctly for some users, others missing some columns

I am at a loss for what is happening here -- because depending on which user I check, there isn't necessarily something wrong at all. Here's the database in question's structure: userID password salt VARCHAR(7) VARCHAR(64) …
Julian
1 vote, 4 answers

How to view duplicated records with one or more NA's?

My dataset looks like the following: ID Score A1 60 A1 50 A1 NA B1 30 B1 33 C1 48 C1 39 D1 21 D1 38 D1 NA I would like to see the duplicated records that have NAs, such as: A1 60 A1 50 A1 NA D1 21 D1 38 D1 NA Thanks for your time and kind…
M.Qasim
1 vote, 1 answer

Plotting two columns against each other omitting missing values

This is my dataframe using dput(): structure(list(Year = 1900:1903, Top.10..income.share = structure(c(82L, 81L, 76L, 75L), .Label = c("", "30,3", "30,65", "30,8", "31,3", "31,37", "31,38", "31,4", "31,5", "31,51", "31,52", "31,55", "31,62",…
LukasKawerau
1 vote, 3 answers

Access 2007 - Left Join returns correct results, Inner Join returns nothing

I have a query that I could only get to work by using a left join on three fields. If I did an ordinary inner join on these three fields the query returned nothing. But if I try each individual join separately, they all join as I would…
Wilskt
1 vote, 3 answers

how to find a missing case using proc sql in sas?

I would like to use proc sql in sas to identify if a case or record is missing some information. I have two datasets. One is a record of an entire data collection, that shows what forms have been collected during a visit. The second is a…
blue and grey
1 vote, 1 answer

Notepad++ - How to search files for a missing string

I have a cluster of 10 folders, each with 1000 program files. I need to search these text files for a string. All files must start with $O123456.MIN% (where 123456 stands for the file name, which varies). I know how to find if the string exists, but how do I identify…
user1987121
1 vote, 0 answers

Insights API returns only 7 fields for some posts, 31 for others?

I have two otherwise identical posts on a Facebook page that I administer. One post we'll call "full" returns the full range of insight values (31) I'd expect even when the values are zero, while the other which we'll call "subset" returns only a…
Patrick
1 vote, 3 answers

t-sql query that returns missing records

I have a query (ContactFormTypesRequired) that returns ContactID and FormTypeID utilizing related tables that are not shown below. This is a list of FormTypes that each Contact should have related to it as a Form. I need a query that returns…
cResults
1 vote, 2 answers

ASP.NET Web Reference Missing from DLL

I am working with a .NET 3.5 class library that was created in Visual Studio 2008, and later updated and recompiled in Visual Studio 2010. The strangest thing is happening: One of the Web References that is listed in the Solution Explorer does not…
Jesse
1 vote, 2 answers

Make use of available data and neglect missing data for building classifier

I am using the randomForest package in R to build a binary classifier. There are about 30,000 rows, with 14,000 in the positive class and 16,000 in the negative class. I have 15 variables that are known to be important for classification. I…
Abhishek
1 vote, 6 answers

Can't assign missing values to string variable in SPSS using the GUI

I am struggling to recode missing values in SPSS using the graphical user interface. I can easily recode numeric variables using the GUI and the dialogue box shown below: But when I enter a string variable into the same dialogue box the option to…
Rene Bern
1 vote, 1 answer

Post-imputation transformations: how to create variable depending on value in other variable(s)?

I am using Amelia to generate five new hypothetical data sets for each variable with missing data. Now, I want to create new variables after imputation. The reproducible dataset is in the Zelig…
TiF
1 vote, 1 answer

Highcharts: Displaying Linechart with missing datapoints

I am calculating the average value of properties for each week of the year, and I want to display this information in a line chart (the x-axis is the week of the year, the y-axis is the average value, and the different lines represent different properties). But…
Pascal Klein
1 vote, 1 answer

introducing a gap in continuous x axis using ggplot

This builds on my previous post, creating a stacked area/bar plot with missing values (all the script I run can be found there). In this post, however, I'm asking if it's possible to leave a gap in a continuous x axis. I have a…
jO.
1 vote, 1 answer

Correct for missing values in a Stacked area plot using ggplot2

I've been trying to recreate this post on a combination of stacked bar/area plot, but I have some problems with missing values. Here's my data: https://www.dropbox.com/sh/pnkspwnn1qslm6u/JapTKCwqMS What I run is: …
jO.