
I have a dataset of 5,000 records, each consisting of a series of continuous measurements collected at various times over a decade. The measurements were originally entered manually and, as might be expected, contain a number of errors that need to be corrected.

Typically, incorrect values change by more than 50% from one point to the next, while correct values change by at most 10% at any one time. When I plot a record individually, with time on the X-axis, the errors are very obvious.

It's not feasible to graph each of the 5,000 records individually, so I'm trying to find a faster, automated way to flag the data that are obviously in error and need to be corrected or removed.

Does anyone have any experience with a problem like this?

Vance L Albaugh
  • If you need recommendations for statistical methods to identify outliers, you should ask over at [stats.se]. This isn't a very specific programming question that's appropriate for Stack Overflow. – MrFlick Jun 21 '17 at 00:58
  • I should clarify that the "outliers" are not real, they are incorrectly entered data... I need a way to quickly visualize each of the records or automate determining which records have incorrect data... I agree the question is not very specific... I will try to revise and make more specific - thanks for your comment – Vance L Albaugh Jun 21 '17 at 01:01
  • You could try some type of dummy variable with `dplyr::mutate()` and a logical condition, using `case_when()` or `if_else()`. So, if the value is above a certain threshold, this variable will be 1, let's say, otherwise 0. Then remove the 1s with `filter()`, assuming you want to take them out. – RobertMyles Jun 21 '17 at 01:02
  • Generating a small reproducible example data set will help. – SymbolixAU Jun 21 '17 at 01:02
  • No function, but this answer might help you, https://stackoverflow.com/questions/21947091/how-to-winsorize-or-remove-univariate-outliers-in-a-longitudinal-dataset – AMS Jul 22 '19 at 02:15
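
The `dplyr::mutate()` idea from the comments could be sketched roughly as follows. This is only a minimal illustration, not a tested solution for the real data: it assumes the data are in long format with one row per measurement, and the column names `record_id`, `time`, and `value`, the toy numbers, and the 50% cutoff taken from the question are all placeholders.

```r
library(dplyr)

# Hypothetical long-format data: one row per measurement.
# record_id, time, and value are assumed column names, and the numbers are made up.
measurements <- data.frame(
  record_id = c(1, 1, 1, 1, 2, 2, 2, 2),
  time      = c(1, 2, 3, 4, 1, 2, 3, 4),
  value     = c(100, 104, 99, 102, 50, 52, 520, 51)
)

threshold <- 0.5  # relative change treated as a likely entry error (assumption)

flagged <- measurements %>%
  arrange(record_id, time) %>%
  group_by(record_id) %>%
  mutate(
    # relative change from the previous measurement within the same record
    rel_change = abs(value - lag(value)) / abs(lag(value)),
    # the first point of each record has no predecessor, so it is never flagged
    suspect_jump = coalesce(rel_change > threshold, FALSE)
  ) %>%
  ungroup()

# Records containing at least one suspect jump, i.e. candidates for manual review
records_to_review <- flagged %>%
  group_by(record_id) %>%
  summarise(n_suspect_jumps = sum(suspect_jump)) %>%
  filter(n_suspect_jumps > 0)
```

Only the records listed in `records_to_review` would then need to be plotted and checked by hand. Note that a single mistyped value often shows up as two flagged jumps, one into the bad point and one back out, so this flags the jump rather than deciding which point is wrong. Values at or near zero would also need separate handling, because the relative change is undefined there.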
