I'm using the Rrd package for R to import an rrd file, and I want to delete all records that contain NaN values.

 head(rra)

                timestamp curr_proc_units entitled_cycles capped_cycles
1480982460 2016-12-05 18:01:00             NaN             NaN           NaN
1480982520 2016-12-05 18:02:00             NaN             NaN           NaN
1480982580 2016-12-05 18:03:00             NaN             NaN           NaN
1480982640 2016-12-05 18:04:00             NaN             NaN           NaN
1480982700 2016-12-05 18:05:00             NaN             NaN           NaN
1480982760 2016-12-05 18:06:00             NaN             NaN           NaN
       uncapped_cycles
1480982460             NaN
1480982520             NaN
1480982580             NaN
1480982640             NaN
1480982700             NaN
1480982760             NaN

The first rows are all NaN, but the rest are not.

#!/usr/bin/env Rscript

# libraries
library(lubridate, quietly = TRUE)
library(plyr, quietly = TRUE)
library(dplyr, quietly = TRUE)
library(chron, quietly = TRUE)
library(ggplot2, quietly = TRUE)
library(Rrd, quietly = TRUE)
library(plyrmr, quietly = TRUE)

rra = importRRD("/kathryn/rdc1vsip8/rdc1vsiphmc3/rdc1vpc1lpr56.rrm", "AVERAGE", 1480982400, 1486598400, 2)

rra$timestamp <- as.POSIXct(as.numeric(rra$timestamp), origin = "1970-01-01")

rra = rra[!is.nan(rra)];

My error is: Error in is.nan(rra) : default method not implemented for type 'list'

So how do I convert my list into something from which I can remove the NaN values?

Jaap
  • I'm assuming you want to pass certain columns into is.nan. If there is one column that will work to identify the nan cases then you can just change the is.nan(rra) to is.nan(rra$yourcolumn) but you'll probably want to use row indexing so it should look like `rra <- rra[!is.nan(rra$yourcolumn), ]` (note the comma) – Dason Feb 24 '17 at 15:25
  • It's all columns apart from timestamp. Would that be possible? – Kathryn Withers Feb 24 '17 at 15:27
  • You could do something like apply(rra[,-1], 1, function(x){any(is.nan(x))}) to get an index for the rows that contain nan values. The "-1" tells it to exclude the first column (which is the timestamp) when applying the function to each row. – Dason Feb 24 '17 at 15:36
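
The row-wise check suggested in the comments can be sketched roughly as follows. The small `rra` data frame here is a made-up stand-in for the imported data (its values are hypothetical, only the column names come from the question), and it assumes that every column apart from `timestamp` is numeric.

```r
# Hypothetical stand-in for the imported data: a timestamp plus numeric columns.
rra <- data.frame(
  timestamp = as.POSIXct("2016-12-05 18:01:00", tz = "UTC") + 60 * (0:5),
  curr_proc_units = c(NaN, NaN, 0.5, 0.6, NaN, 0.8),
  entitled_cycles = c(NaN, NaN, 1.5, NaN, 1.7, 1.8)
)

# is.nan() has no method for a whole data frame (which is a list), so apply it
# row by row to the numeric columns only; -1 drops the timestamp column.
has_nan <- apply(rra[, -1], 1, function(x) any(is.nan(x)))

# Keep the rows without any NaN; the trailing comma selects rows, not columns.
clean <- rra[!has_nan, ]
print(clean)
```

Note that `apply()` converts the selected columns to a matrix first, so this only behaves as intended when those columns are all numeric.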

2 Answers

Fixed with @Dason's suggestion above: `rra <- rra[!is.nan(rra$yourcolumn), ]`. After I specified one column, it took care of the others as well. Thank you for the help.
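
Filtering on a single column only clears the other columns when their NaN values fall in the same rows, which appears to be the case for this file. A minimal base R sketch for checking all non-timestamp columns at once, using a made-up `rra` whose values are hypothetical and deliberately not aligned:

```r
# Hypothetical data where the NaN values do NOT line up across columns.
rra <- data.frame(
  timestamp = as.POSIXct("2016-12-05 18:01:00", tz = "UTC") + 60 * (0:5),
  curr_proc_units = c(NaN, NaN, 0.5, 0.6, NaN, 0.8),
  entitled_cycles = c(NaN, NaN, 1.5, NaN, 1.7, 1.8)
)

# Filtering on one column leaves NaN behind in the other column (row 4 here).
one_col <- rra[!is.nan(rra$curr_proc_units), ]

# Logical matrix (rows x columns): TRUE wherever a value is NaN.
nan_mat <- sapply(rra[-1], is.nan)

# Keep only the rows in which no column is NaN.
all_cols <- rra[rowSums(nan_mat) == 0, ]
print(all_cols)
```

If `one_col` and `all_cols` have the same number of rows for your data, the single-column filter was enough.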

Jaap

Here's a reproducible version of your dataset.

timestamps <- seq(Sys.time() - 3600, Sys.time(), by = "1 min")
n <- length(timestamps)
rra <- data.frame(
  timestamp = timestamps,
  curr_proc_units = runif(n),
  entitled_cycles = runif(n)
)
rra <- within(
  rra,
  {
    curr_proc_units[sample(n, 10)] <- NaN
    entitled_cycles[sample(n, 10)] <- NaN
  }
)

Here's a solution using dplyr's filter() function.

library(dplyr)
rra %>% 
  filter(
    !is.nan(curr_proc_units),
    !is.nan(entitled_cycles)
  )
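
With many columns, listing each one by hand gets tedious. One possible generalisation, assuming dplyr 1.0.4 or later (where `if_any()` is available) and that every column other than `timestamp` is numeric, is sketched below. The data values are made up for illustration.

```r
library(dplyr)

# Hypothetical data frame with the same column names as in the question.
rra <- data.frame(
  timestamp = as.POSIXct("2016-12-05 18:01:00", tz = "UTC") + 60 * (0:5),
  curr_proc_units = c(NaN, NaN, 0.5, 0.6, NaN, 0.8),
  entitled_cycles = c(NaN, NaN, 1.5, NaN, 1.7, 1.8)
)

# if_any() is TRUE for a row when is.nan() is TRUE in any selected column;
# -timestamp selects every column except the timestamp.
clean <- rra %>%
  filter(!if_any(-timestamp, is.nan))
print(clean)
```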
Richie Cotton