
I would like to subset a data frame in order to keep only observations where the seconds are an even number.

You can download a small part of my data here (100 rows).

The first 6 rows look like this:

            Timestamp C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14
1 2013-04-01 00:00:00   0   1   1   1   1   0   1   1   1   1   0   1   0   1
2 2013-04-01 00:00:01   0   1   1   1   1   0   1   1   1   1   0   1   0   1
3 2013-04-01 00:00:02   0   1   1   1   1   0   1   1   1   1   0   1   0   1
4 2013-04-01 00:00:03   0   1   1   1   1   0   1   1   1   1   0   1   0   1
5 2013-04-01 00:00:04   0   1   1   1   1   0   1   1   1   1   0   1   0   1
6 2013-04-01 00:00:05   0   1   1   1   1   0   1   1   1   1   0   1   0   1

And I would like it to look like this:

            Timestamp C01 C02 C03 C04 C05 C06 C07 C08 C09 C10 C11 C12 C13 C14
1 2013-04-01 00:00:00   0   1   1   1   1   0   1   1   1   1   0   1   0   1
2 2013-04-01 00:00:02   0   1   1   1   1   0   1   1   1   1   0   1   0   1
3 2013-04-01 00:00:04   0   1   1   1   1   0   1   1   1   1   0   1   0   1
4 2013-04-01 00:00:06   0   1   1   1   1   0   1   1   1   1   0   1   0   1
5 2013-04-01 00:00:08   0   1   1   1   1   0   1   1   1   1   0   1   0   1
6 2013-04-01 00:00:10   0   1   1   1   1   0   1   1   1   1   0   1   0   1

I understand how to subset time intervals from here and here, but I haven't been able to find an example that is similar to my question, and frankly, I have no idea where to start.

Thank you!

Note: The Timestamp variable has already been formatted to POSIXct.

americo

4 Answers


I'm adding an answer because, although all the answers are nice, none of them acknowledges that POSIXct objects, when converted to integers, are in fact expressed in seconds (since January 1, 1970). So the following works as well (but it wouldn't if you were trying to pick odd and even minutes, hours, ...):

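# One hour of timestamps at one-second steps
a <- seq(as.POSIXct("2013-04-01 00:00:00"), as.POSIXct("2013-04-01 01:00:00"), by = "secs")
# Keep the elements whose seconds-since-epoch value is even
a[as.integer(a) %% 2 == 0]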
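
Applied to a data frame shaped like the one in the question, the same idea might look like the sketch below. The `df` here is a made-up stand-in (only two of the C columns, six rows), and the approach assumes whole-second timestamps in a time zone whose UTC offset is a whole number of minutes.

```r
# Hypothetical stand-in for the question's data
df <- data.frame(
  Timestamp = seq(as.POSIXct("2013-04-01 00:00:00", tz = "UTC"),
                  by = "sec", length.out = 6),
  C01 = 0,
  C02 = 1
)

# POSIXct stores seconds since 1970-01-01 UTC, so the parity of the
# integer value matches the parity of the seconds field
even_rows <- df[as.integer(df$Timestamp) %% 2 == 0, ]
even_rows
```

The trailing comma inside `[ ]` keeps all columns and filters rows only.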
plannapus
library(lubridate)

foo <- seq(as.POSIXct("2013-01-10"), as.POSIXct("2013-01-11"), by = "secs")

secs <- second(foo)

even <- foo[secs %% 2 == 0]
odd <- foo[secs %% 2 == 1]

Your download link wasn't working for me so I didn't use your data, but you should be able to subset your data.frame in the same way.
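
As a sketch of what "in the same way" could look like on a data frame, the example below filters rows by the seconds of a `Timestamp` column. The `df` is invented for illustration, with timestamps from different months and years, and no start or end time has to be specified.

```r
library(lubridate)

# Made-up rows spanning several months and years
df <- data.frame(
  Timestamp = as.POSIXct(c("2012-12-31 23:59:58", "2013-04-01 00:00:01",
                           "2013-07-15 12:30:02", "2014-01-01 08:00:03"),
                         tz = "UTC"),
  C01 = c(0, 1, 0, 1)
)

# second() extracts the seconds component of each timestamp
even_df <- df[second(df$Timestamp) %% 2 == 0, ]
odd_df  <- df[second(df$Timestamp) %% 2 == 1, ]
```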

Jake Burkhead
  • I am working with many months, though only one month at a time. Is there a way to define foo so that it can accommodate different months? Years? – americo Oct 29 '13 at 18:43
  • If you're asking how to subset based on even and odd months/years then you should look at `?month` and `?year` in `lubridate` and replace second with one of those functions – Jake Burkhead Oct 29 '13 at 18:50
  • Hi Jake, I was wondering if there was a way to work independent of months/years and still subset by seconds. Mariam's answer is exactly right for my needs, but thank you for your help. – americo Oct 29 '13 at 18:53
  • @ChelseaE I believe this solution is independent of months/years. Am I missing something? – Jake Burkhead Oct 29 '13 at 18:56
  • Hi Jake, from what I understand I would have to specify an upper and lower time limit, and the timestamps in my dataset(s) vary a lot. I guess I could extract the first and last date from each dataset, but at this point your solution gets, I feel, too involved for my needs. This may change when/if @hadley clarifies his comment. – americo Oct 30 '13 at 14:04
  • @ChelseaE The only upper and lower time limits in my code come from creating toy data to test this on. If you already have data, you don't need to create any, so you don't need to specify any limits – Jake Burkhead Oct 30 '13 at 14:23

A base alternative:

tt <- c(Sys.time(), Sys.time() + 1)
tt
# [1] "2013-10-29 19:43:26 CET" "2013-10-29 19:43:27 CET"

tt[as.numeric(format(tt, "%S")) %% 2 == 0]
# [1] "2013-10-29 19:43:26 CET"

Update: a faster alternative, thanks to @Roland:

tt[round(as.POSIXlt(tt)$sec) %% 2 == 0]
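
One rough way to check that the two variants agree, and to time them on your own machine, is sketched below. The vector `tt` here is synthetic (one day of one-second timestamps), and no timings are shown because they vary with the machine and the R version.

```r
# One day of made-up one-second timestamps
tt <- seq(as.POSIXct("2013-04-01", tz = "UTC"), by = "sec", length.out = 86400)

# Variant 1: format the seconds as text, then convert back to numbers
even_format <- as.numeric(format(tt, "%S")) %% 2 == 0
# Variant 2: read the seconds component from the POSIXlt representation
even_lt <- round(as.POSIXlt(tt)$sec) %% 2 == 0

# Both variants should select the same elements on whole-second data
identical(even_format, even_lt)

# Compare the elapsed times reported for each variant
system.time(as.numeric(format(tt, "%S")) %% 2 == 0)
system.time(round(as.POSIXlt(tt)$sec) %% 2 == 0)
```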
Henrik

Without using any external package, you could do the following (for even seconds):

res = df[(as.numeric(substr(df$Timestamp, 18, 19)) %% 2) == 0, ]

For testing purposes, I used a small subset of your dataframe:

df = data.frame(Timestamp = c("2013-04-01 00:00:00", "2013-04-01 00:00:01", "2013-04-01 00:00:02", "2013-04-01 00:00:03", "2013-04-01 00:00:04"), C01 = rep(0,5), C02 = rep(1,5))
df$Timestamp = as.POSIXct(df$Timestamp)

Here is what you obtain (for even):

#> res
#            Timestamp C01 C02
#1 2013-04-01 00:00:00   0   1
#3 2013-04-01 00:00:02   0   1
#5 2013-04-01 00:00:04   0   1

For odd seconds, apply the same logic, replacing `== 0` with `== 1`.

Mayou
  • My pleasure!! You can use different limits for `substr` if you wish to extract month or year instead! – Mayou Oct 29 '13 at 18:51
  • @hadley As you can imagine I am fairly new to R, could you elaborate on why one wouldn't want to do string manipulations on dates? Should I post this as a separate question? – americo Oct 30 '13 at 13:56
  • @hadley I really don't see what the problem is. Since class POSIXct contains date-time information in a structured manner, you can rely on substr to extract the characters in time positions within the POSIXct vector. – Mayou Oct 30 '13 at 14:53
  • It's always a bad idea to treat data of one type like it's data of another - it might work 90% of the time, but eventually it will fail disastrously and you'll spend hours trying to debug it. – hadley Oct 30 '13 at 16:52
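
To illustrate the concern in the last comment, here is a small sketch with made-up timestamps. It reads the seconds component from the date-time object itself, then shows that fixed character positions only line up for one particular text layout. The alternative layout is a hypothetical one chosen for the example, and how a POSIXct value is turned into text by default may also differ between setups.

```r
# Made-up timestamps; tz is fixed so the example does not depend on the local zone
x <- as.POSIXct(c("2013-04-01 00:00:00", "2013-04-01 00:00:01",
                  "2013-04-01 00:00:02"), tz = "UTC")

# Ask for the seconds component directly instead of counting characters
secs <- as.POSIXlt(x)$sec
even <- x[secs %% 2 == 0]

# The same instants rendered with another (hypothetical) layout:
# characters 18-19 now hold the day of the month, not the seconds
other_layout <- format(x, "%H:%M:%S %Y-%m-%d")
substr(other_layout, 18, 19)
```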