
Given a dataset dat with species (SP), area (AR) and time (TM, as POSIXct), I want to subset the records of individuals that were present together with species A. They must fall within half an hour before or after A was recorded, and in the same area or one of the two adjacent areas (±1). For example, if species A was present at 1:00 in area 4, I want to subset all species present from 12:30 to 1:30 on the same day in areas 3, 4 and 5. As an example:

SP TM              AR
B  1-jan-03 07:22  1
F  1-jan-03 09:22  4
A  1-jan-03 09:22  1
C  1-jan-03 08:17  3
D  1-jan-03 09:20  1
E  1-jan-03 06:55  4
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
B  3-jan-03 09:15  1
A  3-jan-03 10:30  5
F  3-jan-03 07:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4
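To make the example reproducible, here is one way to build dat in R. This is a sketch with two assumptions. The dates are written in ISO form (2003-01-01 rather than 1-jan-03) to avoid locale-dependent parsing of month names. The question does not state a time zone, so tz = "UTC" is also an assumption.

```r
# Example data from the question; column names SP, TM, AR as in the question.
# Dates rewritten in ISO form and tz = "UTC" are assumptions (see above).
day <- rep(c("2003-01-01", "2003-01-03"), times = c(9, 5))
clock <- c("07:22", "09:22", "09:22", "08:17", "09:20", "06:55", "09:03",
           "09:12", "09:45", "09:15", "10:30", "07:30", "10:20", "10:05")
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  TM = as.POSIXct(paste(day, clock), format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)
str(dat)
```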

The desired result for this dummy table would be:

SP TM              AR
A  1-jan-03 09:22  1
D  1-jan-03 09:20  1
D  1-jan-03 09:03  1
E  1-jan-03 09:12  2
F  1-jan-03 09:45  1
A  3-jan-03 10:30  5
F  3-jan-03 10:20  6
D  3-jan-03 10:05  4 

Note: species A appears repeatedly throughout the dataset, in any area from 1 to 81 and at any time. In a previous pair of posts I split this question in two so that I could learn how to combine the code, but my specification of the problem was flawed. Many thanks to the users thelatemail and Jason, who provided helpful answers in "Subsetting based on co-occurrence within a time window" and "Subsetting neighboring fileds". The suggested code was:

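with(dat, dat[
  # keep rows of species A, or rows in an area within +/-1 of any record of A ...
  (SP == "A" |
     Area %in% c(Area[SP == "A"] - 1, Area[SP == "A"], Area[SP == "A"] + 1)) &
  # ... that are also within 30 minutes of any record of A
  apply(
    sapply(Time[SP == "A"],
           function(x) abs(difftime(Time, x, units = "mins")) <= 30),
    1, any),
  ])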

This worked partially. However, it seems to subset only within the time window, not by area. I think this is caused by issues with POSIXct and the subsetting commands, since several different times fall within one time window. Would another apply function be needed to handle the area interval? Any help is much appreciated.
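One way to see what the quoted code does is to evaluate its two conditions separately on the example data. The sketch below rebuilds the data using the question's column names (SP, TM, AR) in place of Area and Time, and it assumes UTC times.

Each condition pools all records of species A, so a row can pass the area test because of one A record and the time test because of another. On this data the result has 9 rows instead of the 8 desired. The extra row is F at 2003-01-01 09:22 in area 4. Area 4 neighbours the A record in area 5 on 3 January, and 09:22 matches the A record on 1 January.

```r
# Example data (ISO dates and UTC are assumptions).
day <- rep(c("2003-01-01", "2003-01-03"), times = c(9, 5))
clock <- c("07:22", "09:22", "09:22", "08:17", "09:20", "06:55", "09:03",
           "09:12", "09:45", "09:15", "10:30", "07:30", "10:20", "10:05")
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  TM = as.POSIXct(paste(day, clock), format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

is_A <- dat$SP == "A"
# Condition 1: species A, or an area within +/-1 of ANY A record (areas pooled)
area_ok <- is_A |
  dat$AR %in% c(dat$AR[is_A] - 1, dat$AR[is_A], dat$AR[is_A] + 1)
# Condition 2: within 30 minutes of ANY A record (times pooled separately)
time_ok <- apply(
  sapply(dat$TM[is_A],
         function(x) abs(difftime(dat$TM, x, units = "mins")) <= 30),
  1, any)

# Combining the two pooled tests does not require the SAME A record to match
res <- dat[area_ok & time_ok, ]
res
```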

Karl
  • It would be great if you could insert links to your previous questions. Thanks. – Henrik Sep 10 '13 at 18:23
  • Remove the line `SP=='A' |` and you should have what you need. See if you can say, in words, what each line is doing and follow the logic in the subsetting. It will be a good exercise and will help with your understanding of R (see my [edit](http://stackoverflow.com/questions/18706182/subsetting-neighboring-fieds/18706352?noredirect=1#18706352)) – Justin Sep 10 '13 at 19:22
  • You could rewrite `c(Area[SP=='A']-1, Area[SP=='A'], Area[SP=='A']+1)` as `Area[SP=='A'] + c(-1, 0, 1)` (but I don't think `%in%` works the way you expect). And naming intermediate results would make your code much easier to understand – hadley Sep 10 '13 at 20:19
  • @Justin, do you get `F 2003-01-01 09:22:00 4` included when you run your code? It is close in space, but at the wrong time (A is in Area 5, but on the 3rd), or vice versa: close in time, but on the wrong site (A is seen same time, but is then in Area 1). Could it be the `any` in `apply` that allows for this, i.e. 'any' true time is resulting in an aggregated TRUE, regardless of space for the single TRUE? I apologize in advance if I have messed things up. – Henrik Sep 10 '13 at 20:33
  • I haven't actually run any of the code... – Justin Sep 10 '13 at 20:39
  • @Justin :) I was trying soooooo desperately to create a solution myself, got extreeeemely frustrated, and gave up. So I started going through your nice answer instead to get some ideas and learn, and happened to stumble over this one row... Sorry. – Henrik Sep 10 '13 at 20:47
  • I can provide a larger chunk of the original dataset: – Karl Sep 10 '13 at 20:53
  • If the test data includes the properties that you want the code to deal with, there is no need for more. I tend to think that the shorter the better. But thanks for the offer! And thanks for providing _both_ a nice little test data set _and_ a desired output, and a clearly formulated question. – Henrik Sep 10 '13 at 21:08
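hadley's rewrite is shorter, but it relies on recycling when there is more than one record of A. The minimal sketch below uses the two A areas from the example (1 and 5) as hypothetical input. With that input, `+ c(-1, 0, 1)` recycles a length-2 vector to length 3, with a warning, and returns only three values. `outer()` produces every area/offset pair instead.

Even the complete set does not fix the logic. `%in%` tests against areas pooled across all A records, so it cannot tie an area match to the time of the same record.

```r
# Areas in which species A was recorded in the example data
a_areas <- c(1, 5)

# Length 2 + length 3: R recycles to length 3 and warns; record the warning
warned <- FALSE
recycled <- withCallingHandlers(
  a_areas + c(-1, 0, 1),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
print(recycled)    # only three values: 0 5 2

# outer() forms every area/offset combination, one row per A record
neighbours <- sort(unique(as.vector(outer(a_areas, -1:1, "+"))))
print(neighbours)  # 0 1 2 4 5 6
```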

1 Answer


A possible solution very much inspired by @thelatemail's and @Justin's previous, nice answers, but this accounts for time in the boolean expression for space (see my comments to this question).

Using sapply, we 'loop' over each time at which species A was registered (time[SP == "A"]) and create a boolean matrix mm with one column per registration of A. Each row of mm corresponds to one registration (row) in dat. Each cell tests whether that registration lies within one area and within 30 minutes of the given registration of A.

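# one column per registration of A: TRUE where a row lies within +/-1 area
# AND within 30 minutes of that particular registration of A
# (the time column is called `time` here; it is TM in the question)
mm <- with(dat,
           sapply(time[SP == "A"], function(x)
             abs(AR - AR[SP == "A" & time == x]) <= 1 &
               abs(difftime(time, x, units = "mins")) <= 30))

# select rows from data where at least one column in mm is TRUE
dat[rowSums(mm) > 0, ]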

#    SP                time AR
# 3   A 2003-01-01 09:22:00  1
# 5   D 2003-01-01 09:20:00  1
# 7   D 2003-01-01 09:03:00  1
# 8   E 2003-01-01 09:12:00  2
# 9   F 2003-01-01 09:45:00  1
# 11  A 2003-01-03 10:30:00  5
# 13  F 2003-01-03 10:20:00  6
# 14  D 2003-01-03 10:05:00  4
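The same approach can be made self-contained on the example data roughly as follows. This is a sketch with two deliberate changes, which are assumptions rather than the answer's own code. First, the time column is called TM, as in the question. Second, the loop runs over row indices of A (via which()) instead of over A's times. With the original version, two A records sharing a timestamp would make AR[SP == "A" & time == x] return more than one area. vapply() fixes the result to a logical matrix with one row per record of dat.

```r
# Example data (ISO dates and UTC are assumptions).
day <- rep(c("2003-01-01", "2003-01-03"), times = c(9, 5))
clock <- c("07:22", "09:22", "09:22", "08:17", "09:20", "06:55", "09:03",
           "09:12", "09:45", "09:15", "10:30", "07:30", "10:20", "10:05")
dat <- data.frame(
  SP = c("B", "F", "A", "C", "D", "E", "D", "E", "F", "B", "A", "F", "F", "D"),
  TM = as.POSIXct(paste(day, clock), format = "%Y-%m-%d %H:%M", tz = "UTC"),
  AR = c(1, 4, 1, 3, 1, 4, 1, 2, 1, 1, 5, 5, 6, 4),
  stringsAsFactors = FALSE
)

# One column per record of species A, indexed by row number
a_rows <- which(dat$SP == "A")
mm <- vapply(a_rows, function(i) {
  # same record of A for both tests: space (+/-1 area) and time (+/-30 min)
  abs(dat$AR - dat$AR[i]) <= 1 &
    abs(as.numeric(difftime(dat$TM, dat$TM[i], units = "mins"))) <= 30
}, logical(nrow(dat)))

# keep rows that match at least one A record in both space and time
res <- dat[rowSums(mm) > 0, ]
res
```

On the example data this returns the same eight rows as the output shown above (rows 3, 5, 7, 8, 9, 11, 13 and 14).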
Henrik
  • Great edit, I noticed you cleaned up the code on the AR conditionals. I was trying the same thing yesterday. Very well made. – Karl Sep 11 '13 at 16:16
  • Interestingly, after running the code with the edit, it does not give a warning message, which I believe is related to areas that have no neighbors (for example 1, which would only include 2, but not zero). – Karl Sep 11 '13 at 20:57
  • @Karl, glad to hear that it worked. Sorry for not notifying you about the edit. I was a bit stressed today... – Henrik Sep 11 '13 at 21:07
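Whatever produced the earlier warning, the `abs()` area test in the answer needs no special case at the edges of the 1 to 81 range mentioned in the question. An edge area simply has one matching neighbour instead of two. Here is a small check, with neighbours_of() as a hypothetical helper:

```r
areas <- 1:81
# hypothetical helper: areas that pass the answer's test abs(AR - a) <= 1
neighbours_of <- function(a) areas[abs(areas - a) <= 1]

neighbours_of(1)   # 1 2   (no area 0 is needed)
neighbours_of(81)  # 80 81
neighbours_of(40)  # 39 40 41
```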