Here is how my data frame is currently structured (first 6 rows). The data I used is available here.
ID  date        sps    time   pp  datetime          km
1   2012-06-19  MICRO  2:19   0   2012-06-19 02:19  80
2   2012-06-21  MUXX   23:23  1   2012-06-21 23:23  80
3   2012-07-15  MAMO   11:38  0   2012-07-15 11:38  80
4   2012-07-20  MICRO  22:19  0   2012-07-20 22:19  80
5   2012-07-29  MICRO  23:03  0   2012-07-29 23:03  80
8   2012-08-07  PRLO   2:04   0   2012-08-07 02:04  80
The columns stand for:
- ID: identification number
- date: date of observation
- km: location
- sps: species code
- time: time of observation
- pp: codes whether the species (sps) observed is a predator (1) or prey (0)
- datetime: conversion of the date and time columns to as.POSIXct format
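A minimal sketch of how the rows shown above could be rebuilt and the datetime column derived. The data frame name `dat` and the UTC time zone are assumptions made for illustration only.

```r
# Rebuild the six example rows; `dat` is a hypothetical name for the data frame.
dat <- data.frame(
  ID   = c(1, 2, 3, 4, 5, 8),
  date = c("2012-06-19", "2012-06-21", "2012-07-15",
           "2012-07-20", "2012-07-29", "2012-08-07"),
  sps  = c("MICRO", "MUXX", "MAMO", "MICRO", "MICRO", "PRLO"),
  time = c("2:19", "23:23", "11:38", "22:19", "23:03", "2:04"),
  pp   = c(0, 1, 0, 0, 0, 0),
  km   = 80,
  stringsAsFactors = FALSE
)

# Combine date and time into a single POSIXct column.
# The time zone (UTC) is an assumption; use the zone the cameras were set to.
dat$datetime <- as.POSIXct(paste(dat$date, dat$time),
                           format = "%Y-%m-%d %H:%M", tz = "UTC")

# Sort by location and time so that consecutive rows are consecutive observations.
dat <- dat[order(dat$km, dat$datetime), ]
```

Sorting by km and then datetime matters for everything that follows, because the pair counts depend on the temporal order within each location.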
The question I am trying to answer:
Does the likelihood of a predator (pp = 1) being observed increase with the number of times prey (pp = 0) are observed (e.g. is prey followed by predator more likely than prey followed by prey, etc.) at each location (km)?
Background:
- For each location (km), each photo has its own row in my data, with the time the image was taken and an identification of whether the photo is of a predator or prey.
- There are many photos of predators and prey at each location.
- For each location, observations of predators and prey are made in temporal sequence.
What I am trying to do:
1. For each location, exhaustively count the number of temporal pairs of observations: prey-prey, prey-predator, predator-prey and predator-predator.
2. For each location, shuffle (randomize) the observations of pred/prey (i.e. maintain the same total number of pred/prey as observed) and count the number of pairs of observations generated by the shuffle: prey-prey, prey-predator, predator-prey and predator-predator. Record these counts, then calculate the difference between the counts from step 1 and those found by each shuffle. Repeat 1000 times. This should give me a sense of how likely the originally observed prey-prey, prey-predator, predator-prey and predator-predator paired sequences are, given the observed proportion of pred/prey.
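The two steps above could be sketched in base R roughly as follows. This is one possible reading of the procedure, not a definitive implementation: the helper names `count_pairs` and `shuffle_test` are hypothetical, the toy data are simulated purely for illustration, and the one-sided comparison in `p_ge` is an assumption about which direction is of interest.

```r
# Step 1: count the four kinds of consecutive pairs in a 0/1 sequence
# (0 = prey, 1 = predator) that is already in temporal order.
count_pairs <- function(pp) {
  state <- factor(pp, levels = c(0, 1), labels = c("prey", "predator"))
  # Pair each observation with the one that follows it.
  table(from = head(state, -1), to = tail(state, -1))
}

# Step 2: shuffle the sequence (keeping the same numbers of 0s and 1s),
# recount the pairs, and compare with the observed counts.
shuffle_test <- function(pp, n_shuffles = 1000) {
  # as.vector() on the 2 x 2 table runs down the columns, hence this order.
  pair_names <- c("prey-prey", "predator-prey",
                  "prey-predator", "predator-predator")
  observed <- setNames(as.vector(count_pairs(pp)), pair_names)
  shuffled <- replicate(
    n_shuffles,
    as.vector(count_pairs(pp[sample.int(length(pp))]))
  )
  rownames(shuffled) <- pair_names
  list(
    observed = observed,
    shuffled = shuffled,              # 4 x n_shuffles matrix of counts
    diffs    = observed - shuffled,   # observed minus each shuffle
    # Share of shuffles with a count at least as large as observed
    # (assumed one-sided comparison).
    p_ge     = rowMeans(shuffled >= observed)
  )
}

# Toy data: two locations with 50 simulated observations each (made up).
set.seed(1)
dat <- data.frame(
  km       = rep(c(80, 95), each = 50),
  datetime = as.POSIXct("2012-06-01", tz = "UTC") + 3600 * rep(1:50, 2),
  pp       = rbinom(100, 1, 0.3)
)
dat <- dat[order(dat$km, dat$datetime), ]

# Apply the test separately to each location.
results <- lapply(split(dat$pp, dat$km), shuffle_test, n_shuffles = 1000)
results[["80"]]$observed
results[["80"]]$p_ge
```

Shuffling within each location keeps the pred/prey totals fixed, so the shuffled counts describe what the pair frequencies would look like if order carried no information.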
My question:
Assuming a Markov Chain model is the most appropriate way to answer my question, how would I be able to code this in R?
At this point, I believe the R package I should be using is markovchain, but I don't know how to translate steps 1 and 2 into R code.
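If the sequence at a location is treated as a first-order Markov chain, the maximum-likelihood estimate of its transition matrix is simply the row-normalised table of pair counts, so a first look does not strictly need a dedicated package. The sketch below uses base R only. The sequence is made up for illustration, and the markovchain package's `markovchainFit()` is mentioned only as the likely package equivalent, not used here.

```r
# Toy sequence for one location, in temporal order (0 = prey, 1 = predator).
# These values are made up for illustration.
pp <- c(0, 0, 1, 0, 0, 0, 1, 1, 0, 0)

state  <- factor(pp, levels = c(0, 1), labels = c("prey", "predator"))
counts <- table(from = head(state, -1), to = tail(state, -1))

# Row-normalise: each row gives P(next state | current state).
# A state that never occurs as "from" would give NaN in its row.
trans_prob <- prop.table(counts, margin = 1)

# P(predator next | prey now) versus P(predator next | predator now)
trans_prob["prey", "predator"]
trans_prob["predator", "predator"]
```

Comparing the two entries in the "predator" column answers the "is prey followed by predator more likely" question descriptively; the shuffle procedure is what puts an uncertainty on that comparison.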