I have a made up dataset of polling stations in Wales and I've attached a date column to it. We can imagine this date is the date this polling station was visited to check the facilities (for example).
What I'd like to do is work out :
I would like to work out whether geographic points are within a certain distance
- This I've managed by self_joining and using st_buffer and st_within to calculate within 1000 m and then calculated the number of neighbours.
and also the interval between the sample dates
- this I'm having a bit of a problem with
What I'd like to do, I think, is for each polling station
- calculate the number of neighbours (so far so easy)
- for each neighbour determine the interval between the sampling dates
- return a spatial object (for plotting in tmaps probably)
Here's some test code that I've got that generates the sf dataset, calculates the number of neighbours and returns that. It's really the date interval that's stumping me. It's not so much the calculation of the date interval but it's the way to generate these clusters of polling stations with date intervals. Is it better to generate the (in this case) 108 polling station clusters?
What I'm trying to do in my larger dataset is calculate clusters of points over time. I have ~2000 records with a date. I'd like to say : for each of these 2000 records calculate the number of neighbours within a distance and within a timeframe.
I think it's probably better to calculate each cluster of neighbouring points and visualise then remove neighbours from the cluster that are outside of the time frame and visualise that
Although, on typing this, I wonder if excluding points that didn't fall within a timeframe first and then calculating neighbours would be more efficient?
polls<-st_as_sf(read.csv(url("https://www.caerphilly.gov.uk/CaerphillyDocs/FOI/Datasets_polling_stations_csv.aspx")),
coords = c("Easting","Northing"),crs = 27700)%>%
mutate(date = sample(seq(as.Date('2020/01/01'), as.Date('2020/05/31'), by="day"), 147))
test_stack<-polls%>%st_join(polls%>%st_buffer(dist=1000),join=st_within)%>%
filter(Ballot.Box.Polling.Station.x!=Ballot.Box.Polling.Station.y)%>%
add_count(Ballot.Box.Polling.Station.x)%>%
rename(number_of_neighbours = n)%>%
mutate(interval_date = date.x-date.y)%>%
subset(select = -c(6:8,10,11,13:18))## removing this comment will summarise the data so that only number of neighbours is returned %>%
distinct(Ballot.Box.Polling.Station.x,number_of_neighbours,date.x)%>%
filter(number_of_neighbours >=2)