
I have a made-up dataset of polling stations in Wales, to which I've attached a date column. We can imagine this is the date each polling station was visited to check the facilities (for example).

What I'd like to work out is:

whether geographic points are within a certain distance of each other

  • This I've managed by self-joining, using st_buffer and st_within to find the points within 1000 m, and then counting the number of neighbours.

and also the interval between the sample dates

  • this I'm having a bit of a problem with

What I'd like to do, I think, is for each polling station

  • calculate the number of neighbours (so far so easy)
  • for each neighbour determine the interval between the sampling dates
  • return a spatial object (for plotting in tmap, probably)

Here's some test code that generates the sf dataset, calculates the number of neighbours and returns that. It's really the date interval that's stumping me. It's not so much the calculation of the date interval as the way to generate these clusters of polling stations with date intervals. Is it better to generate the (in this case) 108 polling station clusters?

What I'm trying to do in my larger dataset is calculate clusters of points over time. I have ~2000 records, each with a date. I'd like to say: for each of these 2000 records, calculate the number of neighbours within a given distance and within a given timeframe.

I think it's probably better to calculate each cluster of neighbouring points and visualise it, then remove the neighbours that are outside of the time frame and visualise that.

Although, on typing this, I wonder if excluding points that didn't fall within a timeframe first and then calculating neighbours would be more efficient?

library(sf)
library(dplyr)

# Build the sf point layer (British National Grid) and attach a random visit date to each station
polls <- st_as_sf(read.csv(url("https://www.caerphilly.gov.uk/CaerphillyDocs/FOI/Datasets_polling_stations_csv.aspx")),
                  coords = c("Easting", "Northing"), crs = 27700) %>%
  mutate(date = sample(seq(as.Date('2020/01/01'), as.Date('2020/05/31'), by = "day"), 147))

# Self-join: pair each station with every other station within 1000 m
test_stack <- polls %>% st_join(polls %>% st_buffer(dist = 1000), join = st_within) %>%
  filter(Ballot.Box.Polling.Station.x != Ballot.Box.Polling.Station.y) %>%  # drop self-matches
  add_count(Ballot.Box.Polling.Station.x) %>%
  rename(number_of_neighbours = n) %>%
  mutate(interval_date = date.x - date.y) %>%  # interval between the two visit dates, in days
  subset(select = -c(6:8, 10, 11, 13:18))
## To summarise the data so that only the number of neighbours is returned,
## add %>% to the end of the line above and uncomment the two lines below:
#  distinct(Ballot.Box.Polling.Station.x, number_of_neighbours, date.x) %>%
#  filter(number_of_neighbours >= 2)
damo
  • You can join the layer with itself, with a *distance* threshold, resulting in all possible pairs up to that threshold. Then calculate 'time_diff' column, and filter to retain only those pairs that are below the *time* threshold too. Will be happy to try and answer with specific code if you can please post a small reproducible sample data and clear definition of what's the result you need (what are the required distance and time thresholds?). – Michael Dorman Jul 30 '20 at 14:40
  • I think I'm going to explore this link – damo Jul 30 '20 at 19:36
  • https://pubmed.ncbi.nlm.nih.gov/31596789/ – damo Jul 30 '20 at 19:37
  • OK. Bit of a dredge. And a bit of time to think about this. The example above works in that you can filter by time and distance. I'm trying to think about the best way to visualise this. At the moment, you'll only see a single point representing the cluster (which you could colour code depending on the number of neighbours). Is it possible, using html/leaflet/tmap, to make it so that when you select a cluster in the interactive map it shows the neighbours? – damo Sep 01 '20 at 20:19
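The approach suggested in the comments (self-join on a distance threshold, compute a time difference per pair, then filter on a time threshold) might look roughly like the sketch below. It is only an illustration: the five toy stations, the column names `station` and `time_diff`, and the thresholds `max_dist` and `max_days` are all made up here, and `st_is_within_distance` is swapped in for the `st_buffer` + `st_within` combination used in the question.

```r
library(sf)
library(dplyr)

# Hypothetical toy data: five stations in British National Grid metres (EPSG:27700)
stations <- st_as_sf(
  data.frame(
    station = c("A", "B", "C", "D", "E"),
    x = c(0, 500, 900, 5000, 5400),
    y = c(0, 0, 0, 0, 0),
    date = as.Date(c("2020-01-01", "2020-01-10", "2020-03-01",
                     "2020-02-01", "2020-02-05"))
  ),
  coords = c("x", "y"), crs = 27700
)

max_dist <- 1000  # metres (assumed distance threshold)
max_days <- 14    # days (assumed time threshold)

# All ordered pairs of stations within max_dist of each other,
# with self-pairs dropped and the absolute date gap attached
pairs <- st_join(stations, stations,
                 join = st_is_within_distance, dist = max_dist) %>%
  filter(station.x != station.y) %>%
  mutate(time_diff = abs(as.numeric(date.x - date.y))) %>%
  filter(time_diff <= max_days)

# One row per station: neighbour count plus the neighbours' names
neighbours <- pairs %>%
  st_drop_geometry() %>%
  group_by(station = station.x) %>%
  summarise(number_of_neighbours = n(),
            neighbour_names = paste(station.y, collapse = ", "))

# Join back onto the points so the result is still an sf object;
# stations with no qualifying neighbour get a count of 0
result <- stations %>%
  left_join(neighbours, by = "station") %>%
  mutate(number_of_neighbours = coalesce(number_of_neighbours, 0L))
```

Here `pairs` keeps one row per (station, neighbour) pair, so the individual intervals stay available, while `result` has one point per station with a `neighbour_names` text column that could, for instance, be shown in a popup on an interactive map.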

1 Answer


I think it might be as simple as

tm_shape(test_stack) + tm_dots(col = "number_of_neighbours", clustering = TRUE, size = 0.5)

I'm not sure how clustering works in leaflet, but that works quite nicely on this test data.

damo