I web-scraped multiple store locators, and now I want to find all the brands that each store carries. Because each brand's locator reports slightly different coordinates for the same store, I combined the data using a custom key (lat and long rounded to 4 decimals, plus the first 5 letters of the store name). This has obvious limitations: two records for the same store can round to different values, so it is not good enough for my case.
My next idea should work better, but I am stuck on how to implement it. My plan is to:
- Create circles of 100m around each point (st_transform and st_buffer based on this answer)
- Check if any of the circles with the same first 5 letters intersect (... group_by(name_key) %>% st_overlaps) and somehow get a unique group_id.
- Combine the data using the group_id as the key
Hope somebody can give me a hand because I am totally stuck.
library(tibble)   # tibble()
library(dplyr)    # %>% and mutate()
library(stringr)  # str_remove_all(), str_sub()

tt <- tibble(lat = c(41.38702918, 41.386601, 41.38744179, 41.3871449, 41.38896353, 41.38963478, 41.38666041, 41.38598465, 41.3867169, 41.3852132453, 41.38438262),
             long = c(2.170086301, 2.16939207, 2.17080332, 2.171450558, 2.167685415, 2.16892721, 2.170950285, 2.171100009, 2.171736392, 2.171215346, 2.170955304),
             name = c("El Corte Inglés", "EL CORTE INGLÉS DIAGONAL (007)", "El Corte Inglés", "El Corte Inglés Barcelona", "Nadons", "Pops And Co", "Bitti", "BITTI", "BITTI", "Bitti", "Bitti"),
             brand = c("b", "s", "c", "e", "e", "c", "m", "c", "s", "e", "b")) %>%
  mutate(name_key = tolower(name),
         # Transliterate accented characters to plain ASCII
         name_key = iconv(name_key, from = "UTF-8", to = "ASCII//TRANSLIT"),
         # Strip digits, punctuation and whitespace
         name_key = str_remove_all(name_key, '[[:digit:]]'),
         name_key = str_remove_all(name_key, '[[:punct:]]'),
         name_key = str_remove_all(name_key, '[[:space:]]'),
         # Remove the common company types from the name
         name_key = str_remove_all(name_key, 'ltda'),
         name_key = str_remove_all(name_key, 'ltd'),
         # Only get the first 5 letters
         name_key = str_sub(name_key, end = 5L))
library(leaflet)

leaflet() %>%
  # add different provider tiles
  addProviderTiles(
    "OpenStreetMap",
    # give the layer a name
    group = "OpenStreetMap") %>%
  addCircleMarkers(data = tt,
                   radius = 4,
                   opacity = 0.7,
                   # build one HTML label per point
                   label = paste(
                     "Store name: ", tt$name, "<br>",
                     "name_id: ", tt$name_key, "<br>",
                     "Brand: ", tt$brand) %>%
                     lapply(htmltools::HTML))
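
For reference, the three-step plan above (buffer, intersect within the same name_key, assign a group_id) might be sketched roughly as follows. This is only a sketch under several assumptions that are not part of the original attempt: it uses the igraph package to turn pairwise intersections into connected groups, it assumes EPSG:25831 (ETRS89 / UTM zone 31N) is a suitable metric projection for these Barcelona coordinates, it uses st_intersects rather than st_overlaps, and it runs on a five-row subset of the sample data (the `pts` and `stores` names are made up for illustration).

```r
library(sf)
library(dplyr)
library(igraph)  # assumption: used only to find connected groups

# Five rows copied from the sample data above, with their name_key values
pts <- data.frame(
  lat      = c(41.38702918, 41.38744179, 41.38896353, 41.38666041, 41.38598465),
  long     = c(2.170086301, 2.17080332, 2.167685415, 2.170950285, 2.171100009),
  name_key = c("elcor", "elcor", "nadon", "bitti", "bitti"),
  brand    = c("b", "c", "e", "m", "c")
)

# Step 1: project to a metric CRS and draw a 100 m circle around each point
circles <- pts %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326, remove = FALSE) %>%
  st_transform(25831) %>%   # assumed projection, units are metres
  st_buffer(dist = 100)

# Step 2: list every pair of intersecting circles (each circle also hits itself)
hits  <- st_intersects(circles)
edges <- data.frame(from = rep(seq_along(hits), lengths(hits)),
                    to   = unlist(hits))

# Keep only pairs that share the same name_key
edges <- edges[pts$name_key[edges$from] == pts$name_key[edges$to], ]

# Connected components of the resulting graph give one group_id per store
g <- graph_from_data_frame(edges, directed = FALSE,
                           vertices = data.frame(id = seq_len(nrow(pts))))
pts$group_id <- as.integer(components(g)$membership)

# Step 3: combine the rows of each group into one store with all its brands
stores <- pts %>%
  group_by(group_id) %>%
  summarise(name_key = first(name_key),
            lat      = mean(lat),
            long     = mean(long),
            brands   = paste(sort(unique(brand)), collapse = ", "),
            .groups  = "drop")
```

One thing to keep in mind with this design: connected components chain matches together, so if A touches B and B touches C, all three end up in one group even when A and C are further apart than the buffer distance suggests. Two circles of radius 100 m also intersect whenever the points are up to 200 m apart, so the buffer distance may need tuning.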