I have some text strings containing addresses. Most contain one address per string, but some contain multiple addresses, such as:

15, 23, & 26, Marshal Road 42, 54, Milne Road

I want to use R to identify these and split them into one string per address, e.g.:

15 Marshal Road
23 Marshal Road
26 Marshal Road
42 Milne Road
54 Milne Road

The multi-address strings don't have a predictable structure, but there are a few common types, e.g.:

50 to 67 (inclusive), Stone Street
1 to 4, Privet Drive
44, 46 and 48, High Street
295-299 (odd), Springfield Road
Flats A-H, Church Street
30 and 30a
188 Seven Sisters Road, Roadway at Rear of 178 to 188A (even Nos ) Seven Sisters Road
40, 40A, 44-46 and 49A Tudor Drive

I've been able to identify the strings with multiple addresses using:

stringi::stri_count_regex(x, '\\d+') > 1
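
To make this reproducible, here is a minimal sketch of that check on a few of the strings above. The vector name `x` is just illustrative, and "12 High Street" is a made-up single-address string added for contrast; it is not from my real data.

```r
# Sample input: three strings from the examples above, plus one
# made-up single-address string ("12 High Street") for contrast.
x <- c(
  "15, 23, & 26, Marshal Road 42, 54, Milne Road",
  "1 to 4, Privet Drive",
  "Flats A-H, Church Street",
  "12 High Street"
)

# Count the runs of digits in each string and flag those with more than one.
multi <- stringi::stri_count_regex(x, "\\d+") > 1
multi
# [1]  TRUE  TRUE FALSE FALSE
```

This flags the first two strings but not "Flats A-H, Church Street", which contains no digits at all, so the check is only a rough filter.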

But I don't have a good method to split them up and match the correct road name to the numbers.

Are there any packages or methods that can help identify and split up complex text strings?

falcs