This is my first question so please excuse the mistakes. I have a dataframe where the address is in one line and has many missing values and several errors.

Address

  • Braemor Drive, Clontarf, Co.Dublin
  • Meadow Avenue, Dundrum
  • Philipsburgh Avenue, Marino
  • Myrtle Square, The Coast

I would like to add a new field, "District", whose value depends on what the address contains. For example, if the address contains Marino, Fairview or Clontarf, the District should be Dublin 3.

Dublin3 <- c("Marino", "Fairview", "Clontarf")
matches <- unique(grep(paste(Dublin3, collapse = "|"),
                       DubPPReg$Address, value = TRUE))

Using R, how can I update the value of District where the match is true?

ClareMc

1 Answer

# Example data frame with a column named Adress
df <- data.frame(Adress = c("Braemor Drive",
                            "Clontarf",
                            "Co.Dublin",
                            "Meadow Avenue",
                            "Dundrum",
                            "Philipsburgh Avenue",
                            "Marino",
                            "Myrtle Square", "The Coast"))
# Vector of place names that belong to Dublin 3
Dublin3 <- c("Marino", "Fairview", "Clontarf")

# Label rows whose Adress exactly equals one of the names in Dublin3
df$District <- ifelse(df$Adress %in% Dublin3, "Dublin 3", FALSE)

df
               Adress District
1       Braemor Drive    FALSE
2            Clontarf Dublin 3
3           Co.Dublin    FALSE
4       Meadow Avenue    FALSE
5             Dundrum    FALSE
6 Philipsburgh Avenue    FALSE
7              Marino Dublin 3
8       Myrtle Square    FALSE
9           The Coast    FALSE

Instead of FALSE you can choose something else (e.g. NA).
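
A minimal sketch of that variant, assuming a shortened version of the example data frame above. Using `NA_character_` keeps `District` a character column with a real missing value, rather than the string "FALSE" that `ifelse` produces when the two branches mix text and a logical.

```r
# Shortened copy of the example data (an assumption for illustration)
df <- data.frame(Adress = c("Braemor Drive", "Clontarf", "Marino"),
                 stringsAsFactors = FALSE)
Dublin3 <- c("Marino", "Fairview", "Clontarf")

# Exact matches get "Dublin 3"; everything else is a missing value
df$District <- ifelse(df$Adress %in% Dublin3, "Dublin 3", NA_character_)
df
```

Rows without a district can then be found with `is.na(df$District)`.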

Edit: If your data are in a character vector

df <- c("Braemor Drive, Churchtown, Co.Dublin",
        "Meadow Avenue, Clontarf, Dublin 14",
        "Sallymount Avenue, Ranelagh", "Philipsburgh Avenue, Marino") 

Which looks like this

df
[1] "Braemor Drive, Churchtown, Co.Dublin"
[2] "Meadow Avenue, Clontarf, Dublin 14"  
[3] "Sallymount Avenue, Ranelagh"         
[4] "Philipsburgh Avenue, Marino"

You can find your matches using grepl like this

match <- ifelse(grepl("Marino|Fairview|Clontarf", df, ignore.case = TRUE), "Dublin 3", FALSE)

and output is

[1] "FALSE"    "Dublin 3" "FALSE"    "Dublin 3"

This means that at least one of the names you are looking for (i.e. Marino, Fairview or Clontarf) appears in the second and fourth elements of df.
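
The same idea can be applied to a data frame column. The following is only a sketch: the `DubPPReg` data frame here is a small stand-in built from the addresses in the question, and building the pattern with `paste(..., collapse = "|")` is borrowed from the question's own attempt.

```r
# Hypothetical stand-in for the asker's DubPPReg data frame
DubPPReg <- data.frame(
  Address = c("Braemor Drive, Clontarf, Co.Dublin",
              "Meadow Avenue, Dundrum",
              "Philipsburgh Avenue, Marino",
              "Myrtle Square, The Coast"),
  stringsAsFactors = FALSE
)
Dublin3 <- c("Marino", "Fairview", "Clontarf")

# Build one regular expression: "Marino|Fairview|Clontarf"
pattern <- paste(Dublin3, collapse = "|")

# TRUE where the address contains any of the names, anywhere in the string
is_d3 <- grepl(pattern, DubPPReg$Address, ignore.case = TRUE)

# Start with missing values, then fill in only the matching rows
DubPPReg$District <- NA_character_
DubPPReg$District[is_d3] <- "Dublin 3"
DubPPReg
```

Assigning through a logical index leaves non-matching rows untouched, so the same two lines can be repeated for other districts without overwriting earlier results.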

Miha
  • Thanks very much Miha. Your code example works perfectly when I tried it but I can't make it work on my example. If I have an exact match of the full address "Philipsburgh Avenue, Marino, Dublin 3", it works. But not for one word within the text. – ClareMc Mar 12 '17 at 18:20
  • Define / insert that one word in vector `Dublin3` and then run the code again. – Miha Mar 12 '17 at 18:28
  • When I try Dublin3 <- c("Avenue", "Fairview", "Clontarf") df$District <- ifelse(df$Adress %in% Dublin3, "Dublin 3",FALSE) on the above df example, the results are false for all entries except for "Clontarf" which is an exact match. – ClareMc Mar 12 '17 at 18:56
  • In my data frame (df) there is no `Avenue` but there is `Meadow Avenue`. So if you would also like to match `Avenue` to Dublin 3 you need to insert it into column `Adress`. – Miha Mar 12 '17 at 19:03
  • Hi Miha, I think we're talking at cross purposes here and I should have given a full code example. The dataframe consists of addresses in this format. df <- c("Braemor Drive, Churchtown, Co.Dublin", "Meadow Avenue, Clontarf, Dublin 14", "Sallymount Avenue, Ranelagh", "Philipsburgh Avenue, Marino") And I need to find the ones that include the values from Dublin 3. Dublin3 <- c("Marino", "Fairview", "Clontarf") It needs to pick up any addresses that contain that pattern, not just exact matches. – ClareMc Mar 12 '17 at 19:18
  • I edited my answer; please take a look and see if this is what you need. – Miha Mar 12 '17 at 20:00