
Given a "template" of a UK postcode, such as "A9 9AA", where "A" is a letter placeholder, and "9" is a number placeholder, I want to generate random postcode strings like "H8 4GB". Letters can be any uppercase letter, and numbers anything from 0 to 9.

So if the template is "AA9A 9AA" then I want strings like "WC1A 9LK". I'm ignoring for now generating "real" postcodes, so I'm not bothered if "WC1A" is a valid outward code.

I've scraped around trying to get functions from the stringi package to work, but the problem seems to be that replacing or matching the "A"s in a template replaces every "A" with the first replacement only, for example:

 stri_replace_all_fixed("A9 9AA",c("A","A","A"), c("X","Y","Z"), vectorize_all=FALSE)
[1] "X9 9XX"

so it doesn't replace each "A" with each element from the replacement vector (but this is by design).

Maybe there's something in stringi or base R that I've missed - I'd like to keep it in those packages so I don't bloat what I'm working on.

The brute-force method is to split the template, do the replacements, and paste the result back together, but I'd like to see if there's a quicker, naturally vectorised solution.

So to summarise:

foo("A9 9AA") # return like "B6 5DE"
foo(c("A9 9AA","A9 9AA","A9A 9AA")) # returns c("Y6 5TH","D4 8JH","W0Z 3KQ")

Here's a non-vectorised version which relies on constructing an expression and evaluating it...

random_pc <- function(fmt){
    cc = gsub(" ",'c(" ")',gsub("9","sample(0:9,1)",gsub("A","sample(LETTERS,1)",strsplit(fmt,"")[[1]])))
    paste(eval(parse(text=paste0("c(",paste(cc,collapse=","),")"))),collapse="")    
}

> random_pc("AA9 9AA")
[1] "KO6 1AY"
Spacedman

3 Answers


As I understand it, the OP wants to generate random UK postcodes in a specified format. I think sprintf can help, for example:

sprintf("%s%s %d%d%s", sample(LETTERS,1),sample(LETTERS,1), sample(0:9,1),
                sample(0:9,1), sample(LETTERS,1) )
#[1] "BC 59D"

Now, if the purpose is to supply the format using 9 and A, then the first step is to replace each 9 with %d and each A with %s.
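The step described above could be sketched roughly as follows. This is only an illustration: `template_to_postcode` is a hypothetical name, and it assumes the template contains no literal "%" characters (these would need escaping before being passed to sprintf).

```r
# Hypothetical helper: turn an "A"/"9" template into a sprintf format
# and supply one random value per placeholder.
template_to_postcode <- function(template) {
  chars <- strsplit(template, "")[[1]]
  # "A" becomes %s (a letter), "9" becomes %d (a digit)
  fmt <- gsub("9", "%d", gsub("A", "%s", template, fixed = TRUE), fixed = TRUE)
  # One argument per placeholder, in the order they appear in the template
  args <- lapply(chars[chars %in% c("A", "9")], function(ch) {
    if (ch == "A") sample(LETTERS, 1) else sample(0:9, 1)
  })
  do.call(sprintf, c(list(fmt), args))
}

template_to_postcode("AA9A 9AA")
```

This handles a single template at a time; wrapping it in vapply or Vectorize would be one way to handle a vector of templates.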

OPTION#2

Another option is to apply a custom function to the split template and paste the result back together with paste0:

#custom function to generate random characters
getCodeText <- function(x){
  retVal = x
  for(i in seq_along(x)){
    if(x[i] == "A"){
      retVal[i] = sample(LETTERS,1)
    }else if(x[i] == "9"){
      retVal[i] = as.character(sample(0:9,1))
    }
  }
  retVal
}

fmt <- "AA 9AA A"
paste0(sapply(strsplit(fmt,""), getCodeText), collapse = "")
#"YF 7OP Z"
MKR
  • I've edited my q to show an answer based on constructing and evaluating an expression like your first option. – Spacedman Apr 06 '18 at 20:57
  • @Spacedman That looks good. Actually I was working on something similar for my option#2. Finally realized that `paste0` with custom function can be quicker. – MKR Apr 06 '18 at 21:07

Here's a solution (vectorised the lazy way) that splits the format and then replaces each character according to whether it is a letter or a digit:

randpc <- Vectorize(function(s){
    s = strsplit(s,"")[[1]]
    NUMS = as.character(0:9)
    nLet = sum(s %in% LETTERS)
    nDig = sum(s %in% NUMS)
    s[s %in% LETTERS] = sample(LETTERS, nLet, replace=TRUE)
    s[s %in% NUMS] = sample(NUMS, nDig, replace=TRUE)
    paste0(s, collapse="")
})

This has the useful side effect of returning a named vector that shows the format string:

> randpc(c("AA9 9AA","A9 9AA"))
  AA9 9AA    A9 9AA 
"QS4 4LW"  "S9 7EU" 

It's also flexible in that it can create postcodes based on another postcode, since it accepts any letter or number in the format string:

> randpc(rep("LA1 4YF",3))
  LA1 4YF   LA1 4YF   LA1 4YF 
"OL2 5OJ" "YK3 3YB" "FV0 1LW" 
Spacedman

I am not sure what counts as brute force, since a split-replace-combine workflow on the strings seemed the most intuitive to me. However, my first attempts were pretty slow with very large numbers of templates. I had also hoped something like stri_replace_all(replacement = sample(LETTERS, 1)) would work but it also only replaces with the same letter.
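The same-letter behaviour follows from sample(LETTERS, 1) being evaluated once, before the replacement function ever sees it. As a rough base R sketch of drawing a separate value for every match without splitting the strings, one could use gregexpr together with the `regmatches<-` replacement function. The name `fill_template` is hypothetical, and the sketch assumes the "A"/"9" placeholders from the question.

```r
# Hypothetical helper (not from the original answers): fill "A" and "9"
# placeholders in a character vector of templates using base R only.
fill_template <- function(templates) {
  out <- templates
  # Letters: locate every "A", then draw one random letter per match
  m <- gregexpr("A", out, fixed = TRUE)
  n_matches <- lengths(regmatches(out, m))
  regmatches(out, m) <- lapply(n_matches, function(k) {
    sample(LETTERS, k, replace = TRUE)
  })
  # Digits: same idea for every "9"
  m <- gregexpr("9", out, fixed = TRUE)
  n_matches <- lengths(regmatches(out, m))
  regmatches(out, m) <- lapply(n_matches, function(k) {
    as.character(sample(0:9, k, replace = TRUE))
  })
  out
}

fill_template(c("A9 9AA", "A9 9AA", "A9A 9AA"))
```

There is still an lapply over the templates to draw the random values, so this is not loop-free, but the matching and substitution are done on the whole vector at once.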

This is a slightly different approach using stri_replace_first_fixed, replacing the first instance of a template character until there are no template characters left. This means I switch the template to use lowercase l for letters and n for numbers, since postcodes are uppercase letters and numbers only (as far as I know). I think the running time is a lot more reasonable (about 12 seconds in the run below) for 100k templates, and this also only uses stringi.

library(stringi)

make_postcodes <- function(templates){
  postcodes <- templates
  while (any(stri_detect_regex(postcodes, "l|n"))){
    for (i in 1:length(templates)){
      postcodes[i] <- stri_replace_first_fixed(
        str = postcodes[i],
        pattern = "l",
        replacement = sample(LETTERS, 1)
        )
      postcodes[i] <- stri_replace_first_fixed(
        str = postcodes[i],
        pattern = "n",
        replacement = sample(0:9, 1)
        )
    }
  }
  postcodes
}

make_postcodes("ln nll")
#> [1] "W8 3MX"
make_postcodes(c("ln nll", "ln nll", "lnl nll"))
#> [1] "H1 6TN"  "C5 6YI"  "A3I 2DB"

test = stri_trim_both(stri_rand_strings(100000, sample(5:9, 1), pattern = "[nl\\ ]"))
tictoc::tic("Time to convert 100,000 templates")
x <- make_postcodes(test)
tictoc::toc()
#> Time to convert 100,000 templates: 12.03 sec elapsed
head(test)
#> [1] "lnnl"  "ll l"  "nl n"  "ll  l" "ll l"  "ll n"
head(x)
#> [1] "G91U"  "HU N"  "2Q 7"  "EU  Z" "PD I"  "SM 4"

Created on 2018-04-06 by the reprex package (v0.2.0).
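Since stri_replace_first_fixed is vectorised over both str and replacement, the inner for loop may not be needed. The following is an untimed sketch under that assumption, with one random value drawn per template on each pass; `make_postcodes_vec` is a hypothetical name and it uses the same l/n placeholders as above.

```r
# Hypothetical variant of make_postcodes(): each pass of the while loop
# replaces the first remaining "l" and "n" in every template at once.
make_postcodes_vec <- function(templates) {
  postcodes <- templates
  n <- length(postcodes)
  while (any(stringi::stri_detect_regex(postcodes, "[ln]"))) {
    postcodes <- stringi::stri_replace_first_fixed(
      postcodes, "l", sample(LETTERS, n, replace = TRUE)
    )
    postcodes <- stringi::stri_replace_first_fixed(
      postcodes, "n", as.character(sample(0:9, n, replace = TRUE))
    )
  }
  postcodes
}
```

The number of passes is then bounded by the largest count of a single placeholder in any one template, rather than by the number of templates.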

Calum You