How can I get a word at a specific position in a string?

For instance, I want to get the station codes for London, UK:

code <- getStationCode("London", region="UNITED KINGDOM")
code

Result:

[1] "UNITED KINGDOM    EGLINTON/LONDOND EGAE               55 02N  007 09W    9   X     T          6 GB"
[2] "UNITED KINGDOM    LONDON/GATWICK A EGKK        03776  51 08N  000 10W   62   X     T          6 GB"
[3] "UNITED KINGDOM    LONDON CITY AIRP EGLC               51 30N  000 03E    5   X     T          6 GB"
[4] "UNITED KINGDOM    LONDON/HEATHROW  EGLL        03772  51 29N  000 27W   24   X     T          6 GB"
[5] "UNITED KINGDOM    LONDON WEA CENTE EGRB        03779  51 30N  000 07W   39   X                7 GB"

If I select the second element of the result:

second <- code[2]

I will get:

"UNITED KINGDOM    LONDON/GATWICK A EGKK        03776  51 08N  000 10W   62   X     T          6 GB"

How can I then extract EGKK from that string?

Run
  • It seems the data in each vector element is tab-separated. If it comes from a file, try `read.table` with tab as the separator (e.g. http://stackoverflow.com/questions/9764470/r-reading-a-tsv-file-using-specific-encoding ). If the data is not from a file (e.g. from an API), you can split each string on tabs with the `str_split` function in `stringr`, i.e. `code <- str_split(code, pattern = "\t")`, then select the 3rd element of each item in the resulting list. – Philippe Marchand Jul 31 '16 at 13:52
  • @PhilippeMarchand thanks for pointing that out! – Run Jul 31 '16 at 14:01

1 Answer

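We can use str_extract from stringr to extract a run of one or more uppercase letters that is immediately followed by one or more whitespace characters (\\s+) and then one or more digits ([0-9]+). The trailing part is a regex lookahead, so it is checked but not included in the match: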

library(stringr)
str_extract(str1, "[A-Z]+(?=\\s+[0-9]+)")
#[1] "EGKK"
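The same lookahead pattern can also be applied to the whole `code` vector at once. As a minimal sketch in base R (so it does not need stringr), assuming the five strings printed in the question; the name `station_codes` is only illustrative:

```r
# Rows copied from the output shown in the question
code <- c(
  "UNITED KINGDOM    EGLINTON/LONDOND EGAE               55 02N  007 09W    9   X     T          6 GB",
  "UNITED KINGDOM    LONDON/GATWICK A EGKK        03776  51 08N  000 10W   62   X     T          6 GB",
  "UNITED KINGDOM    LONDON CITY AIRP EGLC               51 30N  000 03E    5   X     T          6 GB",
  "UNITED KINGDOM    LONDON/HEATHROW  EGLL        03772  51 29N  000 27W   24   X     T          6 GB",
  "UNITED KINGDOM    LONDON WEA CENTE EGRB        03779  51 30N  000 07W   39   X                7 GB"
)

# Uppercase letters followed (lookahead) by whitespace and digits.
# perl = TRUE is needed because the lookahead (?=...) is PCRE syntax.
pattern <- "[A-Z]+(?=\\s+[0-9]+)"

# regexpr() locates the first match in each string; regmatches() extracts it.
# Strings without any match would be dropped from the result.
station_codes <- regmatches(code, regexpr(pattern, code, perl = TRUE))
print(station_codes)
#[1] "EGAE" "EGKK" "EGLC" "EGLL" "EGRB"
```

With stringr loaded, `str_extract(code, pattern)` should give the same result, returning NA for any string that has no match instead of dropping it.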

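If the station codes are always four-letter strings, we can match the first standalone four-letter word instead. Note that this only works when no earlier word in the string has exactly four uppercase letters; for the "LONDON CITY AIRP" row it would return "CITY" rather than "EGLC".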

str_extract(str1, "\\b[A-Z]{4}\\b")
#[1] "EGKK"
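Since the question asks for a word at a specific location, a positional approach may also be worth considering. In the strings shown in the question the code sits in characters 36 to 39 of every row, which suggests a fixed-width layout. The sketch below assumes that layout holds for all rows (it is inferred from the sample, not from any documentation of `getStationCode`); the names `stations` and `by_position` are illustrative:

```r
# Two rows copied from the question, including one with a four-letter
# word ("CITY") in the station name
stations <- c(
  "UNITED KINGDOM    LONDON/GATWICK A EGKK        03776  51 08N  000 10W   62   X     T          6 GB",
  "UNITED KINGDOM    LONDON CITY AIRP EGLC               51 30N  000 03E    5   X     T          6 GB"
)

# Assumption: the code always occupies characters 36-39.
# substr() is vectorised over the input strings.
by_position <- substr(stations, 36, 39)
print(by_position)
#[1] "EGKK" "EGLC"
```

This does not depend on what follows the code or on the words in the station name, but it breaks if the column positions ever shift, so it is worth checking against more of the data first.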

data

str1 <- "UNITED KINGDOM    LONDON/GATWICK A EGKK        03776  51 08N  000 10W   62   X     T          6 GB"
akrun