17

I have a character vector of variable names, such as x <- c("AB.38.2", "GF.40.4", "ABC.34.2"). I want to extract the letters so that I end up with a character vector containing only the letters, e.g. c("AB", "GF", "ABC").

Because the number of letters varies, I cannot use substring to specify the first and last characters.

How can I go about this?

Moose

5 Answers

13

The previous answers seem more complicated than necessary. The approach from a related question about extracting digits also works for letters: remove every character that is not a letter.

> x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "A B ..C 312, Fd", "  a")
> gsub("[^a-zA-Z]", "", x)
[1] "AB"    "GF"    "ABC"   "ABCFd" "a" 
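
One caveat, sketched here on a hypothetical input that is not from the question: because gsub() removes every non-letter, any letters that appear after the first dot are kept and joined onto the prefix. If only the part before the first dot is wanted, dropping everything from that dot onward is one possible alternative (it assumes the prefix itself contains only letters).

```r
# Hypothetical input: the third name has a letter after the first dot.
x <- c("AB.38.2", "GF.40.4", "ABC.34.x2")

# Removing every non-letter also keeps the trailing "x".
all_letters <- gsub("[^a-zA-Z]", "", x)

# Dropping everything from the first dot onward keeps only the prefix.
prefix <- sub("\\..*$", "", x)

print(all_letters)
print(prefix)
```

Which of the two is appropriate depends on whether letters later in the name should count.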
12

You can try:

sub("^([[:alpha:]]*).*", "\\1", x)
[1] "AB"  "GF"  "ABC"
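
The pattern captures the run of letters at the start of each string and replaces the whole string with that capture. As a rough sketch of how it behaves when a name does not start with a letter (the fourth element below is a hypothetical input, not from the question), sub() returns an empty string for it, whereas a regmatches() variant drops that element entirely:

```r
# The last element is a hypothetical name that starts with digits.
x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "38.AB")

# sub() returns one result per input; no leading letters gives "".
with_sub <- sub("^([[:alpha:]]*).*", "\\1", x)

# regmatches() keeps only the elements where the pattern matched,
# so the result can be shorter than the input.
with_regmatches <- regmatches(x, regexpr("^[[:alpha:]]+", x))

print(with_sub)
print(with_regmatches)
```

Keeping one result per input, as sub() does, is usually safer when the result has to line up with the original vector.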
Mamoun Benghezal
3

This is how I solved this problem. I use this approach because it handles all 5 items cleanly and lets me control whether I want a space between the words:

x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "A B ..C 312, Fd", "  a")

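library(purrr)

extract.alpha <- function(x, space = "") {
  # Split each string on runs of non-letters, leaving only the letter pieces.
  y <- strsplit(unlist(x), "[^a-zA-Z]+")
  # Drop empty pieces (produced by leading non-letters), then rejoin with `space`.
  map_chr(y, ~ paste(.x[nzchar(.x)], collapse = space))
}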

extract.alpha(x, space = " ")
cephalopod
Comment: By the way, readr has functions to handle text/character separation; check out readr::parse_number(), readr::parse_character(), and readr::parse_date(). – cephalopod, Mar 17 '17 at 04:30
2

None of the other answers work if your letters are mixed with spaces. Here is what I do in those cases:

x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "A B ..C 312, Fd")
unique(na.omit(unlist(strsplit(unlist(x), "[^a-zA-Z]+"))))

[1] "AB" "GF" "ABC" "A" "B" "C" "Fd"
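
Note that unlist() and unique() flatten everything into a single vector, so the link between each input string and its letter groups is lost. If that link matters, one possible variation (a sketch, not part of the approach above; the names pieces and per_name are illustrative) is to keep the result as a list with one element per input:

```r
x <- c("AB.38.2", "GF.40.4", "ABC.34.2", "A B ..C 312, Fd")

# Split each string on runs of non-letters.
pieces <- strsplit(x, "[^a-zA-Z]+")

# Drop empty pieces and keep one character vector per input string.
per_name <- lapply(pieces, function(p) p[nzchar(p)])
names(per_name) <- x

print(per_name)
```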

mimoralea
2

I realize this is an old question but since I was looking for a similar answer just now and found it, I thought I'd share.

The simplest and fastest solution I found myself:

x <- c("AB.38.2", "GF.40.4", "ABC.34.2")
only_letters <- function(x) { gsub("^([[:alpha:]]*).*$","\\1",x) }
only_letters(x)

And the output is:

[1] "AB"  "GF"  "ABC"

Hope this helps someone!

centaur