
I'm working on a project which uses Multidimensional Scaling to try and group politicians together based on voting records. My goodness of fit is high; however, I want to plot the MDS coordinates with the names of the politicians so I can draw conclusions from the computation. I am using the wordcloud library for this.

I am attempting to use regex in R, via the stringr package, to extract the names of the politicians from my "names" vector, which contains some non-standard characters. My goal is to extract the last name and the characters in the square brackets. The names come in three different formats, shown below:

  • Sen. Mike Lee [R]
  • Sen. Chris Coons [D, 2010-2020]
  • Sen. Charles “Chuck†Grassley [R]

Using the stringr package, I am running this code:

str_extract("\\w+\\s\\[.+\\]$", names)  # names is the vector of names

I get this error:

Error in UseMethod("type") : 
  no applicable method for 'type' applied to an object of class "NULL"

I'm trying to diagnose this error but can't find anything online that helps.
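A minimal sketch of one input that triggers an error at this point. It assumes (based on stringr 1.0.x-era source, as far as I can tell) that `str_extract(string, pattern)` looks up the type of its *pattern* argument through an internal `type()` generic, so an object of class "NULL" suggests that the second argument, `names`, was NULL when the script ran, even if it is defined in the console. The data frame `df` and its column are hypothetical; `$` with a column name that does not exist returns NULL. Newer stringr versions may word the error differently, so the sketch only checks that an error is raised.

```r
library(stringr)

# Hypothetical data frame: the column is called "name", not "names"
df <- data.frame(name = c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]"),
                 stringsAsFactors = FALSE)

names_vec <- df$names   # no such column, so `$` returns NULL
is.null(names_vec)      # TRUE

# With the arguments in the question's order, NULL ends up as the pattern
res <- tryCatch(
  str_extract("\\w+\\s\\[.+\\]$", names_vec),
  error = function(e) e
)
inherits(res, "error")  # TRUE
conditionMessage(res)   # exact wording depends on the installed stringr version
```

Checking `is.null()` (or `str()`) on the vector inside the script is a quick way to tell whether it holds what the console shows.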

  • Firstly, convert your names to character with `df$colname <- as.character(df$colname)` – CuriousBeing Feb 29 '16 at 11:29
  • First, you confused the argument order in `str_extract`: it must be `str_extract(names, "\\w+\\s\\[.+\\]$")`. Second, you will get `[1] "Lee [R]" "Coons [D, 2010-2020]" "Grassley [R]"`. Third, what result do you expect? – Wiktor Stribiżew Feb 29 '16 at 11:35
  • The result you posted is what I expect, with those elements for each element of my vector. – user2962887 Feb 29 '16 at 11:42
  • Thank you, I feel dumb now: I was mixing up the stringr documentation with the documentation for other R regex functions. This solved my problem. – user2962887 Feb 29 '16 at 11:47
  • I posted my comment with more details as an answer. Please use `@` + username when writing back a comment, or the user won't get notified. – Wiktor Stribiżew Feb 29 '16 at 12:30
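The `as.character()` suggestion in the first comment matters when the column was read in as a factor, which was the default for `read.csv()` and `data.frame()` before R 4.0.0. A small sketch with a hypothetical data frame `df` and column `name`: converting replaces the factor with a plain character vector of its labels before it is passed to `str_extract()`.

```r
library(stringr)

# Hypothetical data; stringsAsFactors = TRUE mimics the pre-R 4.0.0 default
df <- data.frame(name = c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]"),
                 stringsAsFactors = TRUE)
is.factor(df$name)      # TRUE

# Replace the factor with its labels as a character vector
df$name <- as.character(df$name)
is.character(df$name)   # TRUE

# string first, pattern second
extracted <- str_extract(df$name, "\\w+\\s\\[.+\\]$")
extracted
```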

2 Answers


Given

names <- c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]", "Sen. Charles “Chuck†Grassley [R]")
stringr::str_extract("\\w+\\s\\[.+\\]$", names)  # names is the vector of names
# [1] NA NA NA

and

t(sapply(regmatches(names, regexec(".*\\s(\\w+)\\s\\[(.+)\\]", names)), "[", -1))
#      [,1]       [,2]          
# [1,] "Lee"      "R"           
# [2,] "Coons"    "D, 2010-2020"
# [3,] "Grassley" "R"  

I cannot reproduce your error.
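For comparison with the `regmatches()`/`regexec()` approach above, here is a sketch of the same capture-group extraction using stringr's `str_match()`, with the same regex and the `names` vector from above. `str_match()` returns a character matrix whose first column is the whole match and whose remaining columns are the capture groups, so dropping column 1 plays the role of the `"[", -1` step.

```r
library(stringr)

names <- c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]",
           "Sen. Charles “Chuck†Grassley [R]")

# Column 1: full match; column 2: last name; column 3: bracket contents
m <- str_match(names, ".*\\s(\\w+)\\s\\[(.+)\\]")
parts <- m[, -1, drop = FALSE]   # keep only the two capture groups
parts
```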

lukeA
  • Thank you, it works in the RStudio console, but I still can't get it to work in my script for some reason. – user2962887 Feb 29 '16 at 11:38
  • Encoding issues? You should always provide a reproducible example for copy-paste-run. – lukeA Feb 29 '16 at 11:39
  • I'll do that in the future. My error came from mixing up the argument order in str_extract, as pointed out in the comments above. Thank you. – user2962887 Feb 29 '16 at 11:51

You confused the argument order in str_extract: it must be str_extract(names, "\\w+\\s\\[.+\\]$") (that is, names must be the first argument and the regex the second). You will get

> str_extract(names, "\\w+\\s\\[.+\\]$")
[1] "Lee [R]"              "Coons [D, 2010-2020]" "Grassley [R]" 
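One way to guard against this kind of mix-up, sketched below, is to name the arguments. `str_extract()`'s first two parameters are `string` and `pattern`, and R matches named arguments by name regardless of their position in the call.

```r
library(stringr)

names <- c("Sen. Mike Lee [R]", "Sen. Chris Coons [D, 2010-2020]",
           "Sen. Charles “Chuck†Grassley [R]")
pat <- "\\w+\\s\\[.+\\]$"

# With named arguments, the order inside the call no longer matters
a <- str_extract(string = names, pattern = pat)
b <- str_extract(pattern = pat, string = names)
identical(a, b)   # TRUE
a
```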

Note that you can drop the backslash before the final ] since, outside a character class, ] is not a special regex metacharacter. You can also replace .+ with a negated character class such as [^][]+ (written [^\\]\\[]+ in the stringr pattern below) to match one or more characters other than ] and [:

> str_extract(names, "\\w+\\s\\[[^\\]\\[]+]$")
[1] "Lee [R]"              "Coons [D, 2010-2020]" "Grassley [R]"  
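To see what the negated character class changes, here is a sketch on a hypothetical name with two bracketed groups (not from the original data). With `.+` the match can run across both groups; `[^\\]\\[]+` cannot cross a bracket, so together with the `$` anchor there is no match at all.

```r
library(stringr)

# Hypothetical input with two bracketed groups
x <- "Sen. Jane Doe [R] [D]"

greedy  <- str_extract(x, "\\w+\\s\\[.+\\]$")        # .+ may contain ] and [
negated <- str_extract(x, "\\w+\\s\\[[^\\]\\[]+]$")  # contents may not contain ] or [

greedy    # "Doe [R] [D]"
negated   # NA: "[R]" is not at the end, and "[D]" has no word + space before it
```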
Wiktor Stribiżew