R replacing with gsub using a pattern with a length greater than 1

Question

I have a data.frame example with a variable (care_group) as follows:

> example
      care_group
    1 1st Choice Care Homes 8.8
    2 2Care
    3 229 Mitcham Lane Ltd
    4 3 L Care Ltd
    5 3AB Care Ltd
    6 9Grace Road Ltd
    7 A&R Care Ltd 9.7
    8 ABLE (Action for a Better Life)
    9 A C L Care Homes Ltd
    10 A D L plc
    11 A D R Care Homes Ltd
    12 A G E Nursing Homes Ltd 8

As you may notice, some of my observations are alphanumeric and contain numbers at the beginning and/or the end of the name. I know that it is possible to get rid of numeric characters (see for instance here). Yet I do not know how to remove only some of them: concretely, how to remove the numbers at the end of the name while keeping those at the beginning. I tried to do so by creating a vector with the numbers that I want to remove and passing it to gsub.

ratings = c("8", "8.8", "9.7")
example$new_var = with(example, gsub(ratings, " ", care_group))

However I get this warning message:

Warning message:

In gsub(ratings, " ", care_group) :
  argument 'pattern' has length > 1 and only the first element will be used

I wonder whether it is possible to use gsub with a pattern of length > 1, or whether someone could propose a more efficient way to tackle this. Many thanks in advance.

Possible duplicate of [Remove numbers from alphanumeric characters](http://stackoverflow.com/questions/13590139/remove-numbers-from-alphanumeric-characters) – user2100721 Jun 29 '16 at 15:38

score 1 · Accepted Answer · answered Jun 29 '16 at 15:36

Better to use an anchor and character class:

# sample of vector with various possibilities
temp <- c(" 7 A&R Care Ltd 9.7", "A C L Care Homes Ltd", "12 A G E Nursing Homes Ltd 8")

gsub(" [0-9.]+$", "", temp)

[1] " 7 A&R Care Ltd"   "A C L Care Homes Ltd"       "12 A G E Nursing Homes Ltd"

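In the regular expression:

the leading space matches the space that separates the name from the trailing number
the "[0-9.]+" matches one or more characters that are digits or "."
the $ anchors the match to the end of the text, so numbers at the beginning of a name are left alone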
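Applied to the data frame from the question, the same pattern could be used roughly as follows. This is only a sketch: the data frame is rebuilt here from a few rows of the post, and the column is deliberately created as a factor (an assumption, to mimic scraped data), so the conversion with as.character() is made explicit.

```r
# Rebuild a small version of the question's data frame (values copied from the post).
example <- data.frame(
  care_group = c("1st Choice Care Homes 8.8", "2Care", "229 Mitcham Lane Ltd",
                 "A&R Care Ltd 9.7", "A G E Nursing Homes Ltd 8"),
  stringsAsFactors = TRUE  # assumption: the column arrives as a factor
)

# as.character() makes the factor-to-text conversion explicit.
# sub() is enough here: an end-anchored pattern can match at most once per string.
example$new_var <- sub(" [0-9.]+$", "", as.character(example$care_group))

print(example$new_var)
```

Names that only start with digits, such as "2Care" or "229 Mitcham Lane Ltd", are left untouched because the pattern requires a space followed by digits or dots right before the end of the string.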

lmo

Thanks @Imo. Makes more sense than the solution I had but it does not seem to make any change. – Edu Jun 29 '16 at 15:44
It works on the example data as shown in my answer. What is different in your real data? – lmo Jun 29 '16 at 15:45
It is data that I obtained by scraping a website with `rvest`. I think your solution will work eventually, but the levels of the variable are different from what is seen in the console. I'm working on this and will update my results. Thanks in any case @Imo – Edu Jun 29 '16 at 15:50
Yes, it was this problem (some odd levels in the factor). Now it worked. Many thanks @Imo. – Edu Jun 29 '16 at 15:58
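On the literal question in the title: gsub() only uses the first element of its pattern argument, so a vector of several values has to be collapsed into a single regular expression first. The following is a rough sketch of that idea, not part of the accepted answer. The names escaped, pattern and cleaned are made up for illustration, and the space prefix and $ anchor are borrowed from the answer so that only trailing ratings are removed.

```r
ratings <- c("8", "8.8", "9.7")

# Turn each "." into "[.]" so it matches a literal dot rather than any character.
escaped <- gsub(".", "[.]", ratings, fixed = TRUE)

# Collapse the vector into one alternation, anchored to the end of the string:
# " (8|8[.]8|9[.]7)$"
pattern <- paste0(" (", paste(escaped, collapse = "|"), ")$")

care_group <- c("1st Choice Care Homes 8.8", "3 L Care Ltd",
                "A&R Care Ltd 9.7", "A G E Nursing Homes Ltd 8")

# A single pattern string, so no "length > 1" warning is raised.
cleaned <- sub(pattern, "", care_group)
print(cleaned)
```

This only removes the ratings that are listed explicitly, whereas the character-class pattern in the answer removes any trailing number, so the answer's approach is likely the more convenient one when the set of ratings is not known in advance.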
