
I am very new to R, and I could not find a simple example online of how to remove the last n characters from every element of a vector (array?).

I come from a Java background, so what I would like to do is to iterate over every element of a$data and remove the last 3 characters from every element.

How would you go about it?

Grace Mahoney
LucasSeveryn

6 Answers


Here is an example of what I would do. I hope it's what you're looking for.

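char_array = c("foo_bar","bar_foo","apple","beer")
# stringsAsFactors = FALSE keeps "data" as character (the default only since R 4.0.0)
a = data.frame("data"=char_array,"data2"=1:4, stringsAsFactors = FALSE)
# substr() is vectorized: for each element, keep characters 1 .. (length - 3)
a$data = substr(a$data,1,nchar(a$data)-3)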

a should now contain:

  data data2
1 foo_     1
2 bar_     2
3   ap     3
4    b     4
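Since the question comes from a Java background, here is a minimal sketch contrasting an explicit loop with the vectorized substr() call. The helper name drop_last_n is hypothetical, used only for illustration; substr() returns "" when the stop position falls before the start, so strings shorter than n become empty.

```r
# Hypothetical helper: drop the last n characters of every element.
# substr() is vectorized, so no explicit loop is needed.
drop_last_n <- function(x, n) {
  x <- as.character(x)          # also accepts factors
  substr(x, 1, nchar(x) - n)    # stop < start yields ""
}

words <- c("foo_bar", "bar_foo", "apple", "beer", "so")

# Java-style explicit loop, shown for comparison only
looped <- character(length(words))
for (i in seq_along(words)) {
  looped[i] <- substr(words[i], 1, nchar(words[i]) - 3)
}

vectorized <- drop_last_n(words, 3)
identical(looped, vectorized)
#> [1] TRUE
vectorized
#> [1] "foo_" "bar_" "ap"   "b"    ""
```

Both give the same result, but the vectorized form is the idiomatic one in R.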
nfmcclure
  • Funnily, I had to change `-3` to `-0` to get the desired effect! I have a lot of data with dates, like: `"2014-03-27 23:00:00 GMT" "2014-03-31 00:00:00 BST"` - yes, two timezones together, and the as.Date function is returning unexpected results (day earlier for BST dates) - therefore I wanted to remove the timezone stamp, turns out I have to do `-0` and it disappears, together with hours – LucasSeveryn May 01 '14 at 17:55
  • Also consider the strptime function, I haven't used timezones before though. I think it might recognize it. Supposedly "%Z" recognizes time zones. I also removed the sapply function. I forgot how much R likes to vectorize its functions. – nfmcclure May 01 '14 at 18:03
    @LucasSeveryn If you want to convert character time representations to dates taking into account time zones, please edit that into your question. Likely there are better answers that will get you directly to your desired results (such as `strptime`). – Blue Magister May 01 '14 at 18:23

Here's a way with gsub:

cs <- c("foo_bar","bar_foo","apple","beer")
gsub('.{3}$', '', cs)
# [1] "foo_" "bar_" "ap"   "b"
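If n is held in a variable rather than hard-coded, one option is to build the pattern with sprintf(). The function name drop_last_n_regex below is hypothetical; this is a sketch, not part of the original answer.

```r
# Hypothetical helper: sprintf() turns n into the pattern ".{n}$",
# i.e. exactly n characters anchored at the end of the string.
drop_last_n_regex <- function(x, n) {
  gsub(sprintf(".{%d}$", n), "", x)
}

cs <- c("foo_bar", "bar_foo", "apple", "beer", "so")
drop_last_n_regex(cs, 3)
#> [1] "foo_" "bar_" "ap"   "b"    "so"
```

Note that "so" is returned unchanged: with {3} the pattern needs exactly 3 characters, so shorter strings produce no match.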
Matthew Plourde

Although this is mostly the same as the answer by @nfmcclure, I prefer using the stringr package, as it provides a set of functions whose names are more consistent and descriptive than those in base R (in fact, I always google "how to get the number of characters in R" because I can't remember the name nchar()).

library(stringr)
str_sub(iris$Species, end=-4)
# or
str_sub(iris$Species, 1, str_length(iris$Species)-3)

This removes the last 3 characters from each value in the Species column.
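A small sketch of what the two str_sub() calls do, assuming stringr is installed. In the second form, 1 is the start position (the first character) and str_length(x) - 3 is the end position; in the first form, a negative end counts from the right, so end = -4 stops at the fourth-from-last character. A plain character vector is used here instead of iris$Species (a factor) to keep the example self-contained.

```r
library(stringr)

species <- c("setosa", "versicolor", "virginica")

# Negative end counts from the right: -4 keeps up to the
# fourth-from-last character, i.e. drops the last 3.
str_sub(species, end = -4)
#> [1] "set"     "versico" "virgin"

# Explicit form: start at character 1, stop at length - 3
str_sub(species, 1, str_length(species) - 3)
#> [1] "set"     "versico" "virgin"
```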

userJT
Blaszard
  • Could you please explain what is 1 in the last code? `str_sub(iris$Species, 1, str_length(iris$Species)-3)` – Rara Jul 06 '23 at 07:25

The same may be achieved with the stringi package:

library('stringi')
char_array <- c("foo_bar","bar_foo","apple","beer")
a <- data.frame("data"=char_array, "data2"=1:4)
(a$data <- stri_sub(a$data, 1, -4))  # from the first to the fourth-from-last character
## [1] "foo_" "bar_" "ap"   "b" 
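For a general n, the end position becomes -(n + 1). A minimal sketch, assuming stringi is installed; the variable names are illustrative only. When the computed end falls before the start (as for "so" here), stri_sub() returns an empty string.

```r
library(stringi)

n <- 3
x <- c("foo_bar", "bar_foo", "apple", "beer", "so")

# -(n + 1) is the (n + 1)-th character counted from the end
stri_sub(x, 1, -(n + 1))
#> [1] "foo_" "bar_" "ap"   "b"    ""
```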
gagolews

Similar to @Matthew Plourde's answer using gsub.

However, this uses a pattern that trims down to zero characters, i.e. it returns "" if the original string is shorter than the number of characters to cut:

cs <- c("foo_bar","bar_foo","apple","beer","so","a")
gsub('.{0,3}$', '', cs)
# [1] "foo_" "bar_" "ap"   "b"    ""    ""

The difference is that the {0,3} quantifier matches 0 to 3 characters, whereas {3} requires exactly 3; if there is no match, gsub returns the original, unmodified string.

N.B. using {,3} would be equivalent to {0,3}, I simply prefer the latter notation.

See here for more information on regex quantifiers: https://www.regular-expressions.info/refrepeat.html
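A short side-by-side sketch of the two quantifiers, with n taken from a variable via sprintf() (an illustrative choice, not part of the original answer):

```r
cs <- c("foo_bar", "apple", "so", "a")
n <- 3

exact <- gsub(sprintf(".{%d}$", n), "", cs)     # needs exactly n characters
upto  <- gsub(sprintf(".{0,%d}$", n), "", cs)   # takes up to n characters

exact
#> [1] "foo_" "ap"   "so"   "a"
upto
#> [1] "foo_" "ap"   ""     ""
```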

krads

A friendly hint when cutting off or replacing the last n characters of a string:

--> be aware of whitespace in your strings!

Use base::gsub(' ', '', x, fixed = TRUE) to get rid of unwanted spaces in your strings (note that this removes every space, including spaces inside the text; trimws(x) removes only leading and trailing whitespace). I spent quite some time finding out why the great solutions above did not work for me, and thought it might be useful for others as well ;)
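A small sketch of the problem, with made-up example strings: trailing spaces count as characters, so they are what gets cut. Here trimws() is used instead of removing all spaces, on the assumption that only leading and trailing whitespace is unwanted.

```r
x <- c("apple  ", " beer")   # note the stray spaces

# Trailing spaces are counted by nchar(), so they are cut first
substr(x, 1, nchar(x) - 3)
#> [1] "appl" " b"

# Strip leading/trailing whitespace before cutting
clean <- trimws(x)
substr(clean, 1, nchar(clean) - 3)
#> [1] "ap" "b"
```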

ExploreR