How to remove the first three characters from every row in a column in R

Question

I have a large data set (20K rows) with a column of text. I would like to remove the first x characters (e.g. 3) from the beginning of each row in that specific column. I'd appreciate your assistance.

Thank you very much everyone for your time and assistance. Very helpful! — Shawn, Feb 23 '19 at 21:49

score 15 · Accepted Answer · answered Feb 22 '19 at 23:43

You can do it with the gsub function and a simple regex. Here is the code:

# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)

# Replace the first 3 characters with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
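
Since the question asks about an arbitrary count x, the same pattern can be built from a variable. This is a minimal sketch, assuming a hypothetical helper named drop_first_n (not part of base R). It uses sub rather than gsub, which is enough here because the ^ anchor allows at most one match per string:

```r
# Hypothetical helper (name is an assumption): drop the first n characters.
drop_first_n <- function(x, n) {
  # Build the pattern "^.{0,n}" for the requested count
  pattern <- sprintf("^.{0,%d}", n)
  # Strings shorter than n simply become empty
  sub(pattern, "", x)
}

df <- data.frame(text_col = c("abcd", "abcde", "ab"), stringsAsFactors = FALSE)
df$text_col <- drop_first_n(df$text_col, 3)
df$text_col
#> [1] "d"  "de" ""
```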

Istrel

Great answer and it worked like a charm. Thank you. – Shawn Mar 06 '19 at 19:38
Such a nice answer. Would you know how to adapt it in case we wanted to delete the last three characters instead? – Angelo Apr 20 '23 at 20:56
Use ".{0,3}$" instead of "^.{0,3}". – Istrel Apr 21 '23 at 21:13
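
The trailing variant from the comment above can be sketched the same way. The name drop_last_n is hypothetical, and the substr line is an alternative that avoids regex altogether:

```r
# Hypothetical helper (name is an assumption): drop the last n characters.
drop_last_n <- function(x, n) {
  # The $ anchor ties the match to the end of each string
  sub(sprintf(".{0,%d}$", n), "", x)
}

x <- c("abcd", "abcde", "ab")
trimmed_regex <- drop_last_n(x, 3)

# Same result without regex: keep characters 1 .. nchar(x) - 3
trimmed_substr <- substr(x, 1, nchar(x) - 3)

trimmed_regex
#> [1] "a"  "ab" ""
```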

score 5 · Answer 2 · answered Feb 22 '19 at 23:52

With the tidyverse we can use str_sub (and some sample fruit text strings) to do this, by directly specifying start and end points:

library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#>    some_fruit  
#>    <chr>       
#>  1 apple       
#>  2 apricot     
#>  3 avocado     
#>  4 banana      
#>  5 bell pepper 
#>  6 bilberry    
#>  7 blackberry  
#>  8 blackcurrant
#>  9 blood orange
#> 10 blueberry   
#> # … with 70 more rows
tbl %>%
  # Refer to the tibble column some_fruit rather than the global stringr vector fruit
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#>    some_fruit   chopped_fruit
#>    <chr>        <chr>        
#>  1 apple        le           
#>  2 apricot      icot         
#>  3 avocado      cado         
#>  4 banana       ana          
#>  5 bell pepper  l pepper     
#>  6 bilberry     berry        
#>  7 blackberry   ckberry      
#>  8 blackcurrant ckcurrant    
#>  9 blood orange od orange    
#> 10 blueberry    eberry       
#> # … with 70 more rows

Created on 2019-02-22 by the reprex package (v0.2.1)
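
A smaller sketch of the same idea, loading only dplyr and stringr and using a three-row tibble made up for illustration. Because the end argument of str_sub defaults to -1 (the last character), it can be omitted, and a negative end drops characters from the right instead:

```r
library(dplyr)
library(stringr)

# Illustrative data, not the full stringr::fruit vector
tbl <- tibble(some_fruit = c("apple", "apricot", "avocado"))

tbl <- tbl %>%
  mutate(
    chopped_fruit = str_sub(some_fruit, 4),     # drop the first 3 characters
    trimmed_fruit = str_sub(some_fruit, 1, -4)  # drop the last 3 characters
  )

tbl$chopped_fruit
#> [1] "le"   "icot" "cado"
```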

Thank you very much for your help. – Shawn Mar 06 '19 at 19:38

score 3 · Answer 3 · answered Feb 23 '19 at 00:57

As usual, there are so many ways to do things in R!

You can also try ?substring:

> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+                          column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+                          stringsAsFactors=FALSE)
> head(lotsofdata)
    column.1   column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4

> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"

Or for column 1, using [,1]:

> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"

Then just replace it:

> x <- substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
> lotsofdata$column.1 <- x

> head(lotsofdata)
  column.1   column2
1  aPoint1 MoreData1
2  aPoint2 MoreData2
3  aPoint3 MoreData3
4  aPoint4 MoreData4
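
A shorter form of the same approach, as a sketch on made-up data: the last argument of substring defaults to a very large value (1000000L), so the nchar call can be left out when everything from position 4 onward is wanted. Missing values stay NA:

```r
# Illustrative data frame with one missing value
lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2", NA),
                         stringsAsFactors = FALSE)

# Keep everything from the 4th character to the end of each string
lotsofdata$column.1 <- substring(lotsofdata$column.1, 4)

lotsofdata$column.1
#> [1] "aPoint1" "aPoint2" NA
```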

Thank you very much for your help. – Shawn Mar 06 '19 at 19:39
