I have a large data set (20K rows) with a column of text. I would like to remove the first x characters (e.g. 3) from the beginning of each row in that specific column. I appreciate your assistance.
3 Answers
15
You can do it with the gsub function and a simple regex. Here is the code:
# Fake data frame
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"))
df$text_col <- as.character(df$text_col)
# Replace the first 3 characters (or fewer, if the string is shorter) with the empty string ""
df$text_col <- gsub("^.{0,3}", "", df$text_col)
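
Because the pattern is anchored with ^, it can match only once per string, so sub should behave the same as gsub here. As a minimal sketch, the number of characters can be made a parameter; the helper name drop_first and its argument n are hypothetical and not part of the answer above:

```r
# Hypothetical helper (not from the original answer): drop the first n
# characters of every element of a character vector.
drop_first <- function(x, n) {
  # Builds a pattern such as "^.{0,3}": up to n characters at the start
  pattern <- paste0("^.{0,", n, "}")
  sub(pattern, "", x)
}

df <- data.frame(text_col = c("abcd", "abcde", "abcdef", "ab"),
                 stringsAsFactors = FALSE)
df$text_col <- drop_first(df$text_col, 3)
df$text_col
```

Strings shorter than n simply become empty strings, since the {0,n} quantifier accepts fewer characters.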

Istrel
Such a nice answer. Would you know how to adapt it in case we wanted to delete the last three characters instead? – Angelo Apr 20 '23 at 20:56
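
One possible adaptation for the last three characters is to anchor the same pattern at the end of the string with $ instead of ^. A minimal sketch, assuming the same fake data frame as in the answer:

```r
df <- data.frame(text_col = c("abcd", "abcde", "abcdef"),
                 stringsAsFactors = FALSE)
# ".{0,3}$" matches up to 3 characters at the end of each string
df$text_col <- sub(".{0,3}$", "", df$text_col)
df$text_col
```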
5
With the tidyverse we can use str_sub (and some sample fruit text strings) to do this, by directly specifying start and end points:
library(tidyverse)
tbl <- tibble(some_fruit = fruit)
tbl
#> # A tibble: 80 x 1
#>    some_fruit
#>    <chr>
#>  1 apple
#>  2 apricot
#>  3 avocado
#>  4 banana
#>  5 bell pepper
#>  6 bilberry
#>  7 blackberry
#>  8 blackcurrant
#>  9 blood orange
#> 10 blueberry
#> # … with 70 more rows
tbl %>%
  # refer to the tibble column some_fruit rather than the global fruit vector
  mutate(chopped_fruit = str_sub(some_fruit, 4, -1))
#> # A tibble: 80 x 2
#>    some_fruit   chopped_fruit
#>    <chr>        <chr>
#>  1 apple        le
#>  2 apricot      icot
#>  3 avocado      cado
#>  4 banana       ana
#>  5 bell pepper  l pepper
#>  6 bilberry     berry
#>  7 blackberry   ckberry
#>  8 blackcurrant ckcurrant
#>  9 blood orange od orange
#> 10 blueberry    eberry
#> # … with 70 more rows
Created on 2019-02-22 by the reprex package (v0.2.1)
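
As a smaller sketch of the same idea on a plain character vector: str_sub defaults its end argument to -1 (the last character), so only the start position is needed. The variable n is illustrative and not part of the answer above, and the stringr package is assumed to be installed:

```r
library(stringr)

x <- c("apple", "apricot", "avocado")
n <- 3  # number of leading characters to drop (illustrative)

# Keep everything from character n + 1 to the end of each string
chopped <- str_sub(x, n + 1)
chopped
```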

Calum You
3
As usual, there are so many ways to do things in R!
You can also try substring (see ?substring):
> lotsofdata <- data.frame(column.1=c("DataPoint1", "DataPoint2", "DataPoint3", "DataPoint4"),
+                          column2=c("MoreData1","MoreData2","MoreData3", "MoreData4"),
+                          stringsAsFactors=FALSE)
> head(lotsofdata)
    column.1   column2
1 DataPoint1 MoreData1
2 DataPoint2 MoreData2
3 DataPoint3 MoreData3
4 DataPoint4 MoreData4
> substring(lotsofdata[,2],4,nchar(lotsofdata[,2]))
[1] "eData1" "eData2" "eData3" "eData4"
Or for column 1, using [,1]:
> substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
[1] "aPoint1" "aPoint2" "aPoint3" "aPoint4"
Then just replace it:
> x<-substring(lotsofdata[,1],4,nchar(lotsofdata[,1]))
> lotsofdata$column.1<-x
> head(lotsofdata)
  column.1   column2
1  aPoint1 MoreData1
2  aPoint2 MoreData2
3  aPoint3 MoreData3
4  aPoint4 MoreData4
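
A slightly shorter variant, as a sketch: substring has a default for its last argument (1000000L according to ?substring), so the nchar call can be dropped for ordinary strings. The variable n is illustrative and not part of the answer above:

```r
lotsofdata <- data.frame(column.1 = c("DataPoint1", "DataPoint2"),
                         column2 = c("MoreData1", "MoreData2"),
                         stringsAsFactors = FALSE)

n <- 3  # number of leading characters to drop (illustrative)

# Keep everything from character n + 1 onwards and overwrite the column
lotsofdata$column.1 <- substring(lotsofdata$column.1, n + 1)
lotsofdata$column.1
```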

OctoCatKnows