Questions tagged [stringr]

The stringr package is a wrapper for the R stringi package that provides consistent function names and error handling for string manipulation. It is part of the Tidyverse collection of packages. Use this tag for questions that involve manipulating strings specifically with the stringr package. For general R string manipulation questions, use the R tag together with the generic string tag.

R's stringr package provides a more consistent user interface to base R's string manipulation and regular expression functions.

2501 questions
1 vote, 1 answer

r split a column in a data frame based on square brackets

I have a data frame: x <- data.frame(a = letters[1:7], b = letters[2:8], c = c("bla bla [ text1 ]", "bla bla [text2]", "how how [text3 ]", "wow wow [ text4a ] [ text4b ]", "ba ba [ text5a ][ text5b]", "my text A", "my text B"),…
user3245256 • 1,842 • 4 • 24 • 51
1 vote, 1 answer

R: simple keyword detection

I want to check if any of a set of "keywords" appear in a string. So, for "text" below, the result should be TRUE (or 1), and for text_2 it should be FALSE (or 0). keywords <- c("one", "two", "three", "four") #set of keywords text <- "Blah blah one…
wimlouw • 13 • 3
1 vote, 1 answer

Extract youtube video ID from url with R stringr regex

I'm looking to extract only the video id string from a column of youtube links. The stringr function I'm currently using is this: str_extract(data$link, "\\b[^=]+$") This works for most standard youtube links with the id at the end of the url…
Paul Campbell • 846 • 7 • 9
1 vote, 3 answers

How to check if a string is made up entirely of certain string patterns

I have a vector of strings which I need to check to see if they fit certain criteria. For example, if a certain string, say "34|40|65", is made up entirely of these patterns: c("34", "35", "37", "48", "65"), then I want to return 1; if the string…
cgibbs_10 • 176 • 1 • 12
1 vote, 2 answers

Names string preparation for sex impute

I'm new to R and need to prepare a column of names and then impute sex, but I'm having some problems preparing the strings. This is an example of what I have: Name example: "alberto eduardo etchegaray de la cerda…
1 vote, 1 answer

Why am I unable to install the R package stringi?

I have a problem installing the stringi package during R library installation. While the package installs, I get an error when it connects to the URL to receive "icudt551.zip". However, the current situation is that if you have the file "icudt551.zip"…
1 vote, 2 answers

Finding Abbreviations in Data with R

In my data (which is text), there are abbreviations. Are there any functions or code that search for abbreviations in text? For example, detecting 3-, 4-, and 5-letter capitalized abbreviations and letting me count how often they occur. Much appreciated!
Alex • 77 • 1 • 10
1 vote, 1 answer

stringr str_locate_all not returning the proper index in a dplyr string

I'm trying to use str_locate_all to find the index of the third occurrence of '/' in a dplyr chain but it's not returning the correct index. ga.categoryViews.2016 <- ga.data %>% mutate(province = str_sub(pagePath,2,3), index =…
Joseph Noirre • 387 • 4 • 20
1 vote, 1 answer

R: Extracting string if it is an element of a list

I want to dummy-code whether some string is contained in another (which is structured). For example: player <- c("Michael Jordan", "Steve Kerr", "Michael Jordan", "Toni Kukoc") bulls <- c("Jordan, Michael Jeffrey", "Pippen, Scottie; Harper, Ron", …
user6550364
1 vote, 1 answer

TextMining in R - Extracting 2 gram for only few terms and 1 gram for rest

text = c('the nurse was extremely helpful', 'she was truly a gem','helping', 'no issue', 'not bad') I want to extract 1-gram tokens for most words and 2-gram tokens for words such as extremely, no, and not. For example, when I get tokens they should be as…
MysticRenge • 373 • 1 • 4 • 13
1 vote, 1 answer

Alphabet conversion - Cyrillic to Latin

I have a list of names and surnames written in Cyrillic. head(text, n = 20) unique(clients$RODITEL) 1 2 ЃОРЃИ 3 ALEKSANDAR 4 000000000000 5 ТР4АЈЧЕ 6 …
Prometheus • 1,977 • 3 • 30 • 57
1 vote, 3 answers

get last part of a string

I would like to get the last substring of a variable (the last part after the underscore), in this case: "myvar". x = "string__subvar1__subvar2__subvar3__myvar" My attempts result in a match starting from the first substring, e.g.…
Henk • 3,634 • 5 • 28 • 54
1 vote, 2 answers

Extract segment of filename

I'm trying to extract a filename and save the dataframe with that same name. The problem I have is that if the filename for some reason is inside a folder with a similar word, stringr will return that word as well. filename <-…
FilipeTeixeira • 1,100 • 2 • 9 • 29
1 vote, 2 answers

Extracting multiple strings from poorly defined user input data

I am looking to create a lookup table from data where entries in a column (user_entry) are in different formats and may contain more than one instance per row. # create example dataframe. id <- c(1111,1112,1113,1114) user_entry <-…
lapsel • 75 • 5