Questions tagged [stringr]

The stringr package is a wrapper for the R stringi package that provides consistent function names and error handling for string manipulation. It is part of the Tidyverse collection of packages. Use this tag for questions involving the manipulation of strings specifically with the stringr package. For general R string manipulation questions use the R tag together with the generic string tag.

R's stringr package provides a more consistent user interface to base R's string manipulation and regular expression functions.

2501 questions
5 votes, 2 answers

stringr - remove multiple spaces, but keep linebreaks (\n, \r)

I am working on some raw text and want to replace all multiple spaces with one space. Normally I would use stringr's str_squish, but unfortunately it also removes line breaks (\n and \r), which I have to keep. Any ideas? Below are my attempts. Many…
zoowalk • 2,018 • 20 • 33

5 votes, 2 answers

Error: "argument is not an atomic vector; coercing[1] FALSE"

I'm new to R and am having trouble (1) generalizing previous Stack Overflow answers to my situation, and (2) understanding the R documentation. So I turn to this community and hope someone will walk me through. I have this code where data1 is a text…
bob • 117 • 1 • 1 • 8

5 votes, 2 answers

Error in reading Chinese in txt: corpus() only works on character, corpus, Corpus, data.frame, kwic objects

I am trying to produce a word cloud and obtain word frequencies for a Chinese speech using R, jiebaR and corpus, but cannot make a corpus. Here is my code: library(jiebaR) library(stringr) library(corpus) cutter <- worker() v36 <- readLines('v36.txt',…
ronzenith • 341 • 3 • 11

5 votes, 2 answers

Compare two strings and look for differences and display them for easy viewing in R (similar to git diff)?

Suppose I have two rather long (>100k character) strings which are mostly identical but differ in some locations. Git has the concept of a 'diff', which shows only the differences between two (text) files. Is there anything similar in R, where I can…
stevec • 41,291 • 27 • 223 • 311

5 votes, 1 answer

str_detect for multiple patterns

I am using str_detect within the stringr package and I am having trouble searching a string with more than one pattern. Here is the code I am using, however it is not returning anything even though my vector ("Notes-Title") contains these…
SteveM • 213 • 3 • 13

5 votes, 1 answer

Split string to columns based on paragraph ending from OCR'd image

I'm working on a project to convert typewritten War Diary notes into text from PDF scans. I can successfully (maybe 90% with the original, non-resized file) extract the main text, which I crop first. Reprex data: You could try this from the…
Corey Pembleton • 717 • 9 • 23

5 votes, 1 answer

R - why does str_detect return a different result than grepl when using word boundary on 'words' ending with dash

The help page for str_detect states "Equivalent to grepl(pattern, x)", however: str_detect("ALL-", str_c("\\b", "ALL-", "\\b")) [1] FALSE While grepl(str_c("\\b", "ALL-", "\\b"), "ALL-") [1] TRUE I imagine one of these is not working as intended?…
5 votes, 4 answers

Handling empty strings in string detection

I would like to use str_detect without converting "" to another string pattern. Is there an easy way to deal with empty string patterns "", which currently generate a warning? I would like this to produce TRUE, FALSE, FALSE, FALSE, FALSE library(…
MatthewR • 2,660 • 5 • 26 • 37

5 votes, 3 answers

Export csv with ISO-8859-1 encoding instead of UTF-8

I am struggling with encoding in CSV exports. I'm from the Netherlands and we use quite a few tremas (e.g. ë, ï) and accents (e.g. é, ó). This causes trouble when exporting to CSV and opening the file in Excel. I am on macOS Mojave. I've tried multiple encoding…
Tdebeus • 1,519 • 5 • 21 • 43

5 votes, 2 answers

Problem using dplyr on tibbles with vector elements [list columns]

I am running into some problems doing text processing using dplyr and stringr functions (specifically str_split()). I think I am misunderstanding something very fundamental about how to use dplyr correctly when dealing with elements that are…
Angelo • 2,936 • 5 • 29 • 44

5 votes, 3 answers

Extracting numbers from text with stringr and regex in R

I have a problem where I'm trying to extract numbers from a string containing text and numbers and then create two new columns showing the Min and Max of the numbers. For example, I have one column and a string of data like this: Text Section…
Seth Brundle • 160 • 7

5 votes, 5 answers

dplyr mutate a variable by comparing a variable and vectors of different sizes

I have a data frame of the following type: df <- tibble::tribble(~x, c("A", "B"), c("A", "B", "C"), c("A", "B", "C", "D"), c("A", "B")) and vectors like…
Geet • 2,515 • 2 • 19 • 42

5 votes, 4 answers

Regex with Chinese characters

I'm searching text_ which is: 本周(3月25日-3月31日),国内油厂开机率继续下降,全国各地油厂大豆压榨总量1456000吨(出粕1157520吨,出油262080吨),较上周的...[continued] crush <- str_extract(string = text_, pattern = perl("(?<=量).*(?=吨(出粕)")) meal <- str_extract(string = text_, pattern =…
Rafael • 3,096 • 1 • 23 • 61

5 votes, 1 answer

Remove everything after last space with stringr

I have data that looks like this: df <- tribble( ~name, ~value, "Jake Lake MLP", 10, "Bay May CE", 5, "Drake Cake Jr. DSF", 9.1, "Sam Ram IR QQQZ", 1 ) I want to trim all the names so that they are: "Jake Lake", "Bay May",…
emehex • 9,874 • 10 • 54 • 100

5 votes, 2 answers

str_extract_all returns a list but I want a vector

Still relatively new to R here. I have a column of tweets, and I'm trying to create a column that contains the retweet handle "RT @blahblah", like this: Tweets Retweetfrom RT @john I had a good day RT @john RT…
gogolaygo • 199 • 1 • 12