Questions tagged [stringr]

The stringr package is a wrapper for the R stringi package that provides consistent function names and error handling for string manipulation. It is part of the Tidyverse collection of packages. Use this tag for questions involving the manipulation of strings specifically with the stringr package. For general R string manipulation questions use the R tag together with the generic string tag.

R's stringr package provides a more consistent user interface to base R's string manipulation and regular expression functions.
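
As a rough illustration of that consistency (a minimal sketch; the sample vector `x` and the variable names are made up for this example), stringr functions share the `str_` prefix and take the string as their first argument, whereas base R's `grepl()` takes the pattern first:

```r
library(stringr)

# Hypothetical sample data, not taken from any question on this page.
x <- c("352", "35", "54", "1")

# stringr: the string vector comes first, the pattern second.
starts_with_3 <- str_detect(x, "^3")

# Base R equivalent: the pattern comes first, the string vector second.
starts_with_3_base <- grepl("^3", x)

# Left-pad every element to width 3 with zeros.
padded <- str_pad(x, width = 3, side = "left", pad = "0")
```

Both detection calls should give the same logical vector; only the argument order differs.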

2501 questions
1
vote
1 answer

How to use str_pad on certain values in a variable using mutate/mutate_if?

I want to pad strings with zeros (on the left) if the number of characters is 2. Let the dataframe be as follows: df<-data.frame(a=c("352","35","54","1"),stringsAsFactors=FALSE) I would like to get df a 1 352 2 035 3 054 4 1 I tried using…
HNSKD
  • 1,614
  • 2
  • 14
  • 25
1
vote
1 answer

Imported CSV into R, Missing Values Become "" instead of NA

I tried this command df<-read.csv("filename.csv",stringsAsFactors=FALSE) For both num and int variables, missing values are read as NA. However, for chr, missing values are read as "" instead. When I run the command is.na(""), it returns FALSE.…
HNSKD
  • 1,614
  • 2
  • 14
  • 25
1
vote
1 answer

Extract 2 terms before specific character

I want to extract the two words preceding a Twitter @handle x <- c("this is a @handle", "My name is @handle", "this string has @more than one @handle") Doing the following extracts all the text preceding the last @handle only, I need it for all…
JohnCoene
  • 2,107
  • 1
  • 14
  • 31
1
vote
1 answer

How to extract the last 4 digits of a string of characters in R

I would like to extract the LAST 4 digits in a given string, but can't figure it out. The LAST 4 digits could be "XXXX" or "XXXX-". Ultimately, I have a list of heterogeneous entries that include single years (i.e., 2001- or 2001), lists of years…
1
vote
2 answers

Find first match of a substring in a column of big data.table

I have a big data table, where I want to check if a 103a_foo is present. However, the filenames in the big table are written differently, so I have to use regex. dt = structure(list(myID = c("86577", "34005","34005", "194000", "30252", "71067"),…
JelenaČuklina
  • 3,574
  • 2
  • 22
  • 35
1
vote
2 answers

Match Observation from One Table to Another Table Variable Consisting of Strings

I have two datasets called A and B. library(data.table) Farm.Type <- c("Fruits","Vegetables","Livestock") Produce.All <- c("Apple, Orange, Pears, Strawberries","Broccoli, Cabbage, Spinach","Cow, Pig, Chicken") Store <-…
Leo
  • 86
  • 1
  • 6
1
vote
1 answer

Search for the first matching text for dictionary terms in R

I have a dictionary with terms terms <- c("hello world", "great job") terms <- as.data.frame(terms), and I would like to search for the first match in an additional data.frame which contains documents doc <- c("i would like to say hello worlds", "hey…
Dmitry Leykin
  • 485
  • 1
  • 7
  • 14
1
vote
0 answers

R: Converting PDF to CSV using pdftools, stringr, and regex

I'm converting a massive collection of pdfs into a single massive csv. A typical pdf looks like this: When I use pdftools to convert the page into a single text string I get this: When I use the cat() function on the page's string I get this: My…
beemyfriend
  • 85
  • 1
  • 11
1
vote
3 answers

consecutive matches in regex (R)

I'm trying to write a regex expression (under R) that matches all the words containing 3 letters in this text: tex= "As you are now so once were we" My first attempt is to select words containing 3 letters surrounded by…
Blofeld
  • 53
  • 5
1
vote
2 answers

mixed dataframe of list of character vectors into uniform dataframe

I am trying to break up strings as columns using the stringr package. > df <- dput(head(facs,3)) structure(list(geo_accession = structure(1:3, .Names = c("V2", "V3", "V4"), .Label = c("GSM1494875", "GSM1494877", "GSM1494879", "GSM1494881",…
seraphim711
  • 137
  • 2
  • 11
1
vote
2 answers

R sets of coordinates extract from string

I'm trying to extract sets of coordinates from strings and change the format. I have tried some of the stringr package and am getting nowhere with the pattern extraction. It's my first time dealing with regex and it is still a little confusing to create…
aoceano
  • 85
  • 1
  • 13
1
vote
1 answer

Explain the behavior of `str_match_all` in R package `stringr`

st = list("amber johnson", "anhar link ari") t = stringr::str_match_all(st, "(\\ba[a-z]+\\b)") str(t) # List of 2 # $ : chr [1, 1:2] "amber" "amber" # $ : chr [1:2, 1:2] "anhar" "ari" "anhar" "ari" Why are the results repeated like so?
tnabdb
  • 517
  • 2
  • 8
  • 22
1
vote
1 answer

R: Is it possible to split according to various characters with str_split_fixed?

I have a string that I want to divide into various parts. test = c("3 CH • P" ,"9 CH • P" , "2 CH • P" , "2 CH, 5 ECH • V", "3 ECH • V", "4 ECH • P" ) I know that using str_split_fixed() from stringr I can split the string…
Edu
  • 903
  • 6
  • 17
1
vote
1 answer

Finding Non-Matching Names in Two Different Dataframe Columns Before Joining or Merging

I'm wondering if there's an easy way to compare columns before doing a join in dplyr. Below are two simple dataframes. I want to join based on first and last names, however there are some spelling mistakes or different formats, such as "Elizabeth…
Mike
  • 2,017
  • 6
  • 26
  • 53
1
vote
1 answer

Regex works, but not on strings in my vector

So I am attempting to use grep to find a pattern and replace values within my single-column data frame. I basically want a grep that says "delete everything after the comma until the end of the string". I wrote the expression, and it works on my dummy…
ALW94
  • 23
  • 2