Questions tagged [stringr]

The stringr package is a wrapper for the R stringi package that provides consistent function names and error handling for string manipulation. It is part of the Tidyverse collection of packages. Use this tag for questions involving the manipulation of strings specifically with the stringr package. For general R string manipulation questions use the R tag together with the generic string tag.

The stringr package provides a more consistent user interface to base R's string manipulation and regular expression functions.
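As a rough illustration of what "consistent" means here, the sketch below puts a few stringr calls next to their base R counterparts. The sample vector `x` is made up for this example; the point is only that stringr functions share the `str_` prefix and take the string as the first argument and the pattern as the second.

```r
library(stringr)

# Hypothetical sample data, not taken from any question on this page.
x <- c("apple pie", "banana split", "cherry tart")

# stringr: string first, pattern second, in every function.
has_an     <- str_detect(x, "an")             # which elements contain "an"
no_space   <- str_replace_all(x, "\\s", "_")  # replace each whitespace character
first_word <- str_extract(x, "^\\w+")         # leading run of word characters

# Base R equivalents: the pattern comes first and the names vary.
has_an_base   <- grepl("an", x)
no_space_base <- gsub("\\s", "_", x)
```

Both versions give the same results on this input; the stringr form is mainly easier to chain in a pipe because the data argument comes first.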

2501 questions
1
vote
1 answer

Create Dataframe from pdf to csv based on string

I would like to split the information in a PDF document based on the presence of a colon. A sample is here. An updated PDF with four pages can be downloaded from this link. I am attempting the following. After reading the PDF, I am trying to split it by colon.…
S Das
  • 3,291
  • 6
  • 26
  • 41
1
vote
2 answers

Extract variable names using stringr in R

I am trying to extract some variable names and numbers from the following vector and store them into two new variables: unique_strings <- c("PM_1_PMS5003_S_Avg", "PM_2_5_PMS5003_S_Avg", "PM_10_PMS5003_S_Avg", "PM_1_PMS5003_A_Avg",…
philiporlando
  • 941
  • 4
  • 19
  • 31
1
vote
0 answers

Dplyr function that takes columns as parameters and performs several piped steps

I have repetitive code in dplyr that cleans data. df1_final$sumaryczna_kwota_zobowiązań <- df1_final$sumaryczna_kwota_zobowiązań %>% str_replace(",", ".") %>% str_replace_all("\\s", "")%>% as.numeric()…
Jacek Kotowski
  • 620
  • 16
  • 49
1
vote
1 answer

How do I iterate over rows in a dataframe to detect different words and save it in a new column?

I have been trying to write a function or use the apply family to select the rows in a data frame that contain the words I'm looking for and mark them like a tag. A row can have several tags. Can someone please help me, I have been stuck for a…
1
vote
1 answer

Extract certain characters from list and convert them into a character vector

I have a column in my data frame that is a list of characters. This is the column categories str(df) Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4 obs. of 3 variables: $ categories:List of 4 ..$ : chr "Tex-Mex" "Mexican" "Fast Food"…
Banjo
  • 1,191
  • 1
  • 11
  • 28
1
vote
1 answer

R regex replace & vector

How does one do str_replace with a "starting with" ^ and a vector? I am trying to remove the prefixes (Mr., Ms., Dr., Capt., etc.) from a list of names, only from the beginning. I have tried: str_replace(name, prefix, ''). This replaces only a few…
Highland
  • 148
  • 1
  • 7
1
vote
1 answer

Parsing out $ factor to numeric in R

I have a dataframe with a variable that is a factor containing $ signs. So the column is something like Revenue: $450, $550, $650 ..etc. I'd like to strip the $ and transform factor to numeric. I tried parsing using methods found on stackoverflow…
D500
  • 442
  • 5
  • 17
1
vote
3 answers

Equivalent function to stringr::word in stringi package

I went through the stringi package manual to find an equivalent to the function word() in the package stringr, but I could not find it. The reason I am looking for it is because I want to set collation options for my locale and stringr doesn't give…
José
  • 921
  • 14
  • 21
1
vote
2 answers

Remove semicolons sequences of differing length with Regex

Given some data: test <- data.frame(strings = c('a;b;c;;;;;;;', 'd;e;f;g;h;i;j;k;l;m', 'n;o;p;q;r;;;;;', ';;;;;;;;;' )) How do I remove all trailing semicolons to get: test <- data.frame(strings = c('a;b;c', 'd;e;f;g;h;i;j;k;l;m', 'n;o;p;q;r', ''…
Rich Pauloo
  • 7,734
  • 4
  • 37
  • 69
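For the trailing-semicolon question above, one possible approach (a sketch, not necessarily what the posted answers use) is to anchor a run of semicolons to the end of the string with `;+$` and replace it with an empty string. `stringsAsFactors = FALSE` is added here as an assumption so the column is character rather than factor.

```r
library(stringr)

# Data as given in the question excerpt.
test <- data.frame(
  strings = c("a;b;c;;;;;;;", "d;e;f;g;h;i;j;k;l;m", "n;o;p;q;r;;;;;", ";;;;;;;;;"),
  stringsAsFactors = FALSE
)

# ";+$" matches one or more semicolons immediately before the end of the string,
# so semicolons between values are left untouched.
test$strings <- str_replace(test$strings, ";+$", "")
```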
1
vote
4 answers

How to replace matches in a string and index each match

A particular string can contain multiple instances of a pattern that I'm trying to match. For example, if my pattern is and my string is "My name is and his name is ", then there are two matches. I want to replace…
bschneidr
  • 6,014
  • 1
  • 37
  • 52
1
vote
1 answer

Spread Strings across Columns

I have the following sample data: df val_str fruit=apple,machine=crane machine=crane machine=roboter fruit=apple machine=roboter,food=samosa df2 fruit machine food apple crane NA NA crane NA NA roboter NA apple NA …
Joshua Zecha
  • 141
  • 1
  • 8
1
vote
2 answers

How to extract multiple substrings in a string using stringr regex

I have this string: mystring <- "HMSC-bm_in_ALL_CELLTYPES.distal" What I want to do is to extract the substring as defined in this bracketing [HMSC-bm]_in_ALL_CELLTYPES.[distal] So in the end it will yield a vector with two values: HMSC-bm and…
littleworth
  • 4,781
  • 6
  • 42
  • 76
1
vote
2 answers

how to extract the first paragraphs from text in dataframe?

Consider this dataframe library(dplyr) library(stringr) mydf <- data_frame(text = c('Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. \nUt enim ad minim veniam, quis…
ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235
1
vote
2 answers

Finding the capital letters in a string

I want to find the capital letters in each string and count how many there are in each string, for example t = c("gctaggggggatggttactactGtgctatggactac", "gGaagggacggttactaCgTtatggactacT", "gcGaggggattggcttacG") …
dondapati
  • 829
  • 6
  • 18
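For the capital-letter question above, a minimal sketch using `str_count()` and `str_extract_all()` with the character class `[A-Z]` might look like this. The vector is renamed from `t` to `seqs` here (an assumption made for this example) to avoid masking base R's `t()` function.

```r
library(stringr)

# Strings as given in the question excerpt, under a different name.
seqs <- c("gctaggggggatggttactactGtgctatggactac",
          "gGaagggacggttactaCgTtatggactacT",
          "gcGaggggattggcttacG")

# Number of uppercase ASCII letters in each string.
n_upper <- str_count(seqs, "[A-Z]")

# The uppercase letters themselves, as a list with one character vector per string.
upper_letters <- str_extract_all(seqs, "[A-Z]")
```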
1
vote
1 answer

R regex: split a string by combination of \\n [A-z] & [:punct:]

I have a dataframe with character strings that look like this: bla bla.\n14:39:51 info: pyku bla .\n14:39:51 info: \n14:39:51 info: \n14:39:57 Sam: pyk pyk\n14:43:15 on and on \n14:43:59 you get an idea I want to split lines separated…
Kasia Kulma
  • 1,683
  • 1
  • 14
  • 39