Questions tagged [stringi]

stringi is THE R package for fast, correct, consistent and convenient string/text processing in each locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers.

R's stringi package provides a platform-independent way of manipulating strings. It is built on the ICU library and has a syntax inspired by the stringr package.

298 questions
2 votes, 2 answers

Replace or remove multiple backslashes with 1 printed pair

How can I replace multiple backslashes with a single one? I know that in a string a single backslash is represented with \\, as demonstrated here: nchar('\\') [1] 1. So I want to replace all the backslashes in this string: 'thre\\\\fd' with…
Tyler Rinker
2 votes, 1 answer

Regex to extract double quotes and string in quotes R

I have a data frame with a column of "text." Each row of this column is filled with text from media articles. I am trying to extract a string that occurs like this: "term" (including the double quotes around the term). I tried the following regular…
mundos
2 votes, 2 answers

Insert vertical bar between each character of a string in R

How would I be able to insert a vertical bar in between every character of a string in R? For example, say I have a string "ABC123". How could I obtain the output to be "A|B|C|1|2|3"? If anyone could vectorize this idea for a vector of character…
cgibbs_10
2 votes, 2 answers

icudt error while installing stringi library in R

I'm writing this because it took me several days to come to this result. Bottom line: The stringi library version 1.1.3 (released March 2017) might have issues involving icudt. You can install stringi 1.1.2 using the following commands: packageurl…
David Webb
2 votes, 2 answers

stringr::str_sub output is unexpected

Consider the following data.frame: df <- structure(list(sufix = c("atizado", "atoria", "atório", "auta", "áutico", "ável"), min_stem_len = c(4, 5, 3, 5, 4, 2), replacement = c("", …
Daniel Falbel
2 votes, 1 answer

regex in R "eats" part of the string

I want to split a character string into two groups. The string's structure is pretty simple, yet I haven't been able to make it work. txt <- "text12-01-2016" It's always some letters followed by a date, and the date obviously starts with a…
PavoDive
2 votes, 1 answer

Automatic translation of utf-8 into ascii using stringi and stringr in R - Error with escape character \u

I am struggling to translate UTF-8 into ASCII letters automatically. In a data frame I have the following sequence, which originates from Greek letters: G By manually converting the sequence…
2 votes, 1 answer

how to use R package stringr or stringi to concatenate strings with NAs in data table

I have a data table that has many columns of street address fields, like NUM, STREET_PRE, STREETNAME, STREETTYPE, APT_NO, CITY, STATE, ZIP. Many rows don't have values in all columns, like STREET_PRE or APT_NO. I need to get an address string from…
dracodoc
2 votes, 1 answer

sort the strings based on last word in r

sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 7 x64 (build 7601) Service Pack 1 locale: [1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 [3]…
Chanti
2 votes, 1 answer

object 'C_stri_join' not found - Using knitr in Rstudio

When using the knit button in RStudio I get the error: object 'C_stri_join' not found. Here is an example: --- title: "Sample Document" output: html_document: toc: true theme:…
Tom August
2 votes, 1 answer

Split a string based on "^" in R

I need to split a string and obtain all the characters before ^. Example: I have a column in a data frame that reads 2567543^ABC 7545435^J 8934939^XY, and the result column in the same data frame should read: 2567543 7545435 8934939. I tried using…