Questions tagged [stringi]

stringi is THE R package for fast, correct, consistent and convenient string/text processing in each locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers.

R's stringi package provides a platform-independent way of manipulating strings. It is built on the ICU library and has a syntax inspired by the stringr package.

298 questions
0 votes, 1 answer

Automatically extracting Sections (and section Titles) from a file

I need to extract all subsections (for further text analysis) and their titles from an .Rmd file (e.g. from 01-tidy-text.Rmd of the tidy-text-mining book: https://raw.githubusercontent.com/dgrtwo/tidy-text-mining/master/01-tidy-text.Rmd). All I know…
IVIM
  • 2,167
  • 1
  • 15
  • 41
0 votes, 1 answer

replace character according to its byte r

I have a vector (x) of countries, one of which is Cote d'Ivoire: x <- c("c\u00f4te", "côte"). When I investigated x, I realized that the two values are not the same: showNonASCII(x) 1: cte 2: cte iconv(x, to="ASCII//TRANSLIT") [1] "cA?te" "cote"…
Mohamed
  • 95
  • 7
0 votes, 1 answer

Transliterating a vector of strings with mixed encodings to latin1

I have a vector of country names, x: x <- c("c\u00f4te", "côte") showNonASCII(x) 1: cte 2: cte iconv(x, to="ASCII//TRANSLIT") [1] "cA?te" "cote" Encoding(x) [1] "UTF-8" "latin1" I would like to unify them, so how can I…
Mohamed
  • 95
  • 7
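For the mixed-encoding vector in the question above, one possible approach is to let stringi normalise everything to UTF-8 first. This is only a sketch under assumptions: the latin1 element is rebuilt here with iconv() so the example does not depend on the source file's encoding, the names utf8_str, latin1_str, unified and ascii are introduced for illustration, and the target is UTF-8 (plus an optional ASCII transliteration) rather than the latin1 named in the title.

```r
library(stringi)

utf8_str   <- "c\u00f4te"                                       # marked as UTF-8
latin1_str <- iconv(utf8_str, from = "UTF-8", to = "latin1")    # marked as latin1
x <- c(utf8_str, latin1_str)

# Convert every element to UTF-8, honouring each string's declared encoding.
unified <- stri_enc_toutf8(x)

# Optionally strip the accents with ICU's Latin-ASCII transliterator.
ascii <- stri_trans_general(x, "Latin-ASCII")
```

Once both elements share an encoding, comparisons and joins on the country names should behave consistently.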
0 votes, 1 answer

Extract information from string with dots as separator in R

I apologize for possible similar questions, but I just can't find the solution for my problem. So, I have a string with three parts, separated by “.”, for example: a <- "XXX.YY.ZZZ" (the length of strings differ, it could also be a <- "XXXX.Y.ZZ",…
JerryTheForester
  • 456
  • 1
  • 9
  • 26
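One way to approach the dot-separated strings in the question above is a fixed-pattern split. This is a minimal sketch, assuming the three parts never contain a literal dot themselves; the variable name parts is introduced here for illustration.

```r
library(stringi)

a <- c("XXX.YY.ZZZ", "XXXX.Y.ZZ")

# Split on a literal "." (fixed matching, so the dot is not a regex wildcard).
# simplify = TRUE returns a character matrix with one row per input string.
parts <- stri_split_fixed(a, ".", simplify = TRUE)

# Middle part of each string.
parts[, 2]
```

Fixed matching sidesteps the need to escape the dot, which a regex-based split would require.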
0 votes, 2 answers

Extract only the characters that are between opening and ending parentheses in the start and end of a string in R

I have many strings that all have the following format: mystrings <- c( "(ABFUHIASH)THISISAVERYLONGSTRINGWITHOUTANYSPACES(ENDING)", "(SECONDSTR)YETANOTHERBORINGSTRINGWITHOUTSPACES(RANDOMENDING)", …
motapekog
  • 13
  • 2
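The excerpt above is truncated, so the exact desired output is unknown. As a sketch, assuming the goal is the text inside the leading and the trailing parentheses, capture groups can pull both out in one pass. Only the two strings visible in the excerpt are used, and the names m, leading and trailing are illustrative.

```r
library(stringi)

mystrings <- c(
  "(ABFUHIASH)THISISAVERYLONGSTRINGWITHOUTANYSPACES(ENDING)",
  "(SECONDSTR)YETANOTHERBORINGSTRINGWITHOUTSPACES(RANDOMENDING)"
)

# Anchor one group at the start and one at the end of the string.
# Column 1 of the result is the full match; columns 2 and 3 are the groups.
m <- stri_match_first_regex(mystrings, "^\\(([^)]*)\\).*\\(([^)]*)\\)$")

leading  <- m[, 2]
trailing <- m[, 3]
```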
0 votes, 1 answer

Extract character before and after "/"

I'm trying to extract the characters before and after "/", with no success. The sentences look like: XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000. The output should be: SAO JOSE DOS CAMPOS / SP. I'm trying…
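A possible sketch for the address line in this question, assuming the fields are always separated by " - " (space, hyphen, space) and that exactly one field contains the slash. The whitespace in the excerpt may have been altered when the page was flattened, and the names s, fields and city_state are illustrative.

```r
library(stringi)

s <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# Split into fields on the literal separator " - ".
fields <- stri_split_fixed(s, " - ")[[1]]

# Keep the field that contains the slash.
city_state <- fields[stri_detect_fixed(fields, "/")]
```

Because the hyphen in the postal code has no surrounding spaces, it is not treated as a separator.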
0 votes, 3 answers

How to remove a character in the dataframe using the stringi package?

I currently have a dataframe of stock KPIs and I would like to remove the "$" character from the data. However, I can only use one line of code in addition to the mandatory usage of the stringi package. Looking at the documentation, the…
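The asker's data frame is not shown, so the sketch below uses a hypothetical stand-in named kpis. It illustrates one way to satisfy the one-line constraint: apply stri_replace_all_fixed to every column.

```r
library(stringi)

# Hypothetical stand-in for the asker's data frame of stock KPIs.
kpis <- data.frame(
  ticker = c("AAA", "BBB"),
  price  = c("$12.50", "$3.10"),
  stringsAsFactors = FALSE
)

# The single stringi line: strip every literal "$" from every column.
# Note that this turns all columns into character vectors.
kpis[] <- lapply(kpis, stri_replace_all_fixed, "$", "")
```

Assigning into kpis[] keeps the data frame structure; numeric columns would need converting back with as.numeric afterwards.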
0 votes, 3 answers

Multiple numbers from one string

I have the following value (with similar formatting in hundreds of thousands of fields): 61.00.62.1, which I would like to turn into 61.0 0.6 2.1 using stringr or stringi and (likely) a regex. I have been unsuccessfully using the…
BenD
  • 21
  • 2
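A minimal sketch for the value in the question above, assuming every number has exactly one digit after the decimal point (that assumption comes from the example, not from the asker). The names pieces and result are illustrative.

```r
library(stringi)

x <- "61.00.62.1"

# Greedily match "digits, dot, one digit" repeatedly from left to right.
pieces <- stri_extract_all_regex(x, "\\d+\\.\\d")[[1]]

# Join the extracted numbers with single spaces.
result <- stri_join(pieces, collapse = " ")
```

stri_extract_all_regex is vectorised over its input, so the same call works on a whole column and returns one list element per field.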
0 votes, 1 answer

pkgdown builds in Ubuntu but not Windows: argument `str` should be a character vector

I've asked this similar question before. I've done more digging and made this question as minimal and reproducible as possible: First I created a new package as described here and built a site with pkgdown. This builds a site as…
joga
  • 207
  • 2
  • 4
  • 10
0 votes, 0 answers

R string-based matching of business names

TL;DR I'd like to match two unequal columns where the values contain business names, and I've tried stringdist's amatch using Jaro-Winkler matching to get close, but not nearly close enough. I am wondering if stringi would be useful here - I just…
0 votes, 5 answers

R: Regex madness (stringi)

I have a vector of strings that look like this: G30(H).G3(M).G0(L).Replicate(1) Iterating over c("H", "M", "L"), I would like to extract G30 (for "H"), G3 (for "M") and G0 (for "L"). My various attempts have me confused - the regex101.com debugger,…
balin
  • 1,554
  • 1
  • 12
  • 26
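One way to read the question above is "for each key, return the label that sits directly before (key)". The sketch below builds one lookahead pattern per key; it assumes labels contain neither dots nor opening parentheses, and the names keys, patterns and labels are illustrative.

```r
library(stringi)

x <- "G30(H).G3(M).G0(L).Replicate(1)"
keys <- c("H", "M", "L")

# One pattern per key: a run of characters other than "." and "(",
# immediately followed by "(key)". The lookahead keeps "(key)" out of the match.
patterns <- stri_join("[^.(]+(?=\\(", keys, "\\))")

# x has length 1 and is recycled against the three patterns.
labels <- stri_extract_first_regex(x, patterns)
names(labels) <- keys
```

Vectorising over the patterns avoids an explicit loop over c("H", "M", "L").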
0 votes, 0 answers

R: How does the regex "\\b"%s+%c("character","...")%s+%"\\b" work?

I was looking for an option to replace multiple patterns and found some answers in the first of below links. One of the suggested answers uses the stringr package. I was interested to check options with stringi and found one in the documentation…
Manuel Bickel
  • 2,156
  • 2
  • 11
  • 22
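The expression in the title can be unpacked with a small example. In stringi, %s+% is a vectorised string concatenation operator, so wrapping a vector of words in "\\b" yields one word-boundary pattern per word. The words and the sentence below are made up for illustration.

```r
library(stringi)

# Each word gets a word boundary on both sides.
patterns <- "\\b" %s+% c("cat", "sat") %s+% "\\b"

# vectorize_all = FALSE applies every pattern/replacement pair
# to the same string, one after another.
out <- stri_replace_all_regex(
  "the cat sat on the satin mat",
  patterns,
  c("dog", "stood"),
  vectorize_all = FALSE
)
```

The boundaries are what keep "satin" untouched while the standalone word "sat" is replaced.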
0 votes, 0 answers

Filter text column based on keywords vector

Here is the dput() info structure(list(Text = c("bandwidth issues. issues with vpn", "be more customer focussed reduce prices and offer same deas to existing customers that they use to attract new ones", "be more helpful and provide a better…
Shery
  • 1,808
  • 5
  • 27
  • 51
0 votes, 0 answers

string_count and regex in R

I want to use str_count from the stringi package to count special symbols in a string. Something like this: library(stringi) data$var1 <- stri_count(data$var, pattern="[[:punct:]]") I'm getting the following error. Error in stri_count(data$var,…
Prometheus
  • 1,977
  • 3
  • 30
  • 57
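The error message in the excerpt above is cut off, so its cause is a guess. One likely candidate is the argument name: stri_count takes the pattern through regex, fixed, coll or charclass rather than pattern. A sketch with made-up input, where the names n_regex and n_same are illustrative:

```r
library(stringi)

x <- c("a,b.c!", "plain")

# Dedicated regex variant.
n_regex <- stri_count_regex(x, "[[:punct:]]")

# Equivalent call through the generic wrapper, naming the regex argument.
n_same <- stri_count(x, regex = "[[:punct:]]")
```

Which characters count as punctuation is decided by ICU's character classes, so the result may differ from base R's [[:punct:]] for some symbols.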