Questions tagged [stringi]

stringi is an R package for fast, correct, consistent, and convenient string and text processing in any locale and any native character encoding. Because it is built on the ICU library, it gives R users a platform-independent set of functions familiar to Java, Perl, Python, PHP, and Ruby programmers.

R's stringi package provides a platform-independent way of manipulating strings. It is built on the ICU library and has a syntax inspired by the stringr package.
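
As a small illustration of the locale-aware behaviour described above, here is a minimal sketch that upper-cases the same string under two locales. The input string and variable names are made up for this example; `stri_trans_toupper()` and its `locale` argument come from stringi.

```r
library(stringi)

# Illustrative input, not taken from any question on this page
city <- "istanbul"

# Case mapping under English rules
upper_en <- stri_trans_toupper(city, locale = "en_US")

# Case mapping under Turkish rules, where "i" maps to a dotted capital I (U+0130)
upper_tr <- stri_trans_toupper(city, locale = "tr_TR")

print(upper_en)
print(upper_tr)
```

The two results differ only in the first letter, which is the kind of locale-sensitive detail that ICU handles the same way on every platform.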

298 questions
0
votes
1 answer

Replace rules (string pattern matching) in R

I know a similar question might have been asked in this forum, but I feel my requirement is peculiar. I have a data frame with a column with the following values. Below is just a sample; the data contains more than 1000 observations. Reported Terms "2 Left…
0
votes
0 answers

Find and replace text in XML

Trying to edit the value of maxTreeAgeInit="50.0" in an XML file (outline as follows). Middle of my XML file (xml version="1.0" encoding="utf-8") of interest:
pepdave
  • 1
  • 1
0
votes
1 answer

How do I extract the second number pairing from a character string?

If I have a column with character variables that look like "1000_D_22", "1002M_26", and "1014_17_2/3/2019", how do I strip the characters so that I get "22", "26", and "17"?
blaze
  • 57
  • 1
  • 3
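
One possible approach to the question above, sketched under the assumption that "second number pairing" means the second run of digits in each string: extract all digit runs with `stri_extract_all_regex()` and keep the second one. The variable names are illustrative, and this is not necessarily the answer that was accepted.

```r
library(stringi)

# The three sample values quoted in the question
x <- c("1000_D_22", "1002M_26", "1014_17_2/3/2019")

# Every run of digits per string, returned as a list of character vectors
digit_runs <- stri_extract_all_regex(x, "[0-9]+")

# Keep the second run; strings with fewer than two runs yield NA
second_number <- vapply(digit_runs, function(v) v[2], character(1))

print(second_number)
```

If the real data uses a different rule (for example, "the number after the last underscore"), the regular expression would need to change accordingly.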
0
votes
1 answer

replace parts of a string with a vector

I am having problems with replacing parts of a single string with a set of vector replacements, to result in a vector. I have a string tex which is intended to tell a diagram what text to put as the node (and other) labels. So if tex is "!label has…
Steve Powell
  • 1,646
  • 16
  • 26
0
votes
2 answers

Is there an R function for transforming an entire df to lowercase?

I'm setting up a data table and expect to transform all the data to lowercase, as I thought it would look neat. How can I do that?
Mr.KT
  • 23
  • 6
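
A minimal sketch for the question above, assuming a plain `data.frame` whose text columns are of type character (the question does not show its data, so the sample frame and column names here are hypothetical): apply `stri_trans_tolower()` to the character columns only and leave the others untouched.

```r
library(stringi)

# Hypothetical sample data; the original question does not include any
df <- data.frame(
  Name = c("Alice", "BOB"),
  City = c("Paris", "ROME"),
  Age  = c(30, 41),
  stringsAsFactors = FALSE
)

# Find the character columns so numeric columns are left as they are
is_chr <- vapply(df, is.character, logical(1))

# Lower-case each character column in place
df[is_chr] <- lapply(df[is_chr], stri_trans_tolower)

print(df)
```

Factor columns would need to be converted to character first (or have their levels lower-cased), which this sketch does not attempt.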
0
votes
0 answers

tokens_replace() only works with stri_trans_general() and not with Encoding()

While playing around with lemmatizing, stopwords removal, stemming etc. for German text, I had problems using the tokens_replace() function in the quanteda package. I found a solution (see code) which seems to work although I do not understand why.…
LeaK
  • 31
  • 7
0
votes
1 answer

How to split a text into a vector, where each entry corresponds to an index value assigned to each unique word?

Let's say I have a document with some text, like this, from SO: doc <- 'Questions with similar titles have frequently been downvoted and/or closed. Consider using a title that more accurately describes your question.' I can then make a dataframe…
Union find
  • 7,759
  • 13
  • 60
  • 111
0
votes
0 answers

Add conditional whitespace after special character and N additional characters

Cleaning the following web scraped data and getting vectors without proper spacing in consistent places: " SharePriceNAVPremium/Discount" "Current$21.26$20.901.72%" "52 Wk Avg$24.41$23.245.05%" "52 Wk High$28.00$25.0518.09%" "52 Wk…
js80
  • 385
  • 2
  • 11
0
votes
2 answers

Add a list column to a dataframe

I have a dataframe with 100 rows. One column of the dataframe consists of text. I would like to separate the text column into sentences so that the text column becomes a list of sentences. I am splitting with stringi package function…
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
0
votes
2 answers

String replace ignoring characters

I have the following string: string <- c("ABDSFGHIJLKOP") and list of substrings: sub <- c("ABDSF", "SFGH", "GHIJLKOP") I would like to include < and > after each sub match thus getting: I have tried the following code by…
Nivel
  • 629
  • 4
  • 12
0
votes
3 answers

Splitting column with differing syntax in R

I am having some trouble cleaning up my data. It consists of a list of sold houses. It is made up of the sell price, no. of rooms, m2 and the address. As seen below the address is in one string. Head(DF, 3) Address Price…
Thomas
  • 17
  • 6
0
votes
2 answers

Remove everything before a certain occurrence identified by position in string

I have a string looking like a. I would like to delete everything before the 2nd to last occurrence of the pattern === test, === included. a <- "=== test : {abc} === test : {abc} === test : {abc} === test : {aUs*} === dce …
thequietus
  • 129
  • 1
  • 1
  • 6
0
votes
1 answer

Why won't ggplot install properly on my machine after an upgrade?

I've had a problem for a while now in which I can't load the stringi package until I install it clean. This seems to work as long as I'm in a single R session. Then, some time later, maybe when I create a new session or probably after a longer…
Ben Smith
  • 85
  • 2
  • 11
0
votes
3 answers

Use R to read a text file and format extracted data in to a table

I have a text file in the following basic format which repeats a few thousand times: Patient Name- John Smith Number of dx codes: 123 Number of pr codes: 678 Charges: 910 Lorem ipsum dolor sit amet, consectetur adipiscing elit. Duis arcu ipsum,…
user6340762
  • 165
  • 1
  • 3
  • 10
0
votes
1 answer

stri_unescape_unicode() fails on some characters

I have a problem with converting Unicode characters in R. I am following this approach, but stri_unescape_unicode from the stringi library fails to return the correct value in some cases. Let me show an example where the correct value should be word…
pieca
  • 2,463
  • 1
  • 16
  • 34