Questions tagged [stringi]

stringi is THE R package for fast, correct, consistent and convenient string/text processing in each locale and any native character encoding. The use of the ICU library gives R users a platform-independent set of functions known to Java, Perl, Python, PHP, and Ruby programmers.

R's stringi package provides a platform-independent way of manipulating strings. It is built on the ICU library and has a syntax inspired by the stringr package.
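As a rough illustration of the consistent `stri_*` naming described above, here is a minimal sketch using a few common stringi functions. The sample strings and variable names are made up for this example, and it assumes the stringi package is installed.

```r
library(stringi)

# Hypothetical sample data; \u00f6 is "ö", written as an escape so the
# script does not depend on the source file's encoding.
x <- c("Bl\u00f6schl, G", "  Th\u00f6ni, A.  ")

# Strip leading and trailing whitespace
trimmed <- stri_trim_both(x)

# Length in code points, not bytes
lens <- stri_length(trimmed)

# Replace every run of digits using ICU regular expressions
masked <- stri_replace_all_regex("a1b22c333", "\\d+", "#")

# Locale-aware case mapping (German sharp s upper-cases to "SS")
upper <- stri_trans_toupper("stra\u00dfe", locale = "de_DE")

# Locale-aware sorting; the resulting order depends on the ICU collator
sorted <- stri_sort(c("zebra", "\u00c4pfel", "apple"), locale = "de_DE")

print(trimmed)
print(lens)
print(masked)
print(upper)
print(sorted)
```

Most functions follow the pattern `stri_<operation>_<mode>` (for example `stri_replace_all_regex` versus `stri_replace_all_fixed`), which makes it easy to switch between regex and fixed-pattern matching.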

298 questions
0
votes
2 answers

how do I insert a character into a string at a specific location but only counting letters in R

I am trying to create a data frame where I insert a character into a string at a position defined by another column, but I only want the code to count letters, not numbers or other characters, while it does this. Hopefully the example tables make this…
TinoMass
  • 29
  • 1
  • 5
0
votes
2 answers

edit string text in dataframe variable

I want to tidy up a dataframe and automate the process. Given the following data.frame: library(survival) library(rms) library(broom) library(tidyverse) res.cox <- coxph(Surv(time, status) ~ rcs(age, 3) + sex + ph.ecog + …
user63230
  • 4,095
  • 21
  • 43
0
votes
1 answer

mutate string with condition

Given the following example status <- c("Open", "In Progress", "DevTest", "Stage Test: mw", "Stage Test: customer", "DevDone", "Done") a <- c("Open, Open") b <- c("Open, In Progress, DevTest, DevTest") c <- c("DevTest, Done") d <- c("Done,…
0
votes
2 answers

Efficient way to split a huge string in R

I have a huge string (> 500MB), actually it's an entire book collection in one. I have some meta information in another dataframe, e.g. page numbers, (different) authors and titles. I try to detect the title strings in my huge string and split it by…
Marco
  • 2,368
  • 6
  • 22
  • 48
0
votes
0 answers

How to replace specific words in a string of a character vector in R?

I have this data frame, which I created using mergeDbSources from the Bibliometrix package. In this data frame there is a column named "AB_TM", created using termExtraction. The AB_TM column consists of strings of terms (pairs of two words) separated by…
0
votes
1 answer

Dynamically generate subset column names for a dataframe using for loop

For the following dataframe df: df <- structure(list(id = c("M0000607", "M0000609", "M0000612"), `2021-08(actual)` = c(12.6, 19.2, 8.3), `2021-09(actual)` = c(10.3, 17.3, 6.4), `2021-10(actual)` = c(8.9, 15.7, 5.3), `2021-11(actual)` = c(7.3,…
ah bon
  • 9,293
  • 12
  • 65
  • 148
0
votes
3 answers

r remove keywords in a column

I have a column in my dataframe with words like this. ColA 2-4 Model Group1 Group ACH Group2 Phenols Group1 Group ACH Group2 MONO MHPP Group1 Group ACH Group2 I want to create two additional columns like this: 1) without keywords c("Group1", "Group…
0
votes
1 answer

why are some strings not changing even after removing all whitespaces

I'm trying to make a simple conditional swap between two columns. Before this, I wanted to know which rows would change, so I created another column, "col3", to monitor this. But it appears it doesn't work all the time for…
Hammao
  • 801
  • 1
  • 9
  • 28
0
votes
1 answer

install stringi on centos 7 without Internet

I apparently have the common problem of installing stringi on CentOS 7.9 without Internet access. I have just a remote to CRAN, which means I have 'stringi_1.5.3.tar.gz'. After unzipping it, I get the following error: checking for pkg-config...…
maniA
  • 1,437
  • 2
  • 21
  • 42
0
votes
1 answer

change collation priority for accented letters

Faced with the need to imitate the behavior of an old system (from the mainframe era), I need to program a specific collation criterion where the non-ASCII letters get the least priority. I have started writing something like this (works only for…
crestor
  • 1,388
  • 8
  • 21
0
votes
0 answers

how to split texts in an increasing manner?

I have a list of texts read into the software using readtext library. files <-readtext(paste0(wd), "/r/*.pdf", ignore_missing_files = FALSE, text_field = "texts") The 100 pdf files are of different unequal sizes that vary from 6000 to 40000 words.…
0
votes
1 answer

R - creating diverse links for rvest to use

I've encountered a problem with creating proper links to use for data mining later. Let's say the link should look like this: www.domain.com/city/month/week. Each of the components (city, month, etc.) is a vector. Cities are strings, months and weeks are…
0
votes
1 answer

R: Convert wrong display of foreign characters into a correct encoding (double mojibake)

In R, I have vectors like this: TEST <- c("BlAA¶schl, G", "ThAA¶ni, A.") whereby BlAA¶schl should be Blöschl, and ThAA¶ni should be Thöni. There are similar problems throughout a whole dataset. I don't know how it is termed (maybe "non-ASCII…
anpami
  • 760
  • 5
  • 17
0
votes
2 answers

Split string based on condition in r

I'm working with a table that looks like this: library(tidyverse) id <- c(1, 1, 2, 2) year <- rep(1990:1991, 2) occ <- c("former farmer carpenter", "cleaner janitor", "carpenter", "carpenter former cleaner") old_occ <- c("former farmer", "cleaner",…
johnny
  • 423
  • 3
  • 10
0
votes
2 answers

One column transformation with multiple conditions with regular expressions

I have a dataframe: ID value 1 he following object is masked from ‘package:purrr’ 2 Attaching package: ‘magrittr’ 3 package ‘ggplot2’ was built under R version 3.6.2 4 Warning messages: here is a code to transform a column…
user13467695