Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R and MATLAB that splits the elements of a character vector into substrings:

# R
strsplit(x, split, fixed = FALSE)
% MATLAB
strsplit(x, split);

strsplit splits a character string, or each element of a vector of character strings, at matches of a regular expression or a literal (fixed) string. It returns a list (R) or cell array (MATLAB); in R, each list item holds the substrings of the corresponding element of x.

  • x: a character string or vector of character strings to split.
  • split: the character string at which to split x.
    In R, if split is an empty string (""), then x is split between every character.
  • [R only:] fixed: whether the split argument should be treated as fixed (i.e. literally). The default is FALSE, which means that split is treated as a regular expression.
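
The arguments described above can be illustrated with a minimal R sketch. The input vector and the variable names (x, literal, regex, escaped, chars) are made up for illustration; only the base R strsplit function is used.

```r
# Hypothetical input: two strings that use "." as a separator.
x <- c("a.b", "c.d.e")

# fixed = TRUE treats "." literally; the result is a list with
# one item per element of x.
literal <- strsplit(x, ".", fixed = TRUE)
# literal[[1]] is c("a", "b"); literal[[2]] is c("c", "d", "e")

# With the default fixed = FALSE, "." is a regular expression that
# matches any character, so every character acts as a delimiter
# and only empty strings remain.
regex <- strsplit(x, ".")

# Escaping the dot gives the regular-expression equivalent of the
# literal split.
escaped <- strsplit(x, "\\.")

# An empty split string splits between every character.
chars <- strsplit("abc", "")
# chars[[1]] is c("a", "b", "c")
```

Because the default is fixed = FALSE, delimiters that are regular-expression metacharacters (such as ".", "|" or "(") need either escaping or fixed = TRUE.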
702 questions
3
votes
1 answer

Split long string by a vector of words

I'm looking to split some television scripts into a data frame with two variables: (1) spoken dialogue and (2) speaker. Here is the sample data: http://www.buffyworld.com/buffy/transcripts/127_tran.html Loaded to R via: require(rvest) url <-…
violag
  • 35
  • 5
3
votes
2 answers

str_split and str_trim and simplify in R

This is for people using stringr in R. I want to split a name into first and trim any stray spaces. > str_trim(str_split(c("John Smith"),"\\s+")) [1] "c(\"John\", \"Smith\")" Where are all the escaped "s coming from? I was expecting…
3
votes
3 answers

How to split a character vector based on a numeric vector for positions

I would like to split a character vector into substrings based on a second numeric vector for the splitting points vec <-…
Assa Yeroslaviz
  • 590
  • 1
  • 9
  • 22
3
votes
2 answers

How do I split a txt file by html tags or regex in order to save it as separate txt files in R?

I have the output of a LexisNexis batch download of news articles in both html and txt format. The file itself contains the headers, metadata, and body of several different news articles that I need to systematically separate and save as independent…
3
votes
3 answers

Regexes work on their own, but not when used together in strsplit

I'm trying to split a string in R using strsplit and a Perl regex. The string consists of various alphanumeric tokens separated by periods or hyphens, e.g. "WXYZ-AB-A4K7-01A-13B-J29Q-10". I want to split the string: wherever a hyphen…
ApproachingDarknessFish
  • 14,133
  • 7
  • 40
  • 79
3
votes
1 answer

Split string by space except what's inside parentheses

I have the following string: x <- "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554" # [1] "(((K05708+K05709+K05710+K00529) K05711),K05712) K05713 K05714 K02554" and I want to split it by space delimiter avoiding what's inside…
IgnacioF
  • 55
  • 5
3
votes
1 answer

splitting strings in R with backslash

I am trying to parse out a file address and want to extract both the file location and the file name. For example, I want this: "C:\Users\carriebrown\Desktop\test\Project_8754.csv" to become this: "C:\Users\carriebrown\Desktop\test\" and…
Carrie Brown
  • 83
  • 1
  • 4
3
votes
1 answer

Split String without losing character- R

I have two columns in a much larger dataframe that I am having difficulty splitting. I have used strsplit in the past when I was trying to split using a "space", "," or some other delimiter. The hard part here is I don't want to lose any information AND…
Sam Marshal
  • 85
  • 2
  • 6
3
votes
1 answer

Split string according to commas in R

I have the following: s <- "abc, xyz, poi (cv, r2, 44, rghj), wer" How can I split it so the end result is: c("abc", "xyz", "poi (cv, r2, 44, rghj)", "wer") Basically, strsplit the string at every comma, but outside the parentheses.
dimitris_ps
  • 5,849
  • 3
  • 29
  • 55
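
One possible approach to the question above (splitting at every comma outside the parentheses) is a Perl-style negative lookahead. This is a sketch, not necessarily the accepted answer; it assumes the parentheses are balanced and not nested, and the variable name parts is made up for illustration.

```r
s <- "abc, xyz, poi (cv, r2, 44, rghj), wer"

# Split at a comma (plus any following whitespace) unless a ")" can be
# reached from that point without first passing another "(" or ")",
# i.e. unless the comma sits inside a pair of parentheses.
# Assumption: parentheses are balanced and not nested.
parts <- strsplit(s, ",\\s*(?![^()]*\\))", perl = TRUE)[[1]]

print(parts)
```

The lookahead requires perl = TRUE, since R's default regular-expression engine does not support lookaround assertions.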
3
votes
1 answer

strsplit by spaces greater than one in R

Given a string, mystr = "Average student score 88" I wish to split it wherever there is more than 1 space. I wish to obtain the following: "Average student score" "88" I read that "\s+" will split by any number of spaces. strsplit(mystr,…
user2498497
  • 693
  • 2
  • 14
  • 22
3
votes
1 answer

Change column headings in R, with alternating blanks and names (for genalex format)

I have a data frame called genalex, because I am trying to put my genetic data into the common "genalex" format. I just used the strsplit function in R, to split columns, and now I have this: > genalex[1:5,1:10] Ind V1 V2 V3 V4 V5 V6 V7 V8 V9 1…
user3545679
  • 181
  • 1
  • 12
3
votes
2 answers

R: Retrieve data from split string in a column based on value in another column

I have a very large data frame like: df = data.frame(nr = c(3,3,4), dependeny = c("6/3/1", "9/3/1", "5/4/4/1"), token=c("Trotz des Rückgangs", "Trotz meherer Anfragen", "Trotz des ärgerlichen Unentschiedens")) nr dependeny …
Simone
  • 43
  • 3
3
votes
3 answers

R: find if number is within range in a character string

I have a string s where "substrings" are divided by a pipe. Substrings might or might not contain numbers. And I have a test character string n that contains a number and might or might not contain letters. See example below. Note that spacing can…
Alexey Ferapontov
  • 5,029
  • 4
  • 22
  • 39
3
votes
2 answers

Splitting one Column to Multiple R and Giving logical value if true

I am trying to split one column in a data frame into multiple columns which hold the values from the original column as new column names. Then if there was an occurrence for that respective column in the original give it a 1 in the new column or 0…
Brad
  • 85
  • 12
3
votes
2 answers

Removing string parts between substrings when substrings occur multiple times in R

In a string string="aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee" I would like to remove all parts that occur between START and STOP, yielding "aaaaaaaaacccccccceeeeeee". If I try with gsub("START(.*)STOP","",string) this gives…
Tom Wenseleers
  • 7,535
  • 7
  • 63
  • 103
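
For the question above, a greedy ".*" matches from the first START to the last STOP, so everything in between is removed at once. A minimal sketch of one way to get the stated result is a non-greedy quantifier; the variable name cleaned is made up for illustration, and the sketch assumes START and STOP always occur in matched pairs.

```r
string <- "aaaaaaaaaSTARTbbbbbbbbbbSTOPccccccccSTARTddddddddddSTOPeeeeeee"

# ".*?" is non-greedy: each match ends at the nearest following STOP
# instead of extending to the last one in the string.
# Assumption: every START has a matching STOP after it.
cleaned <- gsub("START.*?STOP", "", string, perl = TRUE)

print(cleaned)
# [1] "aaaaaaaaacccccccceeeeeee"
```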