Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R (documentation) and MATLAB (documentation), which splits the elements of a character vector into substrings:

# R
strsplit(x, split, fixed = FALSE)
% MATLAB
strsplit(x, split);

strsplit splits a character string, or each string in a vector of character strings, at every match of a regular expression or a literal (fixed) string. It returns a list (R) or cell array (MATLAB) in which each item holds the substrings of the corresponding element of x.

  • x: a character string or vector of character strings to split.
  • split: the delimiter (a character string) at which to split x.
    In R, if split is an empty string (""), then x is split between every character.
  • fixed [R only]: whether split should be matched literally. The default is FALSE, which means that split is treated as a regular expression.
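
As a rough illustration of the arguments above, here is a minimal R sketch. The input values and variable names (x, parts, regex_split, literal_split, chars) are made up for this example and are not from the documentation.

```r
# Illustrative input; the vector `x` is an assumption made for this sketch.
x <- c("a,b,c", "d.e")

# strsplit returns a list with one element per element of x.
# "d.e" contains no comma, so it comes back as a single piece.
parts <- strsplit(x, ",")

# Default (fixed = FALSE): split is a regular expression,
# so "." matches any character rather than a literal dot.
regex_split <- strsplit("d.e", ".")

# fixed = TRUE: split is matched literally, so only the dot is a delimiter.
literal_split <- strsplit("d.e", ".", fixed = TRUE)

# Empty split string: x is split between every character.
chars <- strsplit("abc", "")

print(parts)
print(literal_split)
print(chars)
```

Because the result is a list even for a single input string, it is common to index it with [[1]] or wrap the call in unlist() when a plain character vector is wanted.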
702 questions
4
votes
2 answers

R strsplit keep last empty element as empty string

R's strsplit drops the last element if "empty" (example 2) but not when occurring first (example 3) or in the middle of the vector to split (example 4). > unlist(strsplit(x = "1,4", split = ",")) #Example 1 [1] "1" "4" > unlist(strsplit(x = ",4",…
user3375672
4
votes
4 answers

How can the pattern of a string split become the substring itself?

I am cleaning some strings in R and I need to split them to recover information from two substrings that do not belong with each other. The problem is that, there is no real pattern for me to split all the strings with. Rather, I know what the…
Bora Dora
4
votes
3 answers

Find all unique values in column separated by comma

I have multiple observations of one species with different observers / groups of observers and want to create a list of all unique observers. My data look like this: data <- read.table(text="species observer 1 A,B 1 A,B 1 B,E 1 B,E 1 D,E,A,C,C 1…
Kanoet
4
votes
1 answer

Fast data.table column split to multiple rows based on delimiter

I have a data.table with 3 columns that I want to split the 3rd by a delimiter to multiple rows. My current implementation is: protein.ids <- c("PA0001","PA0001", "PA0002", "PA0002", "PA0002") protein.names <- c("protein A", "protein A", "protein…
4
votes
1 answer

Keep delimiter in Strsplit with regex combinations

I am munging some data that requires me to combine regex functions using strsplit. I have figured out how to split up my string, but am struggling to apply the guidance in this post around keeping delimiters. Here's an example of a string that I'm…
roody
4
votes
2 answers

strsplit split on either or depending on

Once again I'm struggling with strsplit. I'm transforming some strings to data frames, but there's a forward slash, / and some white space in my string that keep bugging me. I could work around it, but I am eager to learn if I can use some fancy either…
Eric Fail
4
votes
7 answers

Extract different words from a character string in R

I have seen several SO posts that seem to come close to answering this question, but I cannot tell if any actually do, so please forgive me if this is a duplicate post. I have several dozen character strings (this is a column within a data frame)…
JBauder
4
votes
3 answers

Splitting a string based on a vector of strings in R

I have the following string and vector: temp = "EarthMars Venus & Saturn PlanetsJupiter" searchTerms = c("Earth", "Jupiter", "Mars", "Venus & Saturn Planets", "Neptune") I want to split 'temp' based on the strings in 'searchTerms', so that I get…
4
votes
3 answers

Split all columns in one data frame and create two data frames in R

I have a single data frame (let's call it df) that looks like this: col1 <- c("1/10", "2/30", "1/40", "3/23", "0/17", "7/14") col2 <- c("2/44", "0/13", "4/55", "6/43", "0/19", "2/34") col3 <- c("0/36", "0/87", "3/11", "2/12", "4/33", "0/12") col4 <-…
Sheila
4
votes
2 answers

R: How to separate values only after the second space

I have a column with different names: X <- c("Ashley, Tremond WILLIAMS, Carla", "Claire, Daron", "Luw, Douglas CANSLER, Stephan") After the second space, it starts the name of the second person. For instance, Ashley, Tremond is a person and…
Natalia P
4
votes
3 answers

Split multiple columns into rows

I'm working with a very raw set of data and need to shape it up in order to work with it. I am trying to split selected columns based on the separator '|' d <- data.frame(id = c(022,565,893,415), name = c('c|e','m|q','w','w|s|e'), score =…
Davis
4
votes
1 answer

Splitting a string using lookahead assertion regex

Here is a string: [1] "5 15 3 23 11 59 44.7 -.263226218521e-03 .488853402202e-11 .000000000000e+01" I need to split it by certain spaces keeping first 7 numbers together, like this: [1] "5 15 3 23 11 59 44.7" "-.263226218521e-03" …
ephemeris
4
votes
1 answer

R: Regex in strsplit (finding ", " followed by capital letter)

Say I have a vector containing some characters that I want to split based on a regular expression. To be more precise, I want to split the strings based on a comma, followed by a space, and then by a capital letter (to my understanding, the regex…
David
4
votes
4 answers

Assigning results of strsplit to multiple columns of data frame

I am trying to split a character vector into three different vectors, inside a data frame. My data is something like: > df <- data.frame(filename = c("Author1 (2010) Title of paper", "Author2 et al (2009) Title of…
iNyar
4
votes
3 answers

R: split string into numeric and return the mean as a new column in a data frame

I have a large data frame with columns that are a character string of numbers such as "1, 2, 3, 4". I wish to add a new column that is the average of these numbers. I have set up the following example: set.seed(2015) library(dplyr) …
dtrain18