Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R and MATLAB that splits the elements of a character vector into substrings:

# R
strsplit(x, split, fixed = FALSE)

% MATLAB
strsplit(x, split)

strsplit splits a character string or vector of character strings using a regular expression or a literal (fixed) string. In R, it returns a list in which each item holds the pieces of the corresponding element of x. In MATLAB, it takes a single string and returns a cell array holding the pieces of that string.

  • x: a character string or vector of character strings to split.
  • split: the delimiter at which to split x.
    In R, if split is an empty string (""), then x is split between every character.
  • fixed [R only]: whether split should be treated as a literal (fixed) string. The default is FALSE, which means that split is treated as a regular expression.
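
As a minimal sketch of the R behaviour described above, the following compares a literal split, a regular-expression split, and an empty-string split. The input vector x and the variable names are made up for illustration; only base R's strsplit is used.

```r
# Hypothetical input for illustration (not taken from any question below)
x <- c("a.b.c", "d.e")

# fixed = TRUE: "." is treated as a literal dot
parts_fixed <- strsplit(x, ".", fixed = TRUE)

# Default (fixed = FALSE): split is a regular expression, so the dot
# must be escaped to match a literal "."
parts_regex <- strsplit(x, "\\.")

# An unescaped "." matches every character, so only empty strings remain
parts_unescaped <- strsplit("a.b.c", ".")

# Empty split string: x is split between every character
chars <- strsplit("abc", "")

# The result is a list with one character vector per element of x
print(parts_fixed)
print(chars[[1]])
```

Because the result is a list, a single piece from each element is usually pulled out with sapply or vapply, for example sapply(parts_fixed, "[", 2).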
702 questions
5
votes
1 answer

Ignore case in strsplit in R

I am aware that in grep you can simply use ignore.case = TRUE. However, what about strsplit? You can pass a regular expression as the second argument, but I'm not sure how I make this regular expression case insensitive. Currently, this is what my…
Bram Vanroy
  • 27,032
  • 24
  • 137
  • 239
5
votes
3 answers

sapply() with strsplit in R

I found this code: string = c("G1:E001", "G2:E002", "G3:E003") > sapply(strsplit(string, ":"), "[", 2) [1] "E001" "E002" "E003" Clearly, strsplit(string, ":") returns a vector of size 3 where each component i is a vector of size 2 containing Gi and…
Leonardo
  • 337
  • 2
  • 5
  • 12
5
votes
5 answers

Splitting text column into ragged multiple new columns in a data table in R

I have a data table containing 20000+ rows and one column. The string in each column has a different number of words. I want to split the words and put each of them in a new column. I know how I can do it word by word: Data [ , Word1 :=…
user36729
  • 545
  • 5
  • 30
5
votes
3 answers

Splitting a string into new rows in R

I have a data set like below: Country Region Molecule Item Code IND NA PB102 FR206985511 THAI AP PB103 BA-107603 / F000113361 / 107603 LUXE NA PB105 1012701 / SGP-1012701 /…
user3703195
  • 61
  • 1
  • 8
5
votes
2 answers

Split different lengths values and bind to columns

I've got a rather large (around 100k observations) data set, similar to this: data <- data.frame( ID = seq(1, 5, 1), Values = c("1,2,3", "4", " ", "4,1,6,5,1,1,6", "0,0"), stringsAsFactors=F) data …
Leo
  • 121
  • 2
  • 10
5
votes
4 answers

R strsplit before ( and after ) keeping both delimiters

I have a string that looks like the following: x <- "01(01)121210(01)0001" I want to split this into a vector so that I get the following: [1] "0" "1" "(01)" "1" "2" "1" "2" "1" "0" "(01)" "0" "0" "0" "1" The (|) could be [|] or {|} and the number…
5
votes
3 answers

Split strings by commas only if substrings are elements of another vector

I have a set of survey responses where respondents could select zero or more options to answer the question "What types of fruit do you like?". There was also a space for a write-in answer. In the results spreadsheet, each person's response is in…
Kara Woo
  • 3,595
  • 19
  • 31
5
votes
4 answers

Extract a string between patterns/delimiters in R

I have variable names in the form: PP_Sample_12.GT or PP_Sample-17.GT I'm trying to use string split to grep out the middle section: ie Sample_12 or Sample-17. However, when I do: IDtmp <- sapply(strsplit(names(df[c(1:13)]),'_'),function(x)…
user2726449
  • 607
  • 4
  • 11
  • 23
5
votes
3 answers

R: split only when special regex condition doesn't match

How would you split at every and/ERT only when it is not succeeded by "/V" inside one word after in: text <- c("faulty and/ERT something/VBN and/ERT else/VHGB and/ERT as/VVFIN and/ERT not else/VHGB propositions one and/ERT two/CDF and/ERT…
alex
  • 1,103
  • 1
  • 14
  • 25
5
votes
3 answers

strsplit in R with a metacharacter

I have a large amount of data where the delimiter is a backslash. I'm processing it in R and I'm having a hard time finding how to split the string since the backslash is a metacharacter. For example, a string would look like…
newRUser
  • 59
  • 1
  • 3
4
votes
3 answers

Split a string and keep delimiter

Let's say I have a string: StringA/StringB/StringC Is there any way I can split this string by the / symbol, but keep it in the returned values: StringA /StringB /StringC
jackahall
  • 400
  • 1
  • 7
4
votes
3 answers

Fast way to parse vector of "continent / country / city" in R

I have a character vector in R with each string composed of "continent / country / city", e.g. x=rep("Africa / Kenya / Nairobi", 1000000) but the " / " is occasionally mistyped without the bracketing spaces as "/" and in some cases the city is also…
Tom Wenseleers
  • 7,535
  • 7
  • 63
  • 103
4
votes
4 answers

Turning a text column into a vector in r

I want to see whether the text column has elements outside the specified values of "a" and "b": specified_value=c("a","b") df=data.frame(key=c(1,2,3,4),text=c("a,b,c","a,d","1,2","a,b")) df_out=data.frame(key=c(1,2,3),text=c("c","d","1,2",NA)) This…
Ashti
  • 193
  • 1
  • 10
4
votes
5 answers

Check string pattern for non-unique characters

I've a data frame with two columns: id and gradelist. The value in gradelist column includes a list of grades (separated by ;) with different length. Here's the data: id <- seq(1,7) gradelist <- c("a;b;b", "c;c", "d;d;d;f", …
user9292
  • 1,125
  • 2
  • 12
  • 25
4
votes
4 answers

Matlab strsplit at non-keyboard characters

In this instance I have a cell array of lat/long coordinates that I am reading from file as strings with format: x = {'27° 57'' 21.4" N', '7° 34'' 11.1" W'} where the ° is actually a degree symbol (U+00B0). I want to use strsplit() or some…