Questions tagged [strsplit]

strsplit is a function in R and MATLAB which splits the elements of a character vector around a given delimiter.

strsplit is a function in R and MATLAB that splits the elements of a character vector into substrings:

# R
strsplit(x, split, fixed = FALSE)
% MATLAB
strsplit(x, split);

strsplit splits a character string or a vector of character strings at each match of a regular expression or of a literal (fixed) string. It returns a list (R) or a cell array (MATLAB) in which each item holds the substrings obtained from the corresponding element of x.

  • x: a character string or vector of character strings to split.
  • split: the character string at which to split x.
    In R, if split is an empty string (""), then x is split between every character.
  • fixed [R only]: whether the split argument should be treated as fixed (i.e. literally). The default is FALSE, which means that split is treated as a regular expression.
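
The arguments above can be illustrated with a short R sketch. The input vector and variable names are made up for illustration; only base R's strsplit is used.

```r
# Hypothetical example input: two strings that use "." as a delimiter
x <- c("a.b.c", "d.e")

# fixed = TRUE: "." is taken literally instead of as the regex "any character"
literal <- strsplit(x, ".", fixed = TRUE)

# Default (fixed = FALSE): split is a regular expression,
# so a literal dot has to be escaped or put in a character class
regex <- strsplit(x, "[.]")

# A regular expression can match delimiters of varying length
digits <- strsplit("a1b22c", "[0-9]+")

# An empty split string separates every character
chars <- strsplit("abc", "")

# The result is always a list with one element per element of x
literal[[1]]  # "a" "b" "c"
literal[[2]]  # "d" "e"
```

Because the result is a list even for a single input string, the substrings are usually extracted with [[1]] or processed with sapply/lapply.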
702 questions
6
votes
1 answer

Use regular expressions in R strsplit

I would like to split "2015-05-13T20:41:29+0000" into 2015-05 and 20:41:29+0000. I tried the following: > strsplit("2015-05-13T20:41:29+0000",split="-\\d\\dT",fixed=TRUE) [[1]] [1] "2015-05-13T20:41:29+0000" but the pattern is not matched.…
Leonardo
  • 337
  • 2
  • 5
  • 12
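
A minimal sketch of what is probably going wrong in the excerpt above: with fixed=TRUE the pattern is searched for as the literal text -\d\dT, which does not occur in the input. Dropping fixed=TRUE (an assumption about the intended fix, not necessarily the accepted answer) lets the pattern act as a regular expression:

```r
x <- "2015-05-13T20:41:29+0000"

# fixed = TRUE: the pattern is a literal string, so nothing matches
unsplit <- strsplit(x, split = "-\\d\\dT", fixed = TRUE)[[1]]

# Default fixed = FALSE: "-\\d\\dT" is a regex matching "-13T"
parts <- strsplit(x, split = "-\\d\\dT")[[1]]

parts  # "2015-05" "20:41:29+0000"
```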
6
votes
5 answers

R: How to shorten data frame values to first character

I would like to shorten the values of one column of my data.frame. Right now, each value consists of many letters, such as df$col1 [1] AHG ALK OPH BCZ LKH QRQ AAA VYY what I need is only the first letter: df$col1 [1] A A O …
PikkuKatja
  • 1,101
  • 3
  • 13
  • 21
6
votes
4 answers

Substituting the results of a calculation

I'm munging data, specifically, I've opened this pdf http://pubs.acs.org/doi/suppl/10.1021/ja105035r/suppl_file/ja105035r_si_001.pdf and scraped the data from table s4, 1a 1b 1a 1b 1 5.27 4.76 5.09 4.75 2 2.47 2.74 2.77 2.80 4 1.14 1.38 1.12…
DarrenRhodes
  • 1,431
  • 2
  • 15
  • 29
6
votes
7 answers

Extract string elements that possibly appear multiple times, or not at all

Start with a character vector of URLs. The goal is to end up with only the name of the company, meaning a column with only "test", "example" and "sample" in the example below. urls <- c("http://grand.test.com/", "https://example.com/", …
lawyeR
  • 7,488
  • 5
  • 33
  • 63
5
votes
2 answers

Split a string without considering special characters

I need a way to split a string every n letters. For example, let s="QW%ERT%ZU%I%O%P" and n=3, I want to obtain "QW%E" "RT%Z" "U%I%O" "%P". As you can see, the special character "%" is not considered in the division. I tried with strsplit(s,…
5
votes
1 answer

How to use `strsplit` before every capital letter of a camel case?

I want to use strsplit at a pattern before every capital letter and use a positive lookahead. However it also splits after every, and I'm confused about that. Is this regex incompatible with strsplit? Why is that so and what is to…
jay.sf
  • 60,139
  • 8
  • 53
  • 110
5
votes
5 answers

Split comma separated pattern from data frame in R

I have a dataset like that: Old <- data.frame( X1= c( "AD=17795,54;ARL=139;DEA=20;DER=20;DP=1785", "DP=4784;AD=4753,23;ARL=123;DEA=5;DER=5", "ARL=149;AD=30,9;DEA=25;DER=25;DP=3077", "AD=244,49;ARL=144;DEA=7;DER=7;DP=245" …
ersan
  • 393
  • 1
  • 9
5
votes
3 answers

How to count the factors in ordered sequence

I have a dataframe df: userID Score Task_Alpha Task_Beta Task_Charlie Task_Delta 3108 -8.00 Easy Easy Easy Easy 3207 3.00 Hard Easy Match Match 3350 5.78 Hard Easy Hard …
Sandy
  • 1,100
  • 10
  • 18
5
votes
5 answers

R how to create columns/features based on existing data

I have a dataframe df: userID Score Task_Alpha Task_Beta Task_Charlie Task_Delta 3108 -8.00 Easy Easy Easy Easy 3207 3.00 Hard Easy Match Match 3350 5.78 Hard Easy Hard …
Sandy
  • 1,100
  • 10
  • 18
5
votes
1 answer

Finding second space after each comma

This is a follow up to this question: Concatenate previous and latter words to a word that match a condition in R I am looking for a regex which splits the string at the second space that happens after comma. Look at the example below: vector <-…
M--
  • 25,431
  • 8
  • 61
  • 93
5
votes
2 answers

R strsplit using Regex

I want to use R to split some chat messages, here is an example: example <- "[29.01.18, 23:33] Alice: Ist das hier ein Chatverlauf?\n[29.01.18, 23:45] Bob: Ja ist es!\n[29.01.18, 23:45] Bob: Der ist dazu da die funktionsweise des Parsers zu…
Ju Ko
  • 466
  • 7
  • 22
5
votes
3 answers

R - difference between 2 sets in data frame

I have 2 factor columns, I want to create a third column which tells me what the second one has that the first does not. It's very similar to this post but I'm having trouble going from a df to using setdiff() function. For…
jmich738
  • 1,565
  • 3
  • 24
  • 41
5
votes
3 answers

Use strsplit with multiple delimiters

How can I split this Chr3:153922357-153944632(-) Chr11:70010183-70015411(-) in to Chr3 153922357 153944632 - Chr11 70010183 70015411 - I tried strsplit(df$V1,"[[:punct:]]")), but the negative sign is not coming in the final…
Kryo
  • 921
  • 9
  • 24
5
votes
2 answers

splitting and reordering character string by comma in r

I have several years worth of data on individuals, but their names are formatted differently each year. Half of the names are already in "First Last" order but I can't figure out how to successfully edit the other half ("Last, First"). Here's a…
jesstme
  • 604
  • 2
  • 10
  • 25
5
votes
2 answers

Split string in each column for several columns

I have this table (data1) with four columns SNP rs6576700 rs17054099 rs7730126 sample1 G-G T-T G-G I need to separate columns 2-4 into two columns each, so the new output have 7 columns. Like this : SNP rs6576700 rs6576700 rs17054099 rs17054099…
Sami
  • 53
  • 6