Questions tagged [splitstackshape]

Use the splitstackshape R package to stack and reshape datasets after splitting concatenated values

Online data collection tools like Google Forms often export multiple-response questions with the responses concatenated in a single cell. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced", something which reshape (from base R) does not handle, and which melt and dcast from reshape2 do not easily handle.

The package has data.table as a dependency, and some of its functions return data.tables.
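As a minimal sketch of the splitting described above, the example below runs cSplit on a small, made-up survey export. The `survey` data frame and its `id` and `Q34` columns are hypothetical and exist only for illustration; the comma separator is an assumption about how the export concatenates responses.

```r
library(splitstackshape)

# Hypothetical export: one multiple-response column with comma-joined values
survey <- data.frame(
  id  = 1:3,
  Q34 = c("1,2,8", "6,7,8", "4"),
  stringsAsFactors = FALSE
)

# Wide: each split value goes into its own column (Q34_1, Q34_2, ...);
# rows with fewer values are padded with NA
wide <- cSplit(survey, "Q34", sep = ",", direction = "wide")

# Long: each split value goes into its own row, repeating the id
long <- cSplit(survey, "Q34", sep = ",", direction = "long")

print(wide)
print(long)
```

Both results come back as data.tables rather than plain data frames, so wrap them in as.data.frame() if later code expects base data frame behaviour.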

CRAN Documentation

Main Website

60 questions
1
vote
2 answers

cSplit Coerces Unnecessary NA Row

I have a large data set, a small sample of which looks like the 4 x 5 tibble below. I'm trying to split multiple delimited columns into unique rows using variable c=="Split" as below: library(splitstackshape) dt <- tibble( a = c("Quartz | White…
1984
  • 41
  • 3
1
vote
2 answers

cSplit does not work when a field has embedded the separator

I am using cSplit to split a column into three separate columns. The separator is " / ". However, one of my fields has the "/" separator embedded in it. The third element of the third line was supposed to stay as "f/j" after the split. When I…
TCS
  • 127
  • 1
  • 11
1
vote
4 answers

Splitting columns in R

I am new to R and I have a big dataset with 17 columns and over 1m rows. I want to split one of the columns into 4 using the divider '/'. It's taking forever for R to complete the commands below. Is there a better way of doing this? I have…
1
vote
3 answers

Creating a long table from a wide table using merged.stack (or reshape)

I have a data frame that looks like this: ID rd_test_2011 rd_score_2011 mt_test_2011 mt_score_2011 rd_test_2012 rd_score_2012 mt_test_2012 mt_score_2012 1 A 80 XX 100 NA NA BB …
n8sty
  • 1,418
  • 1
  • 14
  • 26
0
votes
0 answers

Recoding a multiple choice item in R using splitstackshape

I have a multiple choice question: Blah blah blah...Check all that apply. The output is: > x$Q34 [1] 1,2,8 6,7,8 1,5,6,8 4 [10] 2,5 2,6,7 …
Nick
  • 21
  • 6
0
votes
0 answers

j-argument when using cSplit function in R

I cannot find a solution to this error and I am not sure what is causing it. I am trying to pull some Statistics Canada data sets. I have about 500 data points to update each month, which can be identified by unique vector numbers. I have a way to…
0
votes
1 answer

Separating cells with several delimiters (splitstackshape)

I am working with a database that should be separated by several delimiters. The most common are semicolons and a point followed by a slash: './'. How do I complete the code in order to apply both…
onlyjust17
  • 125
  • 5
0
votes
2 answers

Splitting string column into a few columns - keep the null entries

Dear kind people of the internet, I need help. I am trying to split a string column into several columns and keep the null/NA entries. df <- cSplit(df, "question", "_") This code currently splits them but removes the null entries and shows the…
Thandi
  • 225
  • 1
  • 2
  • 9
0
votes
1 answer

Incorporating splitstackshape into loop

I have the following code that selects (4 rows of iris x 1000) *100 and calculates the bias of each column. library(SimDesign) library(data.table) do.call(rbind,lapply(1:100, function(x) { bias( setDT(copy(iris))[as.vector(sapply(1:1000,…
hugh_man
  • 399
  • 1
  • 6
0
votes
1 answer

Stratified Sampling in R- sample size issues

I am trying to do stratified sampling in R using the stratified function in the splitstackshape package. I have four strata (labeled 1:4). When setting size = 1, it returns one row from each stratum (great!). However, I'm not able to…
hugh_man
  • 399
  • 1
  • 6
0
votes
2 answers

Transforming string column to specific data.frame

Desired Output Need the following output df2 <- data.frame( v1 = c(1100001, 1100002, 1100003, 1100004, 1100005) , v2 = c("A R", "W R", "A K", "M", "A C") , v3 = c("P", "G P", "G P", "P", "P") , v4 = c(110, 161, 129, 132, "Absent") , v5…
MYaseen208
  • 22,666
  • 37
  • 165
  • 309
0
votes
1 answer

cumsum by participant and reset on 0 R

I have a data frame that looks like this below. I need to sum the number of correct trials by participant, and reset the counter when it gets to a 0. Participant TrialNumber Correct 118 1 1 118 2 1 …
0
votes
1 answer

Creating new column based on condition

I have subsetted the data so it is easier to demonstrate what I am attempting to do. I am trying to create a data frame with a new row for the value in the column "MaxRounds". At first MaxRounds was in a column like so: …
0
votes
1 answer

Expand rows in data frame by length of sequence

I have a data frame like this mydf <- data.frame(x = c("a", "b", "q"), y = c("c", "d", "r"), min = c(2, 5, 3), max = c(4,6,7)) x y min max a c 2 4 b…
0
votes
1 answer

How to list row values in a column based on grouping value in R?

Hi, I have an input file that has one column with gene IDs and one with GO terms, with multiple rows per gene (anywhere from 1 to >20). The format I need to generate has one single row for each unique gene ID, with the GO terms in a second…