Questions tagged [splitstackshape]

Use the splitstackshape R package to stack and reshape datasets after splitting concatenated values

Online data collection tools like Google Forms often export multiple-response questions with the responses concatenated in a single cell. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced", which reshape (from base R) does not handle and which melt and dcast from reshape2 do not easily handle.

The package has data.table as a dependency, and some of its functions return data.tables.
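As a rough illustration of the splitting described above, here is a minimal sketch using cSplit in both directions. The data frame `responses` and its column names are made up for this example, and the sketch assumes the splitstackshape package is installed.

```r
library(splitstackshape)

# Hypothetical survey export (invented for illustration): one
# multiple-response column whose answers are comma-separated in each cell.
responses <- data.frame(
  id = 1:3,
  fruits = c("apple,banana", "cherry", "apple,cherry,kiwi"),
  stringsAsFactors = FALSE
)

# Wide: each split value goes into its own column
# (fruits_1, fruits_2, fruits_3); shorter rows are padded with NA.
wide <- cSplit(responses, "fruits", sep = ",", direction = "wide")

# Long: each split value goes into its own row, with id repeated.
long <- cSplit(responses, "fruits", sep = ",", direction = "long")

print(wide)
print(long)
```

Both results come back as data.tables rather than plain data frames, so base-style column subsetting such as `x[, cols]` may behave differently than expected; wrapping the result in `as.data.frame()` is one way around that.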

CRAN Documentation

Main Website

60 questions
2 votes, 2 answers

Duplicating rows in dataframe based on column value

I am trying to duplicate rows based on the value of a column. My dataframe (df) currently looks like: Species name Visits Apis m 4 Bombus l 7 and so on (there are 34 more columns, which all need to be repeated). I want it to look…
2 votes, 2 answers

splitstackshape to split text based on different line separators \n for columns and observations

I have some text data which looks like: > myData keyColumn 1 \n\n\n\nCol1\n\nCol1…
user113156 • 6,761 • 5 • 35 • 81
2 votes, 2 answers

R separate comma separated cells into rows and Cartesian product

I have the mydf data frame below. I want to split any cell that contains comma-separated data and put it into rows. I am looking for a data frame similar to y below. How could I do it efficiently in a few steps? Currently I am using the cSplit function on one…
user2543622 • 5,760 • 25 • 91 • 159
2 votes, 1 answer

How to apply a specific function to a range of columns (but applying it to every column alone) in R?

How the data I work with looks (it is SNP data): AA CC CA GG GA CA CC GG GG CCCC CAA GG CA GG CC GC How I want it to become after case 2 (row 3 is removed due to multiple characters in column 2, and all columns are split into 2): A A C C C A G…
2 votes, 2 answers

cbind 1:nrows of same ID variable value to original data.frame

I have a large dataframe, where a variable id (first column) recurs with different values in the second column. My idea is to order the dataframe, to split it into a list and then lapply a function which cbinds the sequence 1:nrows(variable id) to…
user5958954
2 votes, 1 answer

Can't remove columns from a dataframe, output turns into a logical vector

There seems to be something wrong with the data.frame I get from the cSplit function. I can't extract columns without NAs from using the code below: data_places <- data_table[ , colSums(is.na(data_table)) == 0 ] The output is a Named logi vector…
JnrfL • 189 • 2 • 8
2 votes, 4 answers

Create several dummy variables from one string variable

I've tried pretty much everything from this similar question, but I can't get the results everyone else seems to be getting. This is my problem: I have a data frame like this, listing the grades each teacher works with: > profs <- data.frame(teaches…
Waldir Leoncio • 10,853 • 19 • 77 • 107
1 vote, 0 answers

How to do stratified sampling in a loop?

I have two datasets: one that I need to sample from, and another that specifies the number of records to sample from each stratum. I want to sample repeatedly with the specified sample size until the sample dataframe reaches X number of records. How can I…
Learner • 83 • 6
1 vote, 1 answer

How to create new columns in a data.frame based on row values in R?

Hej, I have a data.frame with family trios, and I would like to add a column with the full sibs of every "id" (= offspring). My data: df id dam sire 1: 83295 67606 79199 2: 83297 67606 79199 3: 89826 67606 …
1 vote, 1 answer

Need to split a column containing varying numbers of doubly concatenated data of variable names and observations

I have a column "sample_values" with varying numbers of doubly concatenated data delimited with both "," and ":" characters. I need to make the values separated by "," into new variables (columns) and the values separated by ":" the observations of…
caparks • 47 • 4
1 vote, 1 answer

split extra-delimited column from prokka gff table with varying number of entries into new columns with NAs (splitstackshape / R)

I have a file including tab separated and semicolon separated data (a prokka annotation file in .gff format). Unfortunately, the semicolon separated part is not consistent in the number of entries. Fortunately, though, the leading part after the…
crazysantaclaus • 613 • 5 • 19
1 vote, 0 answers

cSplit function in R

jnk <- data.table(a=c('1,2,3,4,5','1,2,3','2,3'),b=c('5,2,3','4,2',3)) Can I do a split using both the columns a and b? If we use [Undesired_Output][1] H= cSplit(jnk,"a",",","long"), we get the output as shown in image 1. However I want the output…
Udit • 21 • 1
1 vote, 1 answer

Bug in Reshape (splitstackshape)?

I'm fairly sure this is a bug, but I just wanted to put it to the community first. In the example page for the Reshape function of the splitstackshape package: set.seed(1) mydf <- data.frame(id_1 = 1:6, id_2 = c("A", "B"), varA.1 = sample(letters,…
Edward • 48 • 5
1 vote, 1 answer

cSplit_e from splitstackshape package not accounting for NAs?

I wanted to follow up on the question that I posted here. While I received base R and data.table solutions, I was trying to implement the same using cSplit_e from the splitstackshape package, as suggested in a comment on my previous post. With the…
Metrics • 15,172 • 7 • 54 • 83
1 vote, 1 answer

Displaying data in the chart based on plotly_click in R shiny

Please run the script below. This R script gives a shiny dashboard with two boxes. I want to reduce the width between the two boxes and display data in the right chart. The data should be based on the on-click event that we see in the ggplotly…
Ashmin Kaul • 860 • 2 • 12 • 37