Questions tagged [splitstackshape]

Use the splitstackshape R package to stack and reshape datasets after splitting concatenated values

Online data collection tools like Google Forms often export multiple-response questions with data concatenated in cells. The concat.split (cSplit) family of functions splits such data into separate cells. The package also includes functions to stack groups of columns and to reshape wide data, even when the data are "unbalanced", something which reshape (from base R) does not handle, and which melt and dcast from reshape2 do not easily handle.

The package has data.table as a dependency, and some of its functions return data.tables.
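As a rough illustration of the splitting functions described above, here is a minimal sketch. The data frame `df` and its `genres` column are made up for this example and do not come from any question on this page; only `cSplit` and `cSplit_e` are functions of the package.

```r
library(splitstackshape)

# Hypothetical toy data: one cell can hold several comma-separated values
df <- data.frame(
  id = 1:3,
  genres = c("Action,Romance", "Drama", "Action,Drama"),
  stringsAsFactors = FALSE
)

# Wide form: one new column per split position (genres_1, genres_2, ...)
wide <- cSplit(df, "genres", sep = ",")

# Long form: one row per split value
long <- cSplit(df, "genres", sep = ",", direction = "long")

# Expanded indicator columns, one per distinct value;
# fill = 0 replaces the default NA for values that are absent
binary <- cSplit_e(df, "genres", sep = ",", type = "character", fill = 0)
```

Note that `cSplit` returns a data.table even when given a data.frame, which matters if later code relies on data.frame-style indexing.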

CRAN Documentation

Main Website

60 questions
0
votes
1 answer

cSplit_e not returning a binary data frame

I have a data frame with a Genre column that has rows like Action,Romance. I want to split those values and create a binary vector. If Action,Romance,Drama are all the possible genres, then the above mentioned row would be 1,1,0 in the output data…
James L.
  • 12,893
  • 4
  • 49
  • 60
0
votes
2 answers

stratified sampling with fixed proportions of observation types in R

I have a sample where 50% of the observations are White and 50% African-American. I would like to obtain a random subsample where such proportion is modified to 80% White and 20% African-American. I have tried the command stratified but I could not…
Ric
  • 1
  • 1
0
votes
1 answer

Removing unnecessary labels in the tooltip in R and ggplot2 chart

Upon running the R and ggplot2 script below, the following snapshot is generated. Upon hovering on any box, we get the following tooltip as shown in the plot. My simple requirement is to get rid of the fourth tooltip attribute as it is similar to…
Ashmin Kaul
  • 860
  • 2
  • 12
  • 37
0
votes
2 answers

Stratified sampling with constraints

I'm a newbie in R so just bear with me. So I'm trying to perform stratified sampling in such a way that it will use a 2 column strata but with both columns satisfying specific values. This is my code: library(splitstackshape) set.seed(1) dat1 <-…
Marek
  • 245
  • 1
  • 4
  • 15
0
votes
1 answer

R split column in BigCartel csv file into long format in dataframe or data.table

Big Cartel has an option that exports orders into a csv file. However, the structure is not very good for the analysis I need to do. Here is a subset of the columns and rows from a Big Cartel csv order download (there are other columns which are not…
Martyn
  • 55
  • 6
0
votes
0 answers

Split Using cSplit R

In my dataset I have a column with hashtags and names separated by commas (i.e., #xxx, @rrr, ...). I want to split it into one column for each hashtag/name. I used: library(splitstackshape) hash <- cSplit(indt = as.data.table(dados), splitCols =…
0
votes
1 answer

R create data table columns dynamically

I have this data table called tmp.df.lhs.denorm, for which I have provided the first 2 rows below: > dput(tmp.df.lhs.denorm[1:2]) structure(list(rules = c("{} => {Dental anesthetic products-Injectables cartridges|2288210-Septocaine Cart 4% w/EPI}",…
NRG
  • 149
  • 2
  • 10
0
votes
1 answer

Total rows does not contain a factor and the value is not zero

I have the following data path value 1 b,b,a,c 3 2 c,b 2 3 a 10 4 b,c,a,b 0 5 e,f 0 6 a,f 1 df df <- data.frame (path= c("b,b,a,c", "c,b", "a", "b,c,a,b" ,"e,f" ,"a,f"), value = c(3,2,10,0,0,1)) I…
MFR
  • 2,049
  • 3
  • 29
  • 53
0
votes
1 answer

splitstackshape pkg - concat.split.expanded returning NA by coercion errors

I'm following the instructions here Dummy variables from a string variable to try to convert a column of strings (words separated by spaces) into dummy variables (0-1 to indicate a word being not used/used in the string in that row) using…
D. K.
  • 73
  • 7
0
votes
0 answers

Split string fixed width

Sorry for this surely basic question, but I really couldn't find a clear answer: I have a data frame I'm trying to split by a fixed number of characters. I'd previously been using: data = cSplit(data, 'variable', sep="what to separate on",…
Jim
  • 715
  • 2
  • 13
  • 26
0
votes
2 answers

One Hot Encoding of complex variables

I have a dataset where all my data is categorical and I would like to use one hot encoding for further analysis. Main issues I would like to resolve: Some cells contain a lot of text in one cell (an example will follow). Some numerical values need to…
Boro Dega
  • 393
  • 1
  • 3
  • 13
0
votes
1 answer

Error running cSplit when splitstackshape/data.frame and tidyr/dplyr are loaded

Example data file (csv format) testdf <- read.csv("example.csv") I am trying to automate some roster-mining. At one point I need to split rows based on names with separators, so cSplit from splitstackshape is perfect. I am also preceding and…
Luke_radio
  • 977
  • 1
  • 9
  • 15
-1
votes
1 answer

How to set seed for random sample selection?

Based on a data frame with grouped samples, I'd like to pick 5 samples randomly from each group. I can do so easily using the function stratified from package splitstackshape. But is it possible to set a seed so as to make the selection…
erc
  • 10,113
  • 11
  • 57
  • 88
-1
votes
1 answer

How to split each column in a row into a separate column in R

I would like to split each column in the 4th row of the input data into a separate column, one below the other, as shown in the expected output input cytoband 11qE2 1qC1.1 13qD2.1 q value 1.16 1.53 1.13 wide…
beginner
  • 411
  • 1
  • 5
  • 13
-1
votes
2 answers

Splitting numerals from string in data frame

I have a data frame in R with a column that looks like this: Venue AAA 2001 BBB 2016 CCC 1996 ... .... ZZZ 2007 In order to make working with the data frame slightly easier, I wanted to split up the venue column into two columns, location and year,…
S.Fischer
  • 19
  • 2