Example data file (csv format)

testdf <- read.csv("example.csv")

I am trying to automate some roster-mining. At one point I need to split rows based on names with separators, so cSplit from splitstackshape is perfect. I am also preceding and following the split with a bunch of dplyr data shaping.

loaded libraries:

library(data.table)
library(splitstackshape)
library(tidyr)
library(dplyr)

The problem is that when I load dplyr after data.table, I get the following message:

Attaching package: ‘dplyr’

The following objects are masked from ‘package:data.table’:

    between, last

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

Then when I try to use cSplit:

test <- cSplit(testdf, "Registrar", "/", direction = "long")

I get this error:

Error in `[.tbl_df`(indt, , splitCols, with = FALSE) : 
  unused argument (with = FALSE)

I have tried various permutations: the error only occurs when both data.table and dplyr are loaded (in either order). Restarting R and never loading dplyr makes cSplit work properly.

I need to be able to use both at the same time though, and detaching dplyr doesn't help (just throws up missing dplyr errors).

I have seen this thread but they seem to have come to the conclusion the data is corrupted. This seems likely because if I run on a toy data set,

Name <- "Bo / Ashley"
Date <- "2015-02-04"

testdf2 <- data.frame(Name, Date)

testtoy <- cSplit(testdf2, "Name", "/", direction = "long")

it works fine. But I have no idea how to fix this "corruption".

Luke_radio
  • I get a different error with the dataset `Error in type.convert(unlist(x, use.names = FALSE)) : the first argument must be of mode character` though I used `stringsAsFactors=FALSE` in the `read.csv` – akrun Aug 24 '15 at 02:46
  • For me, setting the `type.convert=FALSE` worked but it also gives the numeric factor index instead of the character vector. i.e. `test <- cSplit(testdf, "Registrar", "/", direction = "long", type.convert=FALSE)`. I think the error I got might be due to the `class` of the output column. If it is a `factor`, type.convert gives that error. – akrun Aug 24 '15 at 02:50
  • hmmm ... now I feel like my example isn't representative. If I set stringsAsFactors = FALSE on the csv import, and remove the first column (which isn't in my primary dataset) then the cSplit works. This is really confusing because the data I have provided is just my original data but having been shunted through write.csv and read.csv. – Luke_radio Aug 24 '15 at 03:04
  • It's most likely because you are trying to use the function on a `tbl_df`, for which I have not added support. – A5C1D2H2I1M1N2O1R2T1 Aug 24 '15 at 03:16
  • @Ananda Oh my. Is that all it is? I just tried converting back to a standard data.frame and it worked. Is this the way you would recommend working with dplyr and splitstackshape at the same time (ie converting everything back to a standard df before using SSS functions)? – Luke_radio Aug 24 '15 at 03:26
  • @LukeOR, I've added it as an issue [here](https://github.com/mrdwab/splitstackshape/issues/43). However, for now, the best bet is to add a `data.frame` into the processing chain (unless you are able to modify the function and create a pull request). Thanks. – A5C1D2H2I1M1N2O1R2T1 Aug 24 '15 at 03:33
  • @LukeOR, I've added another note in my answer. It might be more suitable to add a `tbl_dt` in your pipeline (and I can probably resolve the issues by adding the `tbl_dt` class to my `cSplit` functions). – A5C1D2H2I1M1N2O1R2T1 Aug 24 '15 at 03:44

1 Answer

I haven't updated the functions in "splitstackshape" to work with tbl_df objects. As such, the current workaround would be to add a data.frame in your chain.

Compare:

library(splitstackshape)
library(dplyr)

CT <- tbl_df(head(concat.test))

CT %>% cSplit("Likes")
# Error in `[.tbl_df`(indt, , splitCols, with = FALSE) : 
#   unused argument (with = FALSE)

CT %>% data.frame %>% cSplit("Likes")
#      Name                   Siblings    Hates Likes_1 Likes_2 Likes_3 Likes_4 Likes_5
# 1:   Boyd Reynolds , Albert , Ortega     2;4;       1       2       4       5       6
# 2:  Rufus  Cohen , Bert , Montgomery 1;2;3;4;       1       2       4       5       6
# 3:   Dana                     Pierce       2;       1       2       4       5       6
# 4: Carole Colon , Michelle , Ballard     1;4;       1       2       4       5       6
# 5: Ramona           Snyder , Joann ,   1;2;3;       1       2       5       6      NA
# 6: Kelley          James , Roxanne ,     1;4;       1       2       5       6      NA
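
To check whether this applies to your own data, something like the following sketch may help. The `roster` data frame is a made-up stand-in for the file in the question, and `as_tibble()` is used in place of `tbl_df()` on the assumption that a recent dplyr is installed.

```r
library(dplyr)
library(splitstackshape)

# Hypothetical stand-in for the roster data in the question.
roster <- as_tibble(data.frame(
  Registrar = c("Bo / Ashley", "Sam"),
  Date      = c("2015-02-04", "2015-02-05"),
  stringsAsFactors = FALSE
))

# dplyr verbs return tibbles, which carry the "tbl_df" class.
inherits(roster, "tbl_df")

# as.data.frame() drops the tibble classes and leaves a plain data.frame.
plain <- as.data.frame(roster)
class(plain)

# With the tibble classes gone, cSplit() takes its usual code path.
long <- cSplit(plain, "Registrar", "/", direction = "long")
long
```

If `inherits()` returns `TRUE` for the object you pass to `cSplit`, the conversion step is probably what is missing.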

Alternatively, since with = FALSE is an argument for use in "data.table", you can use tbl_dt instead of tbl_df objects:

CT2 <- tbl_dt(head(concat.test))

CT2 %>% cSplit("Likes")
#      Name                   Siblings    Hates Likes_1 Likes_2 Likes_3 Likes_4 Likes_5
# 1:   Boyd Reynolds , Albert , Ortega     2;4;       1       2       4       5       6
# 2:  Rufus  Cohen , Bert , Montgomery 1;2;3;4;       1       2       4       5       6
# 3:   Dana                     Pierce       2;       1       2       4       5       6
# 4: Carole Colon , Michelle , Ballard     1;4;       1       2       4       5       6
# 5: Ramona           Snyder , Joann ,   1;2;3;       1       2       5       6      NA
# 6: Kelley          James , Roxanne ,     1;4;       1       2       5       6      NA

Of course, if someone creates a pull request that solves the issue, I would be more than happy to make the relevant updates :-)
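
Until then, if you need the split in the middle of a longer dplyr chain, one option is to wrap the conversion in a small helper. This is only a sketch: `cSplit_tbl` is a hypothetical name, not a function in "splitstackshape", and it assumes a dplyr version that exports `as_tibble()`.

```r
library(dplyr)
library(splitstackshape)

# Hypothetical helper (not part of splitstackshape): coerce the input to a
# plain data.frame, run cSplit(), then hand a tibble back to the chain.
cSplit_tbl <- function(.data, ...) {
  out <- cSplit(as.data.frame(.data), ...)
  as_tibble(out)
}

res <- as_tibble(head(concat.test)) %>%
  cSplit_tbl("Likes") %>%
  select(Name, starts_with("Likes"))
res
```

Converting back to a tibble at the end is a design choice rather than a requirement. `cSplit` returns a data.table, so you can drop the `as_tibble()` call if you would rather keep working with that.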

A5C1D2H2I1M1N2O1R2T1