reshape data.table using by

Question

I have a data.table where the data I want is structured in a diagonal fashion.

library(data.table)
month <- c(201406, 201406, 201406, 201406, 201406, 201406, 201406, 201406, 
201406, 201406, 201406, 201406)
code <- c("498A01", "498A01", "498A01", "498A01", "498A01", "498A01", "498A01", "498A01", 
"498A01", "498A01", "498A01", "498A01")
col.a <- c("service", "base charge", "", "", "", "", "", "", "", "", "", "")
col.b <- c("", "", "description", "per unit", "", "", "", "", "", "", "", "")
col.c <- c("", "", "", "", "rate", 6859, "", "", "", "", "", "")
col.d <- c("", "", "", "", "", "", "quantity", 1, "", "", "", "")
col.e <- c("", "", "", "", "", "", "", "", "total charge", 6859, "", "")
col.f <- c("", "", "", "", "", "", "", "", "", "", "", "")   
dt <- data.table(month, code, col.a, col.b, col.c, col.d, col.e, col.f)

However, I need to organize the data in a more coherent fashion to simplify dt I am fairly new to data.table and I was wondering if there was a straightforward way to do so.

For col.a I know the following works for one column:

dt <- dt[col.a != "", 1:8, by = .(code, month)

But when I try for multiple columns it returns a data table with 0 obs. I suppose I could do that for all of the columns and then do some kind of merge but that seems inefficient and cumbersome. Is there a better way?

My desired output is:

   month   code      col.a      col.b     col.c   col.d       col.e    col.f
1: 201406 498A01     service description   rate quantity total charge       
2: 201406 498A01 base charge    per unit   6859        1         6859

So for each unique combination of code and month I want to remove the empty cells and collapse the data to look like it does above. I need to keep the col.f1 because it may not always be blank.

Any suggestions would be greatly appreciated.

I'm not sure I completely understand your data structure, using diagonals like this. Also, your data tables don't do anything with `month` or `code`, so I'm not sure what the entries are supposed to look like. — Anonymous coward, Aug 02 '18 at 15:41
@Anonymouscoward that's how the data is in the data frame, this is just an example. It was all in one column before, which is even worse. it's a messy problem for sure, hence the post. I went ahead and added `month` and `code` to the data tables. — jvalenti, Aug 02 '18 at 15:43
Ok, I think I see now, just correct me if I'm wrong. Is it that you have data where 12 rows are essentially one entry? And the next entry is the same, but for month `201407`? — Anonymous coward, Aug 02 '18 at 15:49
@Anonymouscoward the next entry would be the same as long as the value of `code` were the same. `month` would be different, depending on how the data was sorted, `201407` or `201405`. I would imagine there is something that could be done using the `by = list(code, month)` but I am not sure how to implement that — jvalenti, Aug 02 '18 at 15:55

chinsoon12 · Answer 1 · 2018-08-04T23:18:39.187

3

Are you looking for something like

dt[, lapply(.SD, function(x) x[x!=""][1:2]), by=.(month, code)]

output:

    month   code       col.a       col.b col.c    col.d        col.e col.f
1: 201406 498A01     service description  rate quantity total charge  <NA>
2: 201406 498A01 base charge    per unit  6859        1         6859  <NA>

Or in base R:

do.call(rbind, by(dt, paste(dt$month, dt$code), 
    function(y) do.call(cbind, lapply(y, function(x) x[x!=""][1:2]))))

output:

     month    code     col.a         col.b         col.c  col.d      col.e          col.f
[1,] "201406" "498A01" "service"     "description" "rate" "quantity" "total charge" NA   
[2,] "201406" "498A01" "base charge" "per unit"    "6859" "1"        "6859"         NA

edited Aug 04 '18 at 23:18

answered Aug 03 '18 at 00:36

chinsoon12

25,005
4
25
35

yes, this is the right idea. What if I have a large, unspecified number of such combinations of `month` and `code`? – jvalenti Aug 03 '18 at 13:25
1

You can use by in the data.table syntax. Not in front of a computer now to fix the base R syntax. Will do it when I have access to R – chinsoon12 Aug 04 '18 at 23:19
What if you don't know the number of rows per `month`, `code` combination? – jvalenti Aug 06 '18 at 14:34
Just remove the [1:2] – chinsoon12 Aug 06 '18 at 22:42

reshape data.table using by

1 Answers1