1

I am new to R and I have been trying to pivot a data frame I read from a CSV file. The original CSV contain 5,000 item numbers, in my sample I used the first five. My end result using pivot should present each item number as many times as the payments done and the payments type. For example, the original table look like this:

ITEM NUMBER P1      P2      P3      P4      PType1  PType2      PType3  PType4
697884      270     255     170     0       CASH    CA      VI  
697885      100     1160    310     580     CASH    AX      VI          CA
697886      1515    1455    1765    970     CASH    AX      VI          CA
697887      0       0       0       0               
697888      1755    3610    1950    0       AX          VI      CA

By using pivot I want to get a table like this:

ITEM NUMBER Payment    PaymentType  
697884           270         CASH
697884           255         CA
697884           170         VI

...(The next item)

My current Data Frame contain 9 variables where the item number is a NUM, the payment amount is int and the payment type is Factor. Thank you!

structure(list(ITEM.NUMBER = 697884:697888, Payment1 = c(270L, 
100L, 1515L, 0L, 1755L), Payment2 = c(255L, 1160L, 1455L, 0L, 
3610L), Payment3 = c(170L, 310L, 1765L, 0L, 1950L), Payment4 = c(0L, 
580L, 970L, 0L, 0L), PaymentType1 = structure(c(3L, 3L, 3L, 1L, 
2L), .Label = c("", "AX", "CASH"), class = "factor"), PaymentType2 = structure(c(3L, 
2L, 2L, 1L, 4L), .Label = c("", "AX", "CA", "VI"), class = "factor"), 
    PaymentType3 = structure(c(3L, 3L, 3L, 1L, 2L), .Label = c("", 
    "CA", "VI"), class = "factor"), PaymentType4 = structure(c(1L, 
    2L, 2L, 1L, 1L), .Label = c("", "CA"), class = "factor")), .Names = c("ITEM.NUMBER", 
"Payment1", "Payment2", "Payment3", "Payment4", "PaymentType1", 
"PaymentType2", "PaymentType3", "PaymentType4"), row.names = c(NA, 
-5L), class = "data.frame")
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
EP_NYC
  • 13
  • 5
  • 1
    `melt` in the `plyr` package is very efective for this. – Brandon Bertelsen Feb 02 '14 at 02:16
  • Brandon, can you please elaborate how would you suggest using the formula with the variables I have and how to make the connection between the payment amount and the payment type – EP_NYC Feb 02 '14 at 03:18
  • `melt(dat, id = c('ITEM', 'NUMBER'), variable.name = 'PaymentType', value.name = 'Payment')` – Ramnath Feb 02 '14 at 03:26
  • Thank you @Ramnath, the formula you gave me produce everything into a list and not into a table. It create a list for the ticket number and payment amount and underneath another list of the ticket number and the payment type. Is it because of the melt function? If so, how do I bring this list into a table? – EP_NYC Feb 02 '14 at 03:39
  • Use `melt(dat, id = 'ITEMNUMBER')` after removing the space from the column name. I checked and it does give me a data frame. – Ramnath Feb 02 '14 at 04:01
  • You could probably just get away with `melt(dat, c())` on your data for the desired effect. But I can't test it since I can't load your data in the way you've put it up. Could you try something like `dput(head(dat,25))` and post the output in the place of your original "table looks like" – Brandon Bertelsen Feb 02 '14 at 04:27
  • @ramnath My results still end up as a list. I think I am missing a crucial part in writing the function.I need it to show like the table I attached in the body of the question (table 2) – EP_NYC Feb 02 '14 at 04:40
  • I agree with Brandon that you should paste the output of `dput(head(dat, 25))` so that we are working with the same data structure that you are. – Ramnath Feb 02 '14 at 04:58
  • @ramnath I am new to R and to this forum so I hope I paste the output correctly. Thanks again for all the help. – EP_NYC Feb 02 '14 at 05:13
  • Now that you have pasted your data, just do `melt(dat, id = 'ITEM.NUMBER')` and you will have what you seek. – Ramnath Feb 02 '14 at 05:20
  • @Ramnath, not exactly. They don't want the data in such a long format. They will need to split the resulting `variable` column after melting, and then they will need to use `dcast` to get back to a semi-wide form. – A5C1D2H2I1M1N2O1R2T1 Feb 02 '14 at 05:22
  • Ah okay. It wasn't clear, but I see from the output now. Your `reshape` based solution is the most appropriate in this case. – Ramnath Feb 02 '14 at 05:26
  • @Ramnath, I added a "reshape2" answer just for fun :-) – A5C1D2H2I1M1N2O1R2T1 Feb 02 '14 at 05:48

1 Answers1

0

You can use reshape from base R. Assuming your data are called "mydf":

reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
        varying=2:ncol(mydf), sep = "")
#          ITEM.NUMBER time Payment PaymentType
# 697884.1      697884    1     270        CASH
# 697885.1      697885    1     100        CASH
# 697886.1      697886    1    1515        CASH
# 697887.1      697887    1       0            
# 697888.1      697888    1    1755          AX
# 697884.2      697884    2     255          CA
# 697885.2      697885    2    1160          AX
# 697886.2      697886    2    1455          AX
# 697887.2      697887    2       0            
# 697888.2      697888    2    3610          VI
# 697884.3      697884    3     170          VI
# 697885.3      697885    3     310          VI
# 697886.3      697886    3    1765          VI
# 697887.3      697887    3       0            
# 697888.3      697888    3    1950          CA
# 697884.4      697884    4       0            
# 697885.4      697885    4     580          CA
# 697886.4      697886    4     970          CA
# 697887.4      697887    4       0            
# 697888.4      697888    4       0

If you want to order by the "ITEM.NUMBER", you can use order:

out <- reshape(mydf, direction = "long", idvar="ITEM.NUMBER", 
               varying=2:ncol(mydf), sep = "")
out[order(out$ITEM.NUMBER), ]

Update

For completeness, here's the reshape2 approach that I came up with:

First, melt the data (as indicated in the comments):

mydfL <- melt(mydf, id.vars="ITEM.NUMBER")
head(mydfL)
#   ITEM.NUMBER variable value
# 1      697884 Payment1   270
# 2      697885 Payment1   100
# 3      697886 Payment1  1515
# 4      697887 Payment1     0
# 5      697888 Payment1  1755
# 6      697884 Payment2   255

Second, split up the "variable" column. There might be better ways to do this, but this is what came to my mind.

mydfL <- cbind(mydfL, do.call(rbind, strsplit(
  as.character(mydfL$variable), split = "(?<=[a-zA-Z])(?=[0-9])", perl = T)))
head(mydfL)
#   ITEM.NUMBER variable value       1 2
# 1      697884 Payment1   270 Payment 1
# 2      697885 Payment1   100 Payment 1
# 3      697886 Payment1  1515 Payment 1
# 4      697887 Payment1     0 Payment 1
# 5      697888 Payment1  1755 Payment 1
# 6      697884 Payment2   255 Payment 2

Third, use dcast to get your output. Since some columns are named "1" and "2", you'll need to use backticks (`) to quote them and get R to recognize them as column names and not values.

dcast(mydfL, ITEM.NUMBER + `2` ~ `1`, value.var="value")
#    ITEM.NUMBER 2 Payment PaymentType
# 1       697884 1     270        CASH
# 2       697884 2     255          CA
# 3       697884 3     170          VI
# 4       697884 4       0            
# 5       697885 1     100        CASH
# 6       697885 2    1160          AX
# 7       697885 3     310          VI
# 8       697885 4     580          CA
# 9       697886 1    1515        CASH
# 10      697886 2    1455          AX
# 11      697886 3    1765          VI
# 12      697886 4     970          CA
# 13      697887 1       0            
# 14      697887 2       0            
# 15      697887 3       0            
# 16      697887 4       0            
# 17      697888 1    1755          AX
# 18      697888 2    3610          VI
# 19      697888 3    1950          CA
# 20      697888 4       0            
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • @EP_NYC, the "2" column is acting like a secondary "ID" column to make a unique ID for each row. To get rid of the "0" values look into `[` (see `?Extract`) and use that with a comparison like `==`. – A5C1D2H2I1M1N2O1R2T1 Feb 03 '14 at 01:25