0

I have a some row names in Excel with their respective publications.

Name    Pub
AAAA    MNWP 
AAAA    TIWD 
BBBB    CGWK 
BBBB    MNWP 
CCCC    EXWE 
CCCC    EXWE 
CCCC    HXWT 
CCCC    MDWE 
CCCC    MDWE 
CCCC    MNWP 
CCCC    MNWP 
CCCC    NDWK 
DDDD    MNWP 
EEEE    EXWE 
EEEE    TIWE 

Essentially I want a list to be of this form:

AAAA: MNWP, TIWD
BBBB: CGWK, MNWP
CCCC: EXWE, HXWT, MDWE, MNWP, NDWK, TIWE
DDDD: MNWP
EEEE: EXWE, TIWE

Is there a way of merging this list in the format above? Either Excel or R (preferably Excel).

phg
  • 536
  • 1
  • 7
  • 19
  • Can you provide the _complete_ desired output for the sample data? E.g. what is the desired output for those CCCC rows? – talat Aug 14 '14 at 11:18
  • 1
    Why? You have a useful format and want to transform into something less useful for processing. – Roland Aug 14 '14 at 11:22
  • If you really want a `list` you could just use `split(DF$Pub, DF$Name)`. – Roland Aug 14 '14 at 11:28
  • @Roland, I doubt that they really want a `list` (as defined in R) since they prefer an Excel solution and as far as I know Excel doesn't support list structures as they exist in R. – talat Aug 14 '14 at 12:11
  • @beginneR It's not clear at all to me, what they need. Right now it seems like it would be best to write a print method. – Roland Aug 14 '14 at 12:22
  • @Roland, I agree that the question is not very clearly stated, especially what they're trying to do with the result (as you already stated in your first comment) – talat Aug 14 '14 at 12:25

3 Answers3

2

Using data.table

library(data.table)
setDT(dat)[, list(Group=paste(Pub, collapse=", ")), by=Name][,paste(Name, Group, sep=": ")]
#[1] "AAAA: MNWP, TIWD"                                    
#[2] "BBBB: CGWK, MNWP"                                    
#[3] "CCCC: EXWE, EXWE, HXWT, MDWE, MDWE, MNWP, MNWP, NDWK"
#[4] "DDDD: MNWP"                                          
#[5] "EEEE: EXWE, TIWE"             
akrun
  • 874,273
  • 37
  • 540
  • 662
1

One way would be

x <- read.table(text = "Name    Pub
AAAA    MNWP 
AAAA    TIWD 
BBBB    CGWK 
BBBB    MNWP 
CCCC    EXWE 
CCCC    EXWE 
CCCC    HXWT 
CCCC    MDWE 
CCCC    MDWE 
CCCC    MNWP 
CCCC    MNWP 
CCCC    NDWK 
DDDD    MNWP 
EEEE    EXWE 
EEEE    TIWE ", header = TRUE)

aggregate(Pub ~ Name, data = x, FUN = paste)

  Name                                            Pub
1 AAAA                                     MNWP, TIWD
2 BBBB                                     CGWK, MNWP
3 CCCC EXWE, EXWE, HXWT, MDWE, MDWE, MNWP, MNWP, NDWK
4 DDDD                                           MNWP
5 EEEE                                     EXWE, TIWE
Roman Luštrik
  • 69,533
  • 24
  • 154
  • 197
1

Another option in R:

library(dplyr)
df %>% 
  group_by(Name) %>%
  summarise(Pub = paste(Pub, collapse =", "))

#Source: local data frame [5 x 2]
#
#  Name                                            Pub
#1 AAAA                                     MNWP, TIWD
#2 BBBB                                     CGWK, MNWP
#3 CCCC EXWE, EXWE, HXWT, MDWE, MDWE, MNWP, MNWP, NDWK
#4 DDDD                                           MNWP
#5 EEEE                                     EXWE, TIWE
talat
  • 68,970
  • 21
  • 126
  • 157