-2

I have the following table of group IDs (GroupNum) with a list of values that they are associated with (NPI_list):

df1 <- data.frame(GroupNum=c(41,224,1032,2754,3907,4107),
              NPI_list=c('1740411552,1932387479','1710112156,1841438280',
                         '1629405113,1942433891','1629405113,1992083588',
                         '1710112156,1841438280','1740411552,1932387479'),
              stringsAsFactors = F)

There are instances where there are common elements between list. I need to consolidate groups with common values within each of their respective GroupNum IDs such that I get an end product similar to the following

df2 <- data.frame(GroupNum=c('41,4107','224,3907','1032,2754'),
              NPI_list=c('1740411552,1932387479','1710112156,1841438280','1629405113,1992083588,1942433891'),
              stringsAsFactors = F)

I have been told that there is a way to determine if there are common elements between list via python but I only have experience with R. I have tried a similar dplyr solution to that of Duck below but it still groups NPI_list and I need to be able to compare the individual elements within each list to that of all other list and combine the lists if there is a single match.

Any advice would be helpful. I am suspecting that I will need to use some sort of for loop.

  • Could your please post your data and any code you've tried? A picture of your input data does not help anyone reproduce your situation. A common way to post data from R is to use `dput()` and then paste into your question the file contents. Read more about creating a [minimal reproducible example](https://stackoverflow.com/help/minimal-reproducible-example) – Ben Norris Aug 08 '20 at 22:53
  • How does the last row match, they have nothing in common. Do you need `aggregate(GroupNum~ NPI_list, df, toString)` ? – Ronak Shah Aug 09 '20 at 01:15
  • Apologies Ben and Ronak. This was my first time posting to Stack. I have made the appropriate edits by posting the original dataframes. Please let me know if I can clarify anything. – Derick Rapista Aug 09 '20 at 15:10

1 Answers1

0

I think this can help you. You must have data you showed in a dataframe. And next time please include your data or a portion of it by using dput(yourdata) or dput(head(yourdata,20)) over your dataframe and paste the result in the question. It is more easy to help you with that. Next a possible solution with similar data and using dplyr:

library(dplyr)
#Data
df1 <- data.frame(GroupNum=c(41,224,1032,2754,3907,4107),
                  NPI_list=c('1740411552,1932387479','1710112156,1841438280',
                             '1639127913,1942433891','1629405113,1992083588',
                             '1710112156,1841438280','1740411552,1932387479'),
                  stringsAsFactors = F)
#Aggregate
df2 <- df1 %>% group_by(NPI_list) %>% 
  mutate(N=n(),id=cur_group_id(),id=ifelse(N==1,0,id)) %>%
  ungroup() %>% group_by(id) %>%
  summarise(GroupNum=paste0(GroupNum,collapse = ','),
            NPI_list=paste0(unique(NPI_list),collapse = ',')) %>% ungroup() %>% select(-id)

The output will be:

# A tibble: 3 x 2
  GroupNum  NPI_list                                   
  <chr>     <chr>                                      
1 1032,2754 1639127913,1942433891,1629405113,1992083588
2 224,3907  1710112156,1841438280                      
3 41,4107   1740411552,1932387479  
Duck
  • 39,058
  • 13
  • 42
  • 84
  • Thank you for your prompt response Duck. Apologies for the incorrect formatting of my question. This is my first time posting a question to Stack. Just to clarify, the first row observations are the collections of GroupNum and corresponding NPI values that did not have a common element between other groups? Thanks again. – Derick Rapista Aug 09 '20 at 03:39