
I have a data frame with two list columns. The elements of each column have different lengths. For example, the 4th element of the variable “accession” contains one value, but the 7th contains two.

I want to make a new data frame that combines the two lists.

Thanks for helping me!

This is the data frame I currently have.

library(rentrez)

# Search the "gds" database for series records matching "disease"
search <- entrez_search(db = "gds", term = paste0("disease", " AND gse[ETYP]"), retMax = 15)
id <- unlist(search$ids)
UID <- c(sapply(id, paste0, collapse = ""))
# Fetch the summaries, always as a list even for a single ID
pub.summary <- entrez_summary(db = "gds", id = UID,
                              always_return_list = TRUE)
# Pull the "samples" element out of each summary
summary <- extract_from_esummary(esummaries = pub.summary,
                                 elements = c("samples"),
                                 simplify = TRUE)
df <- data.frame(summary)
df <- data.frame(t(df))  # transpose so the list columns run down the rows
df

This is the result I would like to have:

#  accession                                  title
#1 GSM3955152                                Cancer3
   GSM3955155                              Adjacent3
   GSM3955757 SW480 cells, HES1-binding RNAs/LncRNAs
   GSM3955153                              Adjacent1
   GSM3955150                                Cancer1
   GSM3955151                                Cancer2
#2 GSM33026213                      his4wk_sensitized_uti_1
   GSM3302681                         3his4wk_resolved_pbs_2
   GSM3302624                           c57bl6j_pbs_9
.
.
.
.
#4 GSM3955757                      SW480 cells, HES1-binding RNAs/LncRNAs
.
.
.
.
#15 GSM3934992                    control rep4 [N_0039]
    GSM3935006                    control rep15 [W_010]
    GSM3935012                    control rep17 [W_023]
    GSM3934989                    control rep1 [N_0026]

1 Answer

Update

Based on the OP's updates, one option is to specify simplify = FALSE in extract_from_esummary so that it returns a list, then extract the first element from each list entry and rbind the results to create a single data frame

summary <- extract_from_esummary(esummaries = pub.summary , 
                                           elements = "samples",
                                           simplify = FALSE)


out <- do.call(rbind, lapply(summary, `[[`, 1))
row.names(out) <- NULL
head(out)
#  accession                                  title
#1 GSM3955152                                Cancer3
#2 GSM3955155                              Adjacent3
#3 GSM3955757 SW480 cells, HES1-binding RNAs/LncRNAs
#4 GSM3955153                              Adjacent1
#5 GSM3955150                                Cancer1
#6 GSM3955151                                Cancer2
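
Note that rbind alone drops the information about which record each sample came from, whereas the expected output in the question groups the samples under #1 to #15. If that grouping matters, one possibility is to attach an identifier to each piece before stacking. The following is only a sketch under assumptions: samples is a hypothetical stand-in for the list produced by the lapply call above, the record names uid_1 and uid_2 are made up, and the grouping of the rows is illustrative; the real structure returned by rentrez may differ.

```r
# Hypothetical stand-in for the per-record data frames extracted above;
# the names uid_1 / uid_2 and the grouping of rows are illustrative only.
samples <- list(
  uid_1 = data.frame(accession = c("GSM3955152", "GSM3955155"),
                     title = c("Cancer3", "Adjacent3"),
                     stringsAsFactors = FALSE),
  uid_2 = data.frame(accession = "GSM3955757",
                     title = "SW480 cells, HES1-binding RNAs/LncRNAs",
                     stringsAsFactors = FALSE)
)

# Prepend the record name to every row of its data frame, then stack
with_id <- Map(function(d, id) data.frame(uid = id, d, stringsAsFactors = FALSE),
               samples, names(samples))
stacked <- do.call(rbind, with_id)
row.names(stacked) <- NULL
stacked
```

The result is still in long format (one row per sample), but the uid column records which record each row belongs to, so it can be grouped or split again later.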

An option would be to pad the list elements with NA so that both columns have the same length in each row (if one is of a different length), and then unnest

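library(dplyr)
library(purrr)
library(tidyr)  # provides unnest()
df1 %>%
   # n = the longer of the two list lengths in each row
   mutate(n = pmax(lengths(accession), lengths(title))) %>% 
   # extend each list element to length n; missing slots become NA
   mutate_at(vars(accession, title), ~ 
         map2(., n, ~ `length<-`(.x, .y))) %>% 
   select(-n) %>%
   unnest(cols = c(accession, title))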
# A tibble: 12 x 2
#   accession title
#   <chr>     <chr>
# 1 A         a    
# 2 B         b    
# 3 C         c    
# 4 <NA>      d    
# 5 <NA>      e    
# 6 A         a    
# 7 B         b    
# 8 C         c    
# 9 D         <NA> 
#10 E         <NA> 
#11 A         d    
#12 B         <NA> 
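
The padding step relies on the base R replacement function `length<-`, which extends a vector to the requested length and fills the new positions with NA. A small standalone sketch of that behaviour (x and padded are illustrative names, not part of the answer's data):

```r
x <- c("A", "B", "C")

# Calling the replacement function directly; equivalent to: length(x) <- 5
padded <- `length<-`(x, 5)
padded
# positions 4 and 5 are now NA, the first three values are unchanged
```

This is why the shorter of accession and title in each row ends up with trailing NA values after unnest.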

Or an option is to gather into 'long' format, then unnest the 'val' column and spread it back to 'wide' format

library(tidyr)
df1 %>%
    mutate(rn = row_number()) %>% 
    gather(key, val, -rn) %>%
    unnest(val) %>%
    group_by(rn, key) %>% 
    mutate(i1 = row_number()) %>% 
    spread(key, val) %>% 
    ungroup %>% 
    select(-rn, -i1)

data

df1 <- tibble(accession = list(LETTERS[1:3], LETTERS[1:5], LETTERS[1:2]), 
       title = list(letters[1:5], letters[1:3], letters[4]))
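
The comments below ask for a different shape: one row per original row, with each list collapsed into a single comma-separated string. A minimal base R sketch of that idea follows. It assumes the same example values as df1 but rebuilds them as plain lists (accession_list, title_list and collapse_col are hypothetical names) so the snippet runs without extra packages; it may not match what the OP ultimately needed.

```r
# Same example values as df1, held as plain lists
accession_list <- list(LETTERS[1:3], LETTERS[1:5], LETTERS[1:2])
title_list <- list(letters[1:5], letters[1:3], letters[4])

# Collapse every list element into one comma-separated string
collapse_col <- function(col) vapply(col, paste, character(1), collapse = ",")

collapsed <- data.frame(
  accession = collapse_col(accession_list),
  title = collapse_col(title_list),
  stringsAsFactors = FALSE
)
collapsed
# accession: "A,B,C", "A,B,C,D,E", "A,B"
# title:     "a,b,c,d,e", "a,b,c", "d"
```

The same helper can be applied to list columns directly, for example collapse_col(df1$accession), which keeps the row count unchanged (3 rows here rather than 12).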
akrun
  • Thanks for helping me. Based on your example, I want the result to look like this: row 1: accession "A,B,C", title "a,b,c,d,e"; row 2: accession "A,B,D,E", title "a,b,c"; row 3: accession "A,B", title "d". It should contain only 3 observations. For example, #1 contains "A,B,C" for accession and "a,b,c,d,e" for title. – R beginnnnnner Jul 22 '19 at 20:15
  • @Rbeginnnnnner you said your data is a `list` column. I created an example by spending my time to help you. You didn't even provide a reproducible example – akrun Jul 22 '19 at 20:15
  • Your example is pretty clear and this the problem I am facing now. But the result I hope to see is to make it into one data frame and with only 3 observations not 12. Is there any way can do that? I am sorry for the late reply. – R beginnnnnner Jul 22 '19 at 20:23
  • @Rbeginnnnnner Why don't you update your post with a small reproducible example with `dput` and the expected output? – akrun Jul 22 '19 at 20:28
  • I even dont know how to do that. I add the image for the expected output when posted the question. – R beginnnnnner Jul 22 '19 at 20:31
  • @Rbeginnnnnner I showed a way to create an example. Second is if you have already read the data, then `dput(head(yourdata, 5))` – akrun Jul 22 '19 at 20:35
  • I just updated the question with my current data frame that I have. – R beginnnnnner Jul 22 '19 at 20:42
  • @Rbeginnnnnner Can you check the updated solution – akrun Jul 22 '19 at 20:58
  • I updated the expected result based on your solution. Sorry for the confusion. Is there any way can make a data frame like that? – R beginnnnnner Jul 22 '19 at 21:11
  • @Rbeginnnnnner I already updated and it gives same as your expected – akrun Jul 22 '19 at 21:12
  • It is not the same. Your result has 2184 observations. But I only want 15 observations in total. – R beginnnnnner Jul 22 '19 at 21:15
  • All the result for head(out) should all belong to #1 instead of each one has an unique number. – R beginnnnnner Jul 22 '19 at 21:17
  • Sorry for the unclear output. I hope I can write the code and show the actual output as well haha. – R beginnnnnner Jul 22 '19 at 21:20
  • not clear about 15 obs even ur expected shows more – akrun Jul 22 '19 at 21:22
  • So, is there any way can get the output like the result that I posted? – R beginnnnnner Jul 22 '19 at 21:24
  • @Rbeginnnnnner Sorry, I didn't understand your output – akrun Jul 22 '19 at 21:26
  • The “df” I already made contained 15 rows and 2 columns each column has 15 list but each list has different length of elements. So my output is to combine the two columns into one but still have 15 rows and make the list to the actual string. – R beginnnnnner Jul 22 '19 at 21:33
  • @Rbeginnnnnner Your final data have more rows, and it is not clear how you want to combine/paste. But, I think you also said that you want a data.frame. I am confused. Sorry, I think I spent too much time on this – akrun Jul 22 '19 at 21:36
  • Thanks for ur time. I will try to figure out. – R beginnnnnner Jul 22 '19 at 21:48