I have a list of lists, and each sublist contains multiple data frames (df). I would like to get the number of columns of each df in each sublist using imap. How can I refer to the df correctly?

Sample list can be built using:

lst1<-list(`101-01-101` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-01-101"), BRTHDTC = c("Birthday", "1953-07-07"), SEX = c("Gender", 
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-01-101"), DSDT = c("DS Date", "2016-03-14"), DSDT_P = c("DS Date Prob", 
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-02-102` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-02-102"), BRTHDTC = c("Birthday", "1963-07-02"), SEX = c("Gender", 
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-02-102"), DSDT = c("DS Date", "2017-04-04"), DSDT_P = c("DS Date Prob", 
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-03-103` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-03-103"), BRTHDTC = c("Birthday", "1940-09-11"), SEX = c("Gender", 
"Male")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-03-103"), DSDT = c("DS Date", NA), DSDT_P = c("DS Date Prob", 
"UN-UNK-2015")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))), `101-04-104` = list(Demographics = structure(list(
    SubjectID = c("Subject ID", "101-04-104"), BRTHDTC = c("Birthday", 
    "1955-12-31"), SEX = c("Gender", "Male")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame")), DiseaseStatus = structure(list(
    SubjectID = c("Subject ID", "101-04-104"), DSDT = c("DS Date", 
    "2016-05-02"), DSDT_P = c("DS Date Prob", NA)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))), `104-05-201` = list(
    Demographics = structure(list(SubjectID = c("Subject ID", 
    "104-05-201"), BRTHDTC = c("Birthday", "1950-12-04"), SEX = c("Gender", 
    "Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
    "data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
    "104-05-201"), DSDT = c("DS Date", "2018-07-06"), DSDT_P = c("DS Date Prob", 
    NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
    "data.frame"))))

I tried to use two nested imap calls to reach that level, but lost track at the end. Could someone help me with this and tell me how to correctly refer to the df in each sublist?

My code is something like this:

   imap ( ~ { 
   wb = createWorkbook()
     imap(.x, ~ {     
       addWorksheet(wb, .y)
       writeData(wb, .y, .x)
       setColWidths(wb, .y, cols = 1:ncol(.x), widths = "auto")
      })

saveWorkbook(wb, file.path("C:/Users/",
                sprintf("subject_%s.xlsx", .y)))
                }
  )

Update:

What if a df in a sublist contains something like this:

[image]

Stataq

2 Answers

You can use map_depth to great advantage. Set its .depth argument so that the function is applied that many levels deep. To get nicer-looking output, I made just 2 modifications:

library(purrr)
library(dplyr)  # for bind_cols()

# length() of a data frame is its number of columns
map_depth(lst1, 2, ~ length(.x)) %>%
  map(~ .x %>% bind_cols())

$`101-01-101`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-02-102`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-03-103`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-04-104`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`104-05-201`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

Or this one. However, the output is less informative, because the subject IDs are lost.

map_depth(lst1, 2, ~ length(.x)) %>%
  map(~ .x %>% bind_cols()) %>%
  exec(rbind, !!!.)

# A tibble: 5 x 2
  Demographics DiseaseStatus
*        <int>         <int>
1            3             3
2            3             3
3            3             3
4            3             3
5            3             3
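
If the subject IDs should be kept in the combined table, one possible variant is sketched below. It is only an illustration: `toy` is a hypothetical stand-in for `lst1`, and the column name `subject` is an arbitrary choice. Each sublist is turned into a one-row tibble of column counts, and `bind_rows()` with `.id` stores the sublist names in a column.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for lst1: two subjects, two data frames each
toy <- list(
  s1 = list(Demographics  = data.frame(a = 1, b = 2, c = 3),
            DiseaseStatus = data.frame(a = 1, b = 2)),
  s2 = list(Demographics  = data.frame(a = 1, b = 2, c = 3),
            DiseaseStatus = data.frame(a = 1, b = 2, c = 3))
)

counts <- toy %>%
  map(~ as_tibble(map(.x, ncol))) %>%  # one-row tibble of column counts per subject
  bind_rows(.id = "subject")           # sublist names become the "subject" column

print(counts)
```
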
Anoushiravan R

As of now I am unable to fully understand your code, but it should be rewritten as follows. I am not sure what you want in the place where I have left a blank (`_________`).

imap ( ~ { 
   wb = createWorkbook()
     imap(.x, function(a, b) {     
       addWorksheet(wb, b)
       writeData(wb, b, a)
       setColWidths(wb, b, cols = 1:ncol(a), widths = "auto")
      })

saveWorkbook(wb, file.path("C:/Users/",
                sprintf("subject_%s.xlsx", _________)))
                }
  )

Actually, you have two problems there:

  • The anonymous (lambda) function inside imap_* takes two arguments.
  • Your other problem is writing one lambda function inside another. That's an issue I have not solved to date.
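
One way to sidestep the name clash between nested lambdas is to give both functions explicitly named arguments, so the inner function can still see the outer element's name. With two nested `~` formulas, the inner `.x` and `.y` shadow the outer ones while the inner function runs. The sketch below is a minimal illustration only: `toy` and the argument names `sublist`, `subject`, `df`, and `sheet` are made up for the example.

```r
library(purrr)

# Hypothetical stand-in for lst1
toy <- list(
  s1 = list(A = data.frame(x = 1, y = 2), B = data.frame(x = 1)),
  s2 = list(A = data.frame(x = 1, y = 2, z = 3), B = data.frame(x = 1))
)

labels <- imap(toy, function(sublist, subject) {
  # the outer name (subject) stays visible inside the inner function
  imap_chr(sublist, function(df, sheet) {
    sprintf("%s/%s: %d columns", subject, sheet, ncol(df))
  })
})

print(labels)
```
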

Your earlier expression can be written correctly as:

imap(lst1, function(.x, .y) imap(.x, function(xy, yz) print(ncol(xy))))

[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
$`101-01-101`
$`101-01-101`$Demographics
[1] 3

$`101-01-101`$DiseaseStatus
[1] 3


$`101-02-102`
$`101-02-102`$Demographics
[1] 3

$`101-02-102`$DiseaseStatus
[1] 3


$`101-03-103`
$`101-03-103`$Demographics
[1] 3

$`101-03-103`$DiseaseStatus
[1] 3


$`101-04-104`
$`101-04-104`$Demographics
[1] 3

$`101-04-104`$DiseaseStatus
[1] 3


$`104-05-201`
$`104-05-201`$Demographics
[1] 3

$`104-05-201`$DiseaseStatus
[1] 3

Alternatively, if you want something else, such as the total number of columns per subject:

imap_dfr(lst1, ~ .x %>% as.data.frame() %>% ncol())
# A tibble: 1 x 5
  `101-01-101` `101-02-102` `101-03-103` `101-04-104` `104-05-201`
         <int>        <int>        <int>        <int>        <int>
1            6            6            6            6            6

Or this?

map_df(lst1, ~map(.x, function(xy) ncol(xy)))
# map_df(lst1, ~map(.x, ncol)) ##alternative
# A tibble: 5 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3
2            3             3
3            3             3
4            3             3
5            3             3
AnilGoyal
  • The `ncol` that I want should be something like what you got in the 2nd part. How do I put this part into my old code? Should I update it to `ncol(.x.y)`? – Stataq May 26 '21 at 14:10
  • see first part of my edited answer, @Stataq – AnilGoyal May 26 '21 at 14:13
  • I know `imap` is roughly equivalent to `Map`. If I rewrite it with `Map`, will that help? I am new to both of them, so I am not sure how to move further. – Stataq May 26 '21 at 14:15
  • What do `xy` and `yz` stand for? – Stataq May 26 '21 at 14:18
  • These are arbitrary argument names; you may choose them yourself. But remember, you'll only use the first of these (in this case). Actually `imap_*(list, ~{ .x * .y })` is equivalent to `imap_*(list, function(.x, .y) { .x * .y })` or `imap_*(list, function(a, b) { a * b })` – AnilGoyal May 26 '21 at 14:23
  • I think the problem comes from my list. Some of the df in the sublists are empty. Is it possible to run `setColWidths(wb, b, cols = 1:ncol(a), widths = "auto")` only if `a` is not empty? – Stataq May 26 '21 at 14:57
  • try using unvectorised `if` – AnilGoyal May 26 '21 at 15:01
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232918/discussion-between-stataq-and-anilgoyal). – Stataq May 26 '21 at 15:12
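
The plain `if` suggested in the comments could look roughly like the sketch below. It is an assumption-laden illustration: `sheets` is a hypothetical list, and the strings only stand in for the real `setColWidths()` call, which is not run here. Note that `1:ncol(a)` evaluates to `c(1, 0)` when `a` has no columns, so `seq_len(ncol(a))` is the safer way to build the column index.

```r
library(purrr)

# Hypothetical sublist with one filled and one empty data frame
sheets <- list(
  filled = data.frame(a = 1, b = 2, c = 3),
  empty  = data.frame()
)

resized <- imap_chr(sheets, function(a, b) {
  if (ncol(a) > 0) {
    # the real code would call setColWidths(wb, b, cols = seq_len(ncol(a)), ...) here
    sprintf("%s: resize columns %s", b, paste(seq_len(ncol(a)), collapse = ","))
  } else {
    # nothing to resize for an empty data frame
    sprintf("%s: skipped (no columns)", b)
  }
})

print(resized)
```
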