How to refer to a data frame in a sublist using imap

Question

I have a list of lists, and each sublist contains multiple data frames. I would like to get the number of columns of each data frame in each sublist using imap. How can I refer to the data frame correctly?

Sample list can be built using:

lst1<-list(`101-01-101` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-01-101"), BRTHDTC = c("Birthday", "1953-07-07"), SEX = c("Gender", 
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-01-101"), DSDT = c("DS Date", "2016-03-14"), DSDT_P = c("DS Date Prob", 
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-02-102` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-02-102"), BRTHDTC = c("Birthday", "1963-07-02"), SEX = c("Gender", 
"Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-02-102"), DSDT = c("DS Date", "2017-04-04"), DSDT_P = c("DS Date Prob", 
NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"
))), `101-03-103` = list(Demographics = structure(list(SubjectID = c("Subject ID", 
"101-03-103"), BRTHDTC = c("Birthday", "1940-09-11"), SEX = c("Gender", 
"Male")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
"data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
"101-03-103"), DSDT = c("DS Date", NA), DSDT_P = c("DS Date Prob", 
"UN-UNK-2015")), row.names = c(NA, -2L), class = c("tbl_df", 
"tbl", "data.frame"))), `101-04-104` = list(Demographics = structure(list(
    SubjectID = c("Subject ID", "101-04-104"), BRTHDTC = c("Birthday", 
    "1955-12-31"), SEX = c("Gender", "Male")), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame")), DiseaseStatus = structure(list(
    SubjectID = c("Subject ID", "101-04-104"), DSDT = c("DS Date", 
    "2016-05-02"), DSDT_P = c("DS Date Prob", NA)), row.names = c(NA, 
-2L), class = c("tbl_df", "tbl", "data.frame"))), `104-05-201` = list(
    Demographics = structure(list(SubjectID = c("Subject ID", 
    "104-05-201"), BRTHDTC = c("Birthday", "1950-12-04"), SEX = c("Gender", 
    "Female")), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
    "data.frame")), DiseaseStatus = structure(list(SubjectID = c("Subject ID", 
    "104-05-201"), DSDT = c("DS Date", "2018-07-06"), DSDT_P = c("DS Date Prob", 
    NA)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", 
    "data.frame"))))

I tried to use two nested imap calls to get to that level, but got lost at the end. Could someone tell me how to refer to the data frame in the sublist correctly?

My code is something like this:

imap( ~ {
  wb = createWorkbook()
  imap(.x, ~ {
    addWorksheet(wb, .y)
    writeData(wb, .y, .x)
    setColWidths(wb, .y, cols = 1:ncol(.x), widths = "auto")
  })

  saveWorkbook(wb, file.path("C:/Users/",
                             sprintf("subject_%s.xlsx", .y)))
})

Update:

If the data frames in the sublist contain something like this:

I just updated my post. I am trying to make `cols = 1:ncol(.x)` work. – Stataq, May 26 '21 at 14:03

Anoushiravan R · Answer 1 · 2021-05-26T14:15:29.187

3

You can use map_depth to great advantage. You just have to set the .depth argument so that the function is applied as many levels deep as you specify. To get nicer-looking output I made two modifications:

library(purrr)
library(dplyr)  # provides bind_cols()

map_depth(lst1, 2, ~ length(.x)) %>%
  map(~ .x %>% bind_cols())

$`101-01-101`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-02-102`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-03-103`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`101-04-104`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

$`104-05-201`
# A tibble: 1 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3

Or this one. However, the output is less informative, because the subject IDs are no longer shown.

map_depth(lst1, 2, ~ length(.x)) %>%
  map(~ .x %>% bind_cols()) %>%
  exec(rbind, !!!.)

# A tibble: 5 x 2
  Demographics DiseaseStatus
*        <int>         <int>
1            3             3
2            3             3
3            3             3
4            3             3
5            3             3
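
If the subject IDs should stay attached to the counts, one possible variation on the calls above is to collect the column counts as named integer vectors and stack them into a matrix. This is only a sketch: `demo_lst` is a hypothetical two-subject stand-in for `lst1`, and `col_counts` and `col_count_matrix` are illustrative names.

```r
library(purrr)

# Hypothetical stand-in for lst1: same nesting, fewer subjects and columns
demo_lst <- list(
  `101-01-101` = list(
    Demographics  = data.frame(SubjectID = "101-01-101", SEX = "Female"),
    DiseaseStatus = data.frame(SubjectID = "101-01-101",
                               DSDT = "2016-03-14", DSDT_P = NA)
  ),
  `101-02-102` = list(
    Demographics  = data.frame(SubjectID = "101-02-102", SEX = "Female"),
    DiseaseStatus = data.frame(SubjectID = "101-02-102",
                               DSDT = "2017-04-04", DSDT_P = NA)
  )
)

# One named integer vector of column counts per subject
col_counts <- map(demo_lst, ~ map_int(.x, ncol))

# Stack the vectors row-wise; the list names become the row names
col_count_matrix <- do.call(rbind, col_counts)
col_count_matrix
```

A single count can then be looked up by name, for example `col_count_matrix["101-01-101", "DiseaseStatus"]`.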

edited May 26 '21 at 14:15

answered May 26 '21 at 14:01

Anoushiravan R


I just updated my post. Is there a way to make this work for the part `cols = 1:ncol(.x)`, where `ncol(.x)` is the value that you get from your `map_depth` call? I am not sure how to loop this in. – Stataq May 26 '21 at 14:06
You would like to know the number of cols in each underlying tibble? – Anoushiravan R May 26 '21 at 14:09
Yes, as I need that info to make `setColWidths` work. – Stataq May 26 '21 at 14:11
I understand. If I'm not mistaken we discussed it last night. Unfortunately I'm not quite familiar with this package you are using. – Anoushiravan R May 26 '21 at 14:13
It's ok, but I guess these solutions also get your desired results. – Anoushiravan R May 26 '21 at 14:20

AnilGoyal · Accepted Answer · 2021-05-26T14:27:08.413

2

As of now I am unable to fully understand your code, but it should be rewritten as follows. I am not sure what you want in the place where I have left a blank.

imap( ~ {
  wb = createWorkbook()
  imap(.x, function(a, b) {
    addWorksheet(wb, b)
    writeData(wb, b, a)
    setColWidths(wb, b, cols = 1:ncol(a), widths = "auto")
  })

  saveWorkbook(wb, file.path("C:/Users/",
                             sprintf("subject_%s.xlsx", _________)))
})

Actually, you have two problems there:

The anonymous function inside imap_* requires two arguments.
Your other problem is writing one lambda function inside another: the inner `.x` and `.y` mask the outer ones. That is an issue I have not solved to date.

Your earlier expression can be written correctly as

imap(lst1, function(.x, .y) imap(.x, function(xy, yz) print(ncol(xy))))

[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
[1] 3
$`101-01-101`
$`101-01-101`$Demographics
[1] 3

$`101-01-101`$DiseaseStatus
[1] 3


$`101-02-102`
$`101-02-102`$Demographics
[1] 3

$`101-02-102`$DiseaseStatus
[1] 3


$`101-03-103`
$`101-03-103`$Demographics
[1] 3

$`101-03-103`$DiseaseStatus
[1] 3


$`101-04-104`
$`101-04-104`$Demographics
[1] 3

$`101-04-104`$DiseaseStatus
[1] 3


$`104-05-201`
$`104-05-201`$Demographics
[1] 3

$`104-05-201`$DiseaseStatus
[1] 3
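
Building on the nested call above, giving the outer and inner functions different argument names may be one way to keep the outer name (the subject ID) reachable from inside the inner function. The following is a minimal sketch, not a definitive fix: `demo_lst`, `sublist`, `subject_id`, `df`, `sheet_name` and `sheet_labels` are all hypothetical names chosen for illustration.

```r
library(purrr)

# Hypothetical stand-in for lst1 with the same list-of-lists shape
demo_lst <- list(
  s1 = list(A = data.frame(x = 1, y = 2), B = data.frame(z = 3)),
  s2 = list(A = data.frame(x = 4, y = 5), B = data.frame(z = 6))
)

sheet_labels <- imap(demo_lst, function(sublist, subject_id) {
  imap(sublist, function(df, sheet_name) {
    # subject_id comes from the outer function,
    # df and sheet_name come from the inner one
    sprintf("%s/%s: %d cols", subject_id, sheet_name, ncol(df))
  })
})

sheet_labels$s1$B
```

Because the names differ, nothing is masked, so the same pattern could in principle be used to build the file name from the outer name while looping over the sheets.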

Alternatively, if you want the total number of columns per subject:

imap_dfr(lst1, ~ .x %>% as.data.frame() %>% ncol())
# A tibble: 1 x 5
  `101-01-101` `101-02-102` `101-03-103` `101-04-104` `104-05-201`
         <int>        <int>        <int>        <int>        <int>
1            6            6            6            6            6

Or this?

map_df(lst1, ~map(.x, function(xy) ncol(xy)))
# map_df(lst1, ~map(.x, ncol)) ##alternative
# A tibble: 5 x 2
  Demographics DiseaseStatus
         <int>         <int>
1            3             3
2            3             3
3            3             3
4            3             3
5            3             3

edited May 26 '21 at 14:27

answered May 26 '21 at 14:02

AnilGoyal


The `ncol` that I want should be something like what you got in the 2nd part. How do I put this part into my old code? Should I update it to `ncol(.x.y)`? – Stataq May 26 '21 at 14:10
See the first part of my edited answer, @Stataq – AnilGoyal May 26 '21 at 14:13
I know `imap` is roughly equivalent to `Map`. If I rewrite it with `Map`, will that help? I am new to both of them, so I am not sure how to proceed. – Stataq May 26 '21 at 14:15
What do `xy` and `yz` stand for? – Stataq May 26 '21 at 14:18
These are arbitrary argument names; you may choose them yourself. But remember, you'll use only the first one of these (in this case). Actually `imap_*(list, ~{ .x * .y })` is equivalent to `imap_*(list, function(.x, .y) { .x * .y })` or `imap_*(list, function(a, b) { a * b })` – AnilGoyal May 26 '21 at 14:23
I think the problem comes from my list: some of the data frames in the sublists are empty. Is it possible to run `setColWidths(wb, b, cols = 1:ncol(a), widths = "auto")` only if `a` is not empty? – Stataq May 26 '21 at 14:57
Try using an unvectorised `if` – AnilGoyal May 26 '21 at 15:01
Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/232918/discussion-between-stataq-and-anilgoyal). – Stataq May 26 '21 at 15:12
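
The `if` guard suggested in the comments could look roughly like the sketch below. It assumes the workbook functions come from the openxlsx package and that "empty" means a data frame with zero columns, in which case `1:ncol(a)` would evaluate to `c(1, 0)` rather than an empty range. `demo_lst`, `out_dir` and `saved_files` are hypothetical names, and the files are written to a temporary folder instead of `C:/Users/`.

```r
library(purrr)
library(openxlsx)

# Hypothetical stand-in: the second sheet of subject "s2" has no columns
demo_lst <- list(
  s1 = list(Demographics  = data.frame(SubjectID = "s1", SEX = "Female"),
            DiseaseStatus = data.frame(SubjectID = "s1", DSDT = "2016-03-14")),
  s2 = list(Demographics  = data.frame(SubjectID = "s2", SEX = "Male"),
            DiseaseStatus = data.frame())
)

out_dir <- tempdir()  # assumption: temporary folder used for illustration

saved_files <- imap_chr(demo_lst, function(sublist, subject_id) {
  wb <- createWorkbook()
  iwalk(sublist, function(df, sheet_name) {
    addWorksheet(wb, sheet_name)
    # Plain (unvectorised) if: only write data and set widths
    # when the data frame has at least one column
    if (ncol(df) > 0) {
      writeData(wb, sheet_name, df)
      setColWidths(wb, sheet_name, cols = seq_len(ncol(df)), widths = "auto")
    }
  })
  out_file <- file.path(out_dir, sprintf("subject_%s.xlsx", subject_id))
  saveWorkbook(wb, out_file, overwrite = TRUE)
  out_file
})
```

Using `seq_len(ncol(df))` instead of `1:ncol(df)` is a second line of defence, since `seq_len(0)` is an empty vector. Whether an empty sheet should still be created for such a data frame is a design choice; here it is kept so that every workbook has the same sheets.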
