7

I have a below-mentioned dataframe:

DF (dput):

structure(list(ID = c("P-1", " P-1", 
    "P-1", "P-2", "P-3", "P-4", 
    "P-5", "P-6", "P-7", "P-8"
    ), Date = c("2020-03-16 12:11:33", "2020-03-16 13:16:04", 
    "2020-03-16 06:13:55", "2020-03-16 10:03:43", "2020-03-16 12:37:09", 
    "2020-03-16 06:40:24", "2020-03-16 09:46:45", "2020-03-16 12:07:44", 
    "2020-03-16 14:09:51", "2020-03-16 09:19:23"), Status = c("SA", 
    "SA", "SA", "RE", "RE", "RE", 
    "RE", "XA", "XA", "XA"), Flag2 = c("L", 
    "L","L", NA, "K", "J", NA, NA, "H", "G"), Value = c(5929.81, 
    5929.81, 5929.81, NA, 6969.33, 740.08, NA, NA, 1524.8, NA), Flag2 = c("CL", 
    "CL", "CL", NA, "RY", "", NA, NA, "", NA), Flag3 = c(NA, NA, 
    NA, NA, "RI", "PO", NA, "SS", "DDP", NA)), .Names = c("ID", 
    "Date", "Status", "Flag", "Value", 
    "Flag2", "Flag3"), row.names = c(NA, 10L), class = "data.frame")

I am using below-mentioned code:

    df %>% mutate(L = ifelse(Flag == "L",1,0),
                  K = ifelse(Flag == "K",1,0),
                  # etc for Flag) %>%
      mutate(sub_status = NA) %>%
      mutate(sub_status = ifelse(!is.na(Flag2) & Flag3 == 0, "a", sub_status),
             sub_status = ifelse(is.na(Flag2) & Flag3 != 0, "b", sub_status),
             # etc for sub-status) %>%
      mutate(value_class = ifelse(0 <= Value & Value <= 15000, "0-15000", "15000-50000")) %>%
      group_by(Date, status, sub_status, value_class) %>%
      summarise(L = sum(L),
                K = sum(K),
                # etc
                count = n())

Which provides me the following output:

    Date         Status  sub_status   value_class G H I J K L NA Count
    2020-03-20   SA      a            0-15000     0 0 0 0 1 1 0  2
    2020-03-20   SA      b            0-15000     0 0 0 0 1 0 0  1
    ................
    ................

I want to get the following output using the DF, where the Status column has distinct 3 values and Flag2 has either values or [null] or NA and finally Flag3 column has distinct 7 values with [null] or NA. For one distinct ID we have multiple entry of Flag3 column.

I Need to create the following dataframe, by creating a 3 group based on Value like 0-15000, 15000-50000.

  • If for a distinct ID Flag2 has some value other than 0 or [null]/NA but Flag3 has value 0 or [null]/NA then it would be a.
  • If for a distinct ID Flag3 has some value other than 0 or [null]/NA but Flag2 has value 0 or [null]/NA then it would be b
  • If for a distinct ID both Flag2 & Flag3 has some value other than 0 or [Null]/NA then it would be c
  • If for a distinct ID both Flag2 & Flag3 has value 0 or [Null]/NA the it would be d

The count will be a distinct group by ID. Also, for the first-row (Where the Status i.e SA, RE etc starts) percentage will get calculated by the Total and below the percentage for sub_status (i.e a,b,c etc) get calculated by the count of status.

I want to arrange the above mentioned datafrmae in the following structure with percent and Total column in html table format which can be attached in an email using mailR library.

I'm using the code:

    library(tableHTML)

    html_table<-df %>% tableHTML(rownames = FALSE,
                                      widths = rep(120, 10),
                                      caption = "XYZ (System Date)") %>%
      add_css_caption(css = list(c("font-weight", "border","font-size"),
                                 c("bold", "1px solid black","16px")))%>% 
      add_css_row(css = list(c("background-color"), c("lightblue")), rows = 0:1)%>%
      add_css_caption(css = list(c("background-color"), c("lightblue")))
    20/03/2020                 0 - 15000                         15000 - 50000
Status      count   percent  L K J H G [Null] count   percent  L K J H G [Null]   Total
SA            2      2/4     1 1 0 0 0   1      2              0 0 1 1 0    0       4
a             1      1/2     0 1 0 0 0   0      0              0 0 0 0 0    0       1
b             0       -      0 0 0 0 0   0      0              0 0 0 0 0    0       0
c             1      1/2     1 0 0 0 0   0      2              0 0 1 1 0    0       3
d             0       -      0 0 0 0 0   0      0              0 0 0 0 0    0       0
RE            2      2/5     0 0 0 0 1   1      0              0 0 0 0 0    0       2
a             1              0 0 0 0 1   0      0              0 0 0 0 0    0       1
b             1              0 0 0 0 0   1      0              0 0 0 0 0    0       1
c             0              0 0 0 0 0   0      0              0 0 0 0 0    0       0
d             0              0 0 0 0 0   0      0              0 0 0 0 0    0       0
XA            1              0 0 0 0 0   1      0              0 0 0 0 0    0       1
a             0              0 0 0 0 0   0      0              0 0 0 0 0    0       0
b             0              0 0 0 0 0   0      0              0 0 0 0 0    0       0
c             0              0 0 0 0 0   0      0              0 0 0 0 0    0       0
d             1              0 0 0 0 0   1      0              0 0 0 0 0    0       1
Total         5              1 1 0 0 1   3      2              0 0 1 1 0    0       7

I have mentioned the required output based on the latest date which is 20/03/2020, if the dataframe don't have the latest date as per startdate keep all the value 0 in the output dataframe. The percentage column is just for the reference there will be calculated percentage values.

Also, I want to keep the structure static. For Example, if any of the parameters are not present for a day the output structure would be the same with 0 value.

For Example, Suppose date 17/03/2020 doesn't have any row with status SA or sub_status c the place holder for that will be there in the output with value as 0.

Note: The desired output is just for the format perspective. The numbers might not match with dput df.

stefan
  • 90,330
  • 6
  • 25
  • 51
user9211845
  • 131
  • 1
  • 12
  • instead of using the R package `tableHTML`, can you use another R package, like `gt`? – Fred Boehm Apr 13 '20 at 17:53
  • what's `reps_rt`? it's not in the original `df` – GGamba Apr 15 '20 at 10:34
  • I'm not sure I understand what you mean exactly, by >>> `The count will be a distinct group by ID.` you mean the ID column in your sample data? because you didn't use it in the group_by – DS_UNI Apr 17 '20 at 05:57
  • @DS_UNI: My approach my not be correct because as you may see in sample df there are samp ids multiple time because of column `Flag3`. Though, in my output dataframe I want them to count as distinct that's why I said group by I'd. – user9211845 Apr 17 '20 at 14:35
  • Do you mean that the table you provided as an expected output belongs to one ID? or should we ignore the ID altogether? In your sample data the only difference between the first three rows (i.e. ID='P-1') is the Date column, which is actually a timestamp, so all three rows would be just one if we consider the Date only (and no time). Can you provide an expected output that matches the provided sample (or provide a sample that matches the output), otherwise anything – DS_UNI Apr 17 '20 at 16:35
  • @DS_UNI: Yes, you are correct. Infact one simple solution could be wherever the `Flag3` is [null] or blank consider it 0 else consider it 1. – user9211845 Apr 17 '20 at 17:46
  • @DS_UNI: Did you understand, Please let me know if you can help with the possible solution. – user9211845 Apr 17 '20 at 22:36
  • 1
    Please could you edit the question and replace the mutate and summarise etc... part with your actual code – pietrodito Apr 18 '20 at 17:54
  • 1
    And how is this different from the other question: https://stackoverflow.com/q/61145404/11355066 – Cole Apr 18 '20 at 18:12
  • 1
    @Cole: I'm trying to restructure the dataframe in html itself here. – user9211845 Apr 18 '20 at 18:38
  • 1
    Then this question should just include the intended output of the other question. This one is confusing as is and includes 90% of the text of the other one – Cole Apr 18 '20 at 18:55
  • @Cole: ok, I didn't know that before. – user9211845 Apr 18 '20 at 18:57
  • @Cole: Let me know if the output is possible with R. – user9211845 Apr 19 '20 at 00:51
  • It is probably doable but it's hard to help. The code you have is truncated so we cannot copy and paste to try ourselves and then it appears the final output does not match the input values. – Cole Apr 19 '20 at 03:21
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/211991/discussion-between-user9211845-and-cole). – user9211845 Apr 19 '20 at 03:34

0 Answers0