I have a below-mentioned dataframe:
DF (dput):
structure(list(ID = c("P-1", " P-1",
"P-1", "P-2", "P-3", "P-4",
"P-5", "P-6", "P-7", "P-8"
), Date = c("2020-03-16 12:11:33", "2020-03-16 13:16:04",
"2020-03-16 06:13:55", "2020-03-16 10:03:43", "2020-03-16 12:37:09",
"2020-03-16 06:40:24", "2020-03-16 09:46:45", "2020-03-16 12:07:44",
"2020-03-16 14:09:51", "2020-03-16 09:19:23"), Status = c("SA",
"SA", "SA", "RE", "RE", "RE",
"RE", "XA", "XA", "XA"), Flag2 = c("L",
"L","L", NA, "K", "J", NA, NA, "H", "G"), Value = c(5929.81,
5929.81, 5929.81, NA, 6969.33, 740.08, NA, NA, 1524.8, NA), Flag2 = c("CL",
"CL", "CL", NA, "RY", "", NA, NA, "", NA), Flag3 = c(NA, NA,
NA, NA, "RI", "PO", NA, "SS", "DDP", NA)), .Names = c("ID",
"Date", "Status", "Flag", "Value",
"Flag2", "Flag3"), row.names = c(NA, 10L), class = "data.frame")
I am using below-mentioned code:
df %>% mutate(L = ifelse(Flag == "L",1,0),
K = ifelse(Flag == "K",1,0),
# etc for Flag) %>%
mutate(sub_status = NA) %>%
mutate(sub_status = ifelse(!is.na(Flag2) & Flag3 == 0, "a", sub_status),
sub_status = ifelse(is.na(Flag2) & Flag3 != 0, "b", sub_status),
# etc for sub-status) %>%
mutate(value_class = ifelse(0 <= Value & Value <= 15000, "0-15000", "15000-50000")) %>%
group_by(Date, status, sub_status, value_class) %>%
summarise(L = sum(L),
K = sum(K),
# etc
count = n())
Which provides me the following output:
Date Status sub_status value_class G H I J K L NA Count
2020-03-20 SA a 0-15000 0 0 0 0 1 1 0 2
2020-03-20 SA b 0-15000 0 0 0 0 1 0 0 1
................
................
I want to get the following output using the DF
, where the Status
column has distinct 3 values and Flag2
has either values or [null] or NA and finally Flag3
column has distinct 7 values with [null] or NA. For one distinct ID
we have multiple entry of Flag3
column.
I Need to create the following dataframe, by creating a 3 group based on Value
like 0-15000, 15000-50000.
- If for a distinct ID
Flag2
has some value other than 0 or [null]/NA butFlag3
has value 0 or [null]/NA then it would bea
. - If for a distinct ID
Flag3
has some value other than 0 or [null]/NA butFlag2
has value 0 or [null]/NA then it would beb
- If for a distinct ID both
Flag2
&Flag3
has some value other than 0 or [Null]/NA then it would bec
- If for a distinct ID both
Flag2
&Flag3
has value 0 or [Null]/NA the it would bed
The count will be a distinct group by ID
. Also, for the first-row (Where the Status
i.e SA, RE etc starts) percentage
will get calculated by the Total
and below the percentage for sub_status (i.e a,b,c etc) get calculated by the count of status.
I want to arrange the above mentioned datafrmae in the following structure with percent
and Total
column in html table format which can be attached in an email using mailR library.
I'm using the code:
library(tableHTML)
html_table<-df %>% tableHTML(rownames = FALSE,
widths = rep(120, 10),
caption = "XYZ (System Date)") %>%
add_css_caption(css = list(c("font-weight", "border","font-size"),
c("bold", "1px solid black","16px")))%>%
add_css_row(css = list(c("background-color"), c("lightblue")), rows = 0:1)%>%
add_css_caption(css = list(c("background-color"), c("lightblue")))
20/03/2020 0 - 15000 15000 - 50000
Status count percent L K J H G [Null] count percent L K J H G [Null] Total
SA 2 2/4 1 1 0 0 0 1 2 0 0 1 1 0 0 4
a 1 1/2 0 1 0 0 0 0 0 0 0 0 0 0 0 1
b 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 1 1/2 1 0 0 0 0 0 2 0 0 1 1 0 0 3
d 0 - 0 0 0 0 0 0 0 0 0 0 0 0 0 0
RE 2 2/5 0 0 0 0 1 1 0 0 0 0 0 0 0 2
a 1 0 0 0 0 1 0 0 0 0 0 0 0 0 1
b 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
XA 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
a 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d 1 0 0 0 0 0 1 0 0 0 0 0 0 0 1
Total 5 1 1 0 0 1 3 2 0 0 1 1 0 0 7
I have mentioned the required output based on the latest date which is 20/03/2020, if the dataframe don't have the latest date as per startdate
keep all the value 0 in the output dataframe. The percentage column is just for the reference there will be calculated percentage values.
Also, I want to keep the structure static. For Example, if any of the parameters are not present for a day the output structure would be the same with 0 value.
For Example, Suppose date 17/03/2020 doesn't have any row with status SA or sub_status c the place holder for that will be there in the output with value as 0.
Note: The desired output is just for the format perspective. The numbers might not match with dput
df.