I have fetched some information form the MySQL server into R which looks like as follows in my R dataframe:
barcode_no Inspection_date current_profile score Tag_log prod_log
12345678 2020-01-15 14:34:13 Large 10 C1 WIP
12345678 2020-01-15 18:33:11 Medium 20 C2 Hold
12345678 2020-01-15 13:23:24 Medium 50 C3 Hold
12345678 2020-01-15 12:12:23 Medium 70 Shipped
12345678 2020-01-15 11:12:45 Medium 120 C1 Shipped
12345678 2020-01-15 12:22:32 Small 150 C2 Shipped
12345678 2020-01-15 15:23:23 Small 10 C3 WIP
12345678 2020-01-15 16:34:08 Small 20 C2 Hold
12345678 2020-01-15 17:07:13 Small 130 C1 Hold
12345678 2020-01-15 17:09:05 Small 40 Hold
The requirement is to fit the particulars of the above-mentioned dataframe in a comprehensive report structure for date and moth wise.
comprehensive_df (Date): Will consider the latest date as per the system date if some or all the records are not available for that date then fill the comprehensive report df with 0.
Current_profile # of records % of records C1 C2 C3 [Null] # of records % of records C1 C2 C3 [Null] # of records % of records C1 C2 C3 [Null] Total % Total
**Large 01 16.67 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 10.00**
Shipped 0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0
Hold 0 0.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.0
WIP 01 1.0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 100.00
**Small 03 50.00 0 1 1 1 0 0 0 0 0 0 02 66.67 1 1 0 0 5 50.00**
Shipped 0 0 0 0 0 0 0 0 0 0 0 0 01 50.00 0 1 0 0 1 20.00
Hold 02 66.67 0 1 0 1 0 0 0 0 0 0 1 100.00 1 0 0 0 3 60.00
WIP 01 33.33 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 20.00
**Medium 02 33.33 0 1 1 0 1 100.00 0 0 0 1 1 33.33 1 0 0 0 4 40.00**
Shipped 0 0 0 0 0 0 1 100.00 0 0 0 1 1 100.00 0 0 0 0 2 50.00
Hold 2 100.00 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 2 50.00
WIP 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Total 06 0.10 1 0 0 0 1 0 0 0 0 0 3 0 0 0 0 0 1 0.10
I have divided the comprehensive dataframe in sections, where column 2 to 7 represent the count of those have score from 0 to <=50, column 8 to 13 represent the count of those have score from >50 to 100 and column 14 to 20 represent the count of those have score from >100.
The code that I'm trying:
df1<- df %>%
mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
group_by(Month) %>%
dplyr::summarise(`current_profile` = n())
df2<- df %>%
mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
group_by(Month) %>%
dplyr::summarise(`Tag_log` = n())
df3<- df %>%
mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
group_by(Month) %>%
dplyr::summarise(`prod_log` = n())
And so on for every variable. And then trying to full_join
all the dataframe by Date
for Date wise comprehensive format and month for month wise comprehensive format.
comprehensive_df <- df1 %>% full_join(df1, by = 'Month') %>%
full_join(df2, by = 'Month') %>%
full_join(df3, by = 'Month')