6

I have fetched some information form the MySQL server into R which looks like as follows in my R dataframe:

barcode_no   Inspection_date        current_profile      score    Tag_log   prod_log
12345678     2020-01-15 14:34:13    Large                10       C1        WIP
12345678     2020-01-15 18:33:11    Medium               20       C2        Hold
12345678     2020-01-15 13:23:24    Medium               50       C3        Hold
12345678     2020-01-15 12:12:23    Medium               70                 Shipped
12345678     2020-01-15 11:12:45    Medium               120      C1        Shipped
12345678     2020-01-15 12:22:32    Small                150      C2        Shipped
12345678     2020-01-15 15:23:23    Small                10       C3        WIP
12345678     2020-01-15 16:34:08    Small                20       C2        Hold
12345678     2020-01-15 17:07:13    Small                130      C1        Hold
12345678     2020-01-15 17:09:05    Small                40                 Hold 

The requirement is to fit the particulars of the above-mentioned dataframe in a comprehensive report structure for date and moth wise.

comprehensive_df (Date): Will consider the latest date as per the system date if some or all the records are not available for that date then fill the comprehensive report df with 0.

Current_profile     # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] # of records  % of records C1 C2 C3 [Null] Total    % Total
**Large               01            16.67        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        10.00**
Shipped             0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
Hold                0             0.0          0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0.0
WIP                 01             1.0         1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        100.00
**Small               03            50.00        0  1  1  1      0             0            0   0  0    0     02             66.67        1  1  0   0     5        50.00**
Shipped             0             0            0  0  0  0      0             0            0   0  0    0     01             50.00        0  1  0   0     1        20.00
Hold                02            66.67        0  1  0  1      0             0            0   0  0    0      1             100.00       1  0  0   0     3        60.00
WIP                 01            33.33        1  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     1        20.00
**Medium              02            33.33        0  1  1  0      1             100.00       0   0  0    1      1             33.33        1  0  0   0     4        40.00**
Shipped             0              0           0  0  0  0      1             100.00       0   0  0    1      1             100.00       0  0  0   0     2        50.00
Hold                2            100.00        0  1  1  0      0             0            0   0  0    0      0             0            0  0  0   0     2        50.00
WIP                 0            0             0  0  0  0      0             0            0   0  0    0      0             0            0  0  0   0     0        0
Total               06            0.10         1  0  0  0      1             0            0   0  0    0      3             0            0  0  0   0     1        0.10

I have divided the comprehensive dataframe in sections, where column 2 to 7 represent the count of those have score from 0 to <=50, column 8 to 13 represent the count of those have score from >50 to 100 and column 14 to 20 represent the count of those have score from >100.

The code that I'm trying:

df1<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`current_profile` = n())

df2<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`Tag_log` = n())

df3<- df %>%
  mutate(Month = format(ymd(Inspection_date),'%b-%Y')) %>%
  group_by(Month) %>%
  dplyr::summarise(`prod_log` = n())

And so on for every variable. And then trying to full_join all the dataframe by Date for Date wise comprehensive format and month for month wise comprehensive format.

comprehensive_df <- df1 %>% full_join(df1, by = 'Month') %>% 
                      full_join(df2, by = 'Month') %>%
                      full_join(df3, by = 'Month')
Mike
  • 61
  • 3
  • 2
    What code do you have so far? [gt](https://blog.rstudio.com/2020/04/08/great-looking-tables-gt-0-2/) may be helpful. – alistaire Apr 15 '20 at 02:49
  • @alistaire: I'm a very beginner at using R. I have tried `dplyr` and `tidyverse` library to achieve the comprehensive dataframe but so far just able to aggregate the count by `Inspection_Date` and `Current_profile` only. Not sure how to aggregate with `Current_profile` and `prod_log` together, – Mike Apr 15 '20 at 02:59
  • Group by more than one variable: `my_df %>% group_by(Current_profile, prod_log, Tag_log) %>% summarise(n_records = n())`. That won't give you the presentation format you're asking for, but the format it will give you is more useful for further analysis because it's [tidy](http://vita.had.co.nz/papers/tidy-data.pdf). – alistaire Apr 15 '20 at 03:09
  • @alistaire: Yes, but need to create it in this format so that it would be much more visually clear to present the data for monthly report. – Mike Apr 15 '20 at 03:12
  • @alistaire: To be honest, this is some manual task that I want to automate as it requires a significant time to export the data using `write.csv()` and arrange it in excel in the given format. – Mike Apr 15 '20 at 03:15
  • Sure, that's possible. But if you want help, you need to make an effort and provide the code where you get stuck; you can't ask here for someone else to come up with an approach and write it all for you. But we will help you make your code work. – alistaire Apr 15 '20 at 03:33
  • @alistaire: Thanks, updated the question with my approach. – Mike Apr 15 '20 at 04:30
  • @alistaire: Please suggest, does the approach seems ok to you. – Mike Apr 15 '20 at 16:39

1 Answers1

0

I am not sure I understand what you need but maybe something like this?

library(magrittr)
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following object is masked from 'package:base':
#> 
#>     date

dat <- tibble::tribble(
  ~barcode_no, ~Inspection_date, ~current_profile, ~score, ~Tag_log, ~prod_log,
    12345678L,     "15/01/2020",          "Large",    10L,     "C1",     "WIP",
    12345678L,     "15/01/2020",         "Medium",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    50L,     "C3",    "Hold",
    12345678L,     "15/01/2020",         "Medium",    70L,       NA, "Shipped",
    12345678L,     "15/01/2020",         "Medium",   120L,     "C1", "Shipped",
    12345678L,     "15/01/2020",          "Small",   150L,     "C2", "Shipped",
    12345678L,     "15/01/2020",          "Small",    10L,     "C3",     "WIP",
    12345678L,     "15/01/2020",          "Small",    20L,     "C2",    "Hold",
    12345678L,     "15/01/2020",          "Small",   130L,     "C1",    "Hold",
    12345678L,     "15/01/2020",          "Small",    40L,       NA,    "Hold"
  )



dat$Inspection_date = as.Date(dat$Inspection_date,format = "%d/%m/%Y")

today = Sys.Date()

param_date = as.Date("15/01/2020",format = "%d/%m/%Y")

dat$month = format(ymd(dat$Inspection_date),'%b-%Y')

dat$score_group = dplyr::case_when(
  dat$score <= 50 ~ "low",
  dat$score < 100 ~ "med",
  TRUE ~ "high"
)

dat %>% dplyr::filter(Inspection_date >= param_date) %>%
  dplyr::group_by(current_profile, month, score_group, Tag_log,prod_log) %>% 
  dplyr::summarise(count = dplyr::n()) %>% 
  tidyr::pivot_wider(names_from = c("score_group","Tag_log"),
                     values_from = count,
                     values_fill  = list(count = 0)) -> res_dat


knitr::kable(res_dat,format = "markdown")
|current_profile |month    |prod_log | low_C1| high_C1| low_C2| low_C3| med_NA| high_C2| low_NA|
|:---------------|:--------|:--------|------:|-------:|------:|------:|------:|-------:|------:|
|Large           |Jan-2020 |WIP      |      1|       0|      0|      0|      0|       0|      0|
|Medium          |Jan-2020 |Shipped  |      0|       1|      0|      0|      1|       0|      0|
|Medium          |Jan-2020 |Hold     |      0|       0|      1|      1|      0|       0|      0|
|Small           |Jan-2020 |Hold     |      0|       1|      1|      0|      0|       0|      1|
|Small           |Jan-2020 |Shipped  |      0|       0|      0|      0|      0|       1|      0|
|Small           |Jan-2020 |WIP      |      0|       0|      0|      1|      0|       0|      0|
David Mas
  • 1,149
  • 2
  • 12
  • 18
  • The output doesn't match with the expected output, as the expected output has those rows as well which has 0 counts. – Vector JX Apr 21 '20 at 22:56