I have created a summarized data frame using dplyr's 'summarize' function. It contains two factors, "Firm Size" and "Case Status", and the number of records (n) for each combination of the two. Firm Size has three levels and Case Status has four, so the summarized data frame has 12 rows. Here is the script that creates it (including the preceding 'group_by' call):
library(dplyr)
df <- group_by(df, df$Firm.Size, df$`Case Status`)
summ_firm <- summarize(df, num_records = n())
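For reference, here is a minimal sketch of the same grouping step on a tiny made-up data frame ('raw' and its three rows are hypothetical). Referring to the columns by bare name inside 'group_by', rather than as df$..., keeps the summary's column names clean. 'summarize' drops only the last grouping variable, so the result stays grouped by Firm.Size:

```r
library(dplyr)

# Hypothetical raw data: one row per case, column names assumed to match the post
raw <- tibble(
  Firm.Size = c("0-99 Employees", "0-99 Employees", "1,000+ Employees"),
  `Case Status` = c("Certified", "Denied", "Certified")
)

# Bare column names (not raw$...) give summary columns named
# Firm.Size and `Case Status` instead of `raw$Firm.Size` etc.
summ <- raw %>%
  group_by(Firm.Size, `Case Status`) %>%
  summarize(num_records = n(), .groups = "drop_last")  # the default behaviour, made explicit

# Only the last grouping variable was dropped, so summ is still grouped by Firm.Size
group_vars(summ)
```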
I want to add a new column to the summarized data frame that gives each row's number of records (i.e., the count for a given combination of Firm Size and Case Status) as a proportion of the total number of records for that row's Firm Size.
In other words, if "Small Firms" have 100 records in total and the row for "Small Firms" with Case Status "Certified" has 20 records, the new column should contain 0.2 for that row.
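Written as a formula, with n_{s,c} standing for the number of records with Firm Size s and Case Status c (notation introduced here only for illustration), the new column is the count divided by the total over all case statuses for the same firm size:

```latex
p_{s,c} = \frac{n_{s,c}}{\sum_{c'} n_{s,c'}},
\qquad \text{e.g.}\quad
p_{\text{Small},\,\text{Certified}} = \frac{20}{100} = 0.2
```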
Here is the actual output of 'summ_firm':
   `df_nona_firm$Firm.Size` `df_nona_firm$\`Case Status\`` num_records
   <fct>                    <fct>                                <int>
 1 0-99 Employees           Certified                            32565
 2 0-99 Employees           Certified-Expired                    24493
 3 0-99 Employees           Denied                                6346
 4 0-99 Employees           Withdrawn                             3155
 5 1,000+ Employees         Certified                            63649
 6 1,000+ Employees         Certified-Expired                    51981
 7 1,000+ Employees         Denied                                3532
 8 1,000+ Employees         Withdrawn                             4078
 9 100-999 Employees        Certified                            24752
10 100-999 Employees        Certified-Expired                    19095
11 100-999 Employees        Denied                                2830
12 100-999 Employees        Withdrawn                             2537
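One way to get the proportion column is to regroup by firm size alone and divide each count by its group total inside 'mutate'. The sketch below rebuilds the 12 rows shown above as a tibble; the simplified column names (Firm.Size, `Case Status`) and storing the factors as character vectors are assumptions made for brevity:

```r
library(dplyr)

# The 12 rows of summ_firm from the output above, with simplified column names
summ_firm <- tibble(
  Firm.Size = rep(c("0-99 Employees", "1,000+ Employees", "100-999 Employees"), each = 4),
  `Case Status` = rep(c("Certified", "Certified-Expired", "Denied", "Withdrawn"), times = 3),
  num_records = c(32565L, 24493L, 6346L, 3155L,
                  63649L, 51981L, 3532L, 4078L,
                  24752L, 19095L, 2830L, 2537L)
)

# Group by firm size only; sum(num_records) is then the per-firm-size total,
# so each row's prop is its share of that total
summ_firm <- summ_firm %>%
  group_by(Firm.Size) %>%
  mutate(prop = num_records / sum(num_records)) %>%
  ungroup()

summ_firm
```

Because 'summarize' drops only the last grouping level, the original summ_firm should already be grouped by firm size, so mutate(summ_firm, prop = num_records / sum(num_records)) ought to give the same result there; the explicit 'group_by' just avoids relying on that default. In base R, prop.table() with margin = 1 on a Firm Size by Case Status contingency table computes the same row-wise shares.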