I have a large data frame called data_frame
with 3 columns PRE
, STATUS
, and CHR
that look like this:
PRE STATUS CHR
1_752566 GAINED 1
1_776546 LOST 1
1_832918 NA 1
1_842013 LOST 1
1_846864 GAINED 1
11_8122943 NA 11
11_8188699 GAINED 11
11_8321128 NA 11
23_95137734 NA 23
23_95146814 GAINED 23
From here I'd like to group CHR
by number and then find the sum of each group. If possible, I would like a new data table (let's call it TOTAL
) showing the sums of each group number like this:
CHR TOTAL_SUM
1 5
11 3
23 2
from here I would like to create another data table called BY_STATUS
with 3 columns CHR
, 'SUM _GAINED', 'SUM_LOST' where 'SUM_GAINED is the sum of CHR that matches with the 'STATUS' output 'GAINED' and 'SUM_LOST' is the sum of CHR that matches with the 'STATUS' output 'LOST' like this:
CHR SUM _GAINED SUM_LOST
1 2 2
11 1 0
23 1 0
I would then create two different plots:
1st plot would be for the data table TOTAL
to visualize the sums of each number where my x-axis is NUM
and my y-axis is SUM
2nd plot would be for the data table BY_STATUS
to visualize the different frequencies of each number in CHR
based on both SUM_GAINED
and SUM_LOST
where my x-axis is CHR
and my y-axis is both SUM_GAINED
and SUM_LOST
. Maybe a side-by-side comparison of the two different y-axis?