I have a data frame like this:

LR_ID               Proc_ID
   1                   2
   1                  10
   1                  10
   1                   2
   2                  10
   3                  10
   4                   3
   5                   3
   5                  10

The idea is to get the frequency of distinct LR_IDs associated with each Proc_ID.

I have counted the frequency of Proc_ID and LR_ID like this:

 library(plyr)
 count_0 <- count(my_df)

which has given me a result like this:

 LR_ID     Proc_ID      Freq
   1          0        1154
   1          1         980
   1          2        1256

and so on. I have 20 process IDs (0 to 19) and around 800 LR_IDs, so the result covers all the combinations. I want to plot this with the process IDs (0 to 19 in this case) on one axis, showing the frequency of distinct LR_IDs associated with each process ID.

Jon Spring
Ashi

1 Answer

library(dplyr)
library(ggplot2)

my_df %>%
  # Count the rows for each LR_ID / Proc_ID pair; the count lands in column `n`
  count(LR_ID, Proc_ID) %>%
  # One tile per pair, colored and labeled by its count
  ggplot(aes(LR_ID, Proc_ID, fill = n, label = n)) +
  geom_tile(alpha = 0.8) +
  geom_text() +
  scale_x_continuous(minor_breaks = NULL) +
  scale_y_continuous(minor_breaks = NULL, breaks = 1:800) +
  theme_minimal()

[Image: tile plot of the counts for each LR_ID and Proc_ID pair]

Sample data

my_df <- tibble::tribble(
    ~LR_ID, ~Proc_ID,
        1L,       2L,
        1L,      10L,
        1L,      10L,
        1L,       2L,
        2L,      10L,
        3L,      10L,
        4L,       3L,
        5L,       3L,
        5L,      10L
    )
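
If the goal is the number of distinct LR_IDs per Proc_ID rather than a count for every LR_ID/Proc_ID pair, a bar chart with Proc_ID on one axis may be closer to what the question describes. This is only a minimal sketch: it assumes that "frequency of distinct LR_IDs" means the number of unique LR_ID values seen with each Proc_ID, and the names `distinct_counts` and `n_lr_ids` are introduced here for illustration.

```r
library(dplyr)
library(ggplot2)

# Sample data repeated here so the block runs on its own
my_df <- tibble::tribble(
  ~LR_ID, ~Proc_ID,
      1L,       2L,
      1L,      10L,
      1L,      10L,
      1L,       2L,
      2L,      10L,
      3L,      10L,
      4L,       3L,
      5L,       3L,
      5L,      10L
)

# Drop repeated LR_ID / Proc_ID pairs, then count the remaining rows per Proc_ID.
# `n_lr_ids` is a hypothetical column name chosen for this sketch.
distinct_counts <- my_df %>%
  dplyr::distinct(LR_ID, Proc_ID) %>%
  dplyr::count(Proc_ID, name = "n_lr_ids")

# Proc_ID on the x axis, number of distinct LR_IDs as the bar height
p <- ggplot(distinct_counts, aes(factor(Proc_ID), n_lr_ids)) +
  geom_col() +
  labs(x = "Proc_ID", y = "Distinct LR_IDs") +
  theme_minimal()
p
```

With the sample data, Proc_ID 2 is associated with one distinct LR_ID, Proc_ID 3 with two, and Proc_ID 10 with four.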
Jon Spring
  • Thank you for the answer. Your plot looks like what I want, but when I try to run it I get this error: `Error in count(., LR_ID, Proc_ID) : object 'LR_ID' not found`. My column names are LR_ID and Proc_ID, so I don't know why I am getting the error. This is the output of `dput(head(my_df))`: `structure(list(LR_ID = c(1L, 1L, 1L, 1L, 1L, 1L), Proc_ID = c(5L, 5L, 5L, 5L, 5L, 6L)), row.names = c(1L, 5L, 8L, 10L, 12L, 13L ), class = "data.frame")` – Ashi Feb 11 '21 at 23:28
  • Try loading the sample data I added at the bottom of my answer, then running the code. If it runs, I suggest double checking your column names. (e.g. you had `ProcID` in your original question but `Proc_ID` in your comment above...) – Jon Spring Feb 11 '21 at 23:33
  • I tried running with the sample data that you provided. It is still showing the same error. :( – Ashi Feb 11 '21 at 23:41
  • And you have loaded dplyr in this session and you are running the whole chain? The error makes it sound like the `count` line is running without the `my_df %>%` line preceding it, since R is looking for an object called `LR_ID` when the line before it should have fed the tibble called `my_df` into `count`. (equivalent to starting with `count(my_df, LR_ID, Proc_ID) %>%`). That should work, provided that `my_df` is loaded, and `dplyr` is loaded so that you have the `%>%` operator. – Jon Spring Feb 11 '21 at 23:51
  • @JonSpring According to `?dplyr::count`, `count` expects a data.frame/tibble. Is that why I am getting the error? – Ashi Feb 11 '21 at 23:52
  • It is working now. I had to restart the whole session and start from scratch. Maybe some previously loaded libraries were conflicting. Thank you again @JonSpring. – Ashi Feb 12 '21 at 00:05
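
One plausible explanation for the error in the comments, offered as an assumption rather than a confirmed diagnosis: the question loads plyr, and if plyr is attached after dplyr, `plyr::count()` masks `dplyr::count()`. The plyr version does not take bare column names the way dplyr does, so R ends up looking for an object called `LR_ID`. A sketch that sidesteps the conflict by naming the package explicitly, using the `dput` output from the comment above (row names omitted) and a hypothetical result name `pair_counts`:

```r
library(dplyr)

# Data taken from the dput(head(my_df)) output in the comment thread
my_df <- data.frame(
  LR_ID   = c(1L, 1L, 1L, 1L, 1L, 1L),
  Proc_ID = c(5L, 5L, 5L, 5L, 5L, 6L)
)

# The explicit dplyr:: prefix picks dplyr's count() even if another attached
# package (for example plyr) also exports a function called count()
pair_counts <- my_df %>%
  dplyr::count(LR_ID, Proc_ID)

pair_counts
```

Restarting the session, as described above, has the same effect when plyr is no longer loaded afterwards; running `conflicts()` lists the names that are currently masked.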