I know there are many threads answering questions similar to mine, but none of the answers out there are specific enough to what I want to obtain.
I've got the following dataset:
I want to count the number of patients (found in "Var_name") that harbour each mutation (found in "var_id") and display the count in a new column ("var_freq"). I've tried things like:
    y <- ALL_merged %>%
      group_by(var_id, Var_name) %>%
      summarise(n_counts = n(), var_freq = sum(var_id == Var_name))
NOTE: In case it is relevant for the answers... I had to convert "var_id" and "Var_name" to character to make this work because they were factors.
However, this does not give me the output I want. Instead, I get the number of times each "var_id" appears per patient: for each "var_id", the same "Var_name" appears many times (because the rows contain additional columns with different information), so the final outcome gives me a higher count than I would expect:
I also want to add this new column to the original dataset. I believe this could be done, for example, by using "mutate", but I'm not sure how to set everything up...
So, in sum, what I want to get is: for each "var_id", how many different "Var_name" values I have, taking into account that the data are duplicated...
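To make the target concrete, here is a minimal sketch on a made-up toy data frame (the values "m1", "p1", etc. and the name `toy` are hypothetical, not my real data). It assumes that dplyr's `n_distinct()` inside a grouped `mutate()` is a reasonable way to count each patient once per mutation; I'm not certain this is the best approach for the full dataset.

```r
library(dplyr)

# Toy data (hypothetical values): mutation m1 is seen in patients p1 and p2,
# mutation m2 only in p1. Patient rows are duplicated on purpose, as in my data.
toy <- data.frame(
  var_id   = c("m1", "m1", "m1", "m2", "m2"),
  Var_name = c("p1", "p1", "p2", "p1", "p1"),
  stringsAsFactors = FALSE
)

# Count distinct patients per mutation and keep every original row
toy_freq <- toy %>%
  group_by(var_id) %>%
  mutate(var_freq = n_distinct(Var_name)) %>%
  ungroup()

print(toy_freq)
```

In this toy case every "m1" row should get var_freq = 2 and every "m2" row var_freq = 1, which is the kind of output I'm after in the original dataset.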
Thanks in advance!