
I have a large metadata file with 79 columns and 78,687 rows, taken from our cancer experiment results. I am using dplyr to query the cell counts for each sample in that metadata.

I have 16 samples:

Sample list(Var1),

I need to find the cell counts for each condition (Tumor or Normal, or MSI_Status) in each sample. So far I have been doing it one sample at a time, as follows:

dim(meta %>% filter(Condition == "Tumor" & MSI_Status=="MSS" & Location =="Left" & orig.ident == "B_cac10"));

# 689  24

I am sure there is a smarter way to do this. How can I loop over the samples to get all the counts in one go?

P.S.: I am a biologist, and my knowledge of looping and coding is very limited.

EDIT 1: Reproducible example

df <- data.frame(Condition = c("Normal","Normal","Normal","Tumor","Tumor","Tumor"),
                 MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
                 Location = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"), 
                 Clusters = c(1,2,4,2,2,6), 
                 orig.ident = c("B-cac10","B-cac11","T-cac15","B-cac15","B-cac19","T-cac22"))

My code:

df %>% filter(Condition == "Tumor" & MSI_Status == "Low" &
              Location == "Kidney" & orig.ident == "B-cac15")

Expected results:

The count for each orig.ident should be reported for Condition == "Tumor", MSI_Status == "Low" and Location == "Kidney".

Thanks a lot for your Help, Stay Safe. Dave

Nate
David
  • Can you edit and add a `dput(your_data)` sample as a code chunk? – Nate Sep 05 '20 at 13:24
  • Thanks a lot Nate, I have edited the question. Please have a look. – David Sep 05 '20 at 13:45
  • Use the `{ }` button instead of the `"` button to format code in the SO editor. If you want to use markdown directly, you can use ``` (3 backticks) as the open and close tags around the code. – Nate Sep 05 '20 at 13:52
  • Thanks Nate, I will keep that in mind. – David Sep 05 '20 at 14:08

1 Answer

You can use the dplyr function `filter()` to subset the data to your criteria, and then the dplyr function `count()` to count the rows for each value of `orig.ident`. As alluded to in the comments, you can opt to set `name = "Freq"` inside `count()` itself. I used a separate `rename()` step instead, to be as explicit as possible since you are new to R.

Data

df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22"))

Code

library(dplyr)

df %>% 
  filter(Condition == "Tumor" & 
         MSI_Status == "Low" & 
         Location == "Kidney") %>% 
  count(orig.ident) %>% 
  rename(Freq = n)

#>   orig.ident Freq
#> 1    B-cac15    1
#> 2    B-cac19    1

Created on 2020-09-05 by the reprex package (v0.3.0)
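If the goal is to get the counts for every sample and every condition in one go, rather than one filter at a time, a minimal sketch is to skip the `filter()` step and pass all the grouping columns to `count()`. This uses the toy data from the question; the object name `all_counts` is only an illustration, and the `name` argument assumes a reasonably recent dplyr.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# One row per observed combination of sample and condition columns;
# Freq is the number of rows (cells) that fall into that combination
all_counts <- df %>%
  count(orig.ident, Condition, MSI_Status, Location, name = "Freq")

all_counts
```

On the real metadata the same call should return one row per sample and condition combination that actually occurs, so no explicit loop is needed. Combinations with zero cells are simply absent from the result.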

Eric
  • Thanks a lot, I have added a reproducible example. Please have a look. – David Sep 05 '20 at 13:45
  • Hi @David, I have updated my solution, which now includes a `filter` on the criteria you provided. – Eric Sep 05 '20 at 13:53
  • Perhaps `count(orig.ident, name = 'Freq')` instead of a rename? – Nate Sep 05 '20 at 13:54
  • Given his limited exposure to coding, I thought it would be easier for @David to follow what is going on if I broke it out into two functions. Not the most elegant, but hopefully being a little more explicit makes it easier to grasp what the code is doing. – Eric Sep 05 '20 at 13:56
  • Thanks a lot @iamericfletcher, I appreciate your efforts. I tried the code, but it shows the following error: `Error in count(., orig.ident) : object 'orig.ident' not found` – David Sep 05 '20 at 14:01
  • What is the name of the column you want to count? If it is `orig.ident`, the code should work. If it is `Var1`, try using that instead. – Eric Sep 05 '20 at 14:06
  • "orig.ident" is working fine. Perfect! But now Freq is showing an error: `Error in rename(., Freq = n) : unused argument (Freq = n)` – David Sep 05 '20 at 14:15
  • Get rid of the rename and edit the count function as follows: `count(orig.ident, name = "Freq")` – Eric Sep 05 '20 at 14:18
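
For completeness, here is a sketch of the variant suggested in the last comment, written with explicit `dplyr::` prefixes. The prefixes rest on an assumption: the thread never shows which packages were loaded, but an `unused argument (Freq = n)` error is what one would typically see if another package's `rename` (plyr's, for example) were masking dplyr's. The object name `kidney_counts` is only illustrative.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  Condition  = c("Normal", "Normal", "Normal", "Tumor", "Tumor", "Tumor"),
  MSI_Status = c("High", "High", "High", "Low", "Low", "Low"),
  Location   = c("Lungs", "Lungs", "Lungs", "Kidney", "Kidney", "Liver"),
  Clusters   = c(1, 2, 4, 2, 2, 6),
  orig.ident = c("B-cac10", "B-cac11", "T-cac15", "B-cac15", "B-cac19", "T-cac22")
)

# dplyr:: makes sure dplyr's verbs are called even if another package
# has masked filter() or count(); name = "Freq" replaces the rename() step
kidney_counts <- df %>%
  dplyr::filter(Condition == "Tumor", MSI_Status == "Low", Location == "Kidney") %>%
  dplyr::count(orig.ident, name = "Freq")

kidney_counts
```

Running `conflicts()` in the session lists masked function names, which may help confirm whether masking is really the cause.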