
I am making frequency tables (from a long list of genetic sequences) of the number of times genes appear in my sequencing samples. I've been using the ftable() function just fine, but I am narrowing my search and want to focus on a few specific genes out of the many thousands.

My workflow currently is as follows:

  1. I create frequency tables for all genes in a sample.
  2. Export this table to a csv.
  3. Use control+f in Excel to pull out the specific gene frequencies I'm interested in.

This seems very inefficient given the number of samples I am planning to analyze.

Is there a way to use R to extract certain entries in the frequency table?

So far I have tried the [c(, , ,)] and a[, , ,] methods to no avail. I've been getting an "unexpected symbol" error.

I'm hoping the issue isn't the hyphen in the gene name because I can't remove that.

I attached a screenshot of my R window for reference.

Here is the dput(head(WeekZero,10)) output, i.e. the raw data before frequencies are calculated:

structure(list(Sequence.number = c(1L, 2L, 4L, 5L, 6L, 7L, 10L, 11L, 13L, 14L), Variable = structure(c(25L, 2L, 22L, 19L, 19L, 19L, 7L, 1L, 25L, 19L), .Label = c("V1-13", "V1-18", "V1-2", "V1-21", "V1-36", "V1-39", "V1-42", "V10D-9", "V11-25", "V12D-36", "V12D-56", "V15D-54", "V1D-15", "V1D-73", "V3-20", "V3D-30", "V4D-24", "V4D-43", "V4D-60", "V6-31", "V6-35", "V6-4", "V6D-40", "V6D-76", "V8-30", "V8-46", "V8-5", "V9-15", "V9-23", "V9D-2" ), class = "factor"), Diversity = structure(c(13L, 17L, 2L, 5L, 3L, 5L, 2L, 14L, 13L, 15L), .Label = c("", "D1", "D1T1", "D1T2", "D2", "D2D", "D2T1", "D2T1D", "D2T2", "D3", "D3T1", "D3T1D", "D4", "D4T1D", "D5", "D5T1D", "D6"), class = "factor"), Joining = structure(c(1L, 7L, 8L, 8L, 4L, 8L, 1L, 9L, 1L, 8L), .Label = c("J1", "J1T1", "J1T2", "J2", "J2D", "J2T1", "J3", "J4", "J5", "J6D"), class = "factor")), row.names = c(NA, 10L), class = "data.frame")

Molly F
  • One other option is that you simply convert that into a data.frame, df = as.data.frame(WeekZero) ; df[df[,1]=="V11-25",] – StupidWolf Feb 24 '20 at 17:10
  • I'll add a screenshot of the attributes(tab) now! – Molly F Feb 24 '20 at 17:27
  • Hey Molly, don't screen shot. Ok I know how to solve your problem. You need to do this, dput(head(WeekZero,10)), and in your R console, something will pop up like structure = .. . Copy and paste that text, and place it as part of your post – StupidWolf Feb 24 '20 at 17:29
  • This way, I can get the exact names – StupidWolf Feb 24 '20 at 17:29
  • Will do. The attributes of the original table (before calculating frequencies) aren't helpful because they don't make it through the sequence id numbers before R hits max print. – Molly F Feb 24 '20 at 17:32
  • Posted the dput(head(WeekZero,10)) output above. This is the raw data before frequencies are calculated. – Molly F Feb 24 '20 at 17:36

1 Answer


Using what you have above as example:

As you have correctly done, you tabulate the three variables. Below I use with() so that you don't need to repeat WeekZero$ three times:

Freq = with(WeekZero,table(Variable,Diversity,Joining))

Looking at the table (the output is long, so only the first rows are shown):

ftable(Freq)
                   Joining J1 J1T1 J1T2 J2 J2D J2T1 J3 J4 J5 J6D
Variable Diversity                                              
V1-13                       0    0    0  0   0    0  0  0  0   0
         D1                 0    0    0  0   0    0  0  0  0   0
         D1T1               0    0    0  0   0    0  0  0  0   0
         D1T2               0    0    0  0   0    0  0  0  0   0
         D2                 0    0    0  0   0    0  0  0  0   0
         D2D                0    0    0  0   0    0  0  0  0   0
         D2T1               0    0    0  0   0    0  0  0  0   0

To get, say, the counts for "V1-13", go back to the table object and put the name before the first comma. This indexes the first dimension of the array (Variable):

Freq["V1-13",,]

To get V1-13 with Diversity being D1, add the Diversity level after the first comma, which indexes the second dimension:

Freq["V1-13","D1",]
  J1 J1T1 J1T2   J2  J2D J2T1   J3   J4   J5  J6D 
   0    0    0    0    0    0    0    0    0    0 

To get V1-13 and Joining == J1:

Freq["V1-13",,"J1"]
         D1  D1T1  D1T2    D2   D2D  D2T1 D2T1D  D2T2    D3  D3T1 D3T1D    D4 
    0     0     0     0     0     0     0     0     0     0     0     0     0 
D4T1D    D5 D5T1D    D6 
    0     0     0     0 
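
Since the goal is a handful of genes out of thousands, the same indexing should also work with a character vector of names. Below is a minimal sketch, assuming a small made-up data frame called toy standing in for WeekZero (its values are illustrative only, not your data). The hyphen in the gene name is harmless as long as the name is a quoted string:

```r
# Toy stand-in for WeekZero (hypothetical values, for illustration only)
toy <- data.frame(
  Variable  = c("V8-30", "V1-18", "V4D-60", "V4D-60", "V8-30"),
  Diversity = c("D4", "D6", "D2", "D2", "D4"),
  Joining   = c("J1", "J3", "J4", "J4", "J1")
)
Freq <- with(toy, table(Variable, Diversity, Joining))

# Gene names contain a hyphen, so they must be quoted strings
genes <- c("V8-30", "V4D-60")

# Keep only those genes; drop = FALSE keeps all three dimensions
sub <- Freq[genes, , , drop = FALSE]

# Total count per selected gene, summed over Diversity and Joining
per_gene <- apply(sub, 1, sum)
per_gene
```

Writing V8-30 without quotes would be parsed as V8 minus 30, which is why the quotes matter.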
StupidWolf
  • I apologize, I didn't explain my question correctly. I'll edit the post. After I create the frequency table, I merge the Variable, Diversity, and Joining columns in Excel (probably also not the best way, but it takes two seconds) into one column titled Family. I am interested in specific frequencies (Variable, Diversity, and Joining combinations) of this merged column – Molly F Feb 24 '20 at 17:46
  • Yup, you want, say, "V1-13 + J1 + D5", right? If you do Freq["V1-13","D5","J1"], you get the exact frequency of this VDJ combination :) – StupidWolf Feb 24 '20 at 17:48
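
If you want the merged Family column itself, one option is to build it in R rather than Excel. This is only a sketch under assumptions: toy is a made-up stand-in for WeekZero, the "_" separator is my choice (use whatever you use in Excel), and wanted is a hypothetical list of combinations of interest:

```r
# Toy stand-in for WeekZero (hypothetical values, for illustration only)
toy <- data.frame(
  Variable  = c("V8-30", "V1-18", "V4D-60", "V4D-60", "V8-30"),
  Diversity = c("D4", "D6", "D2", "D2", "D4"),
  Joining   = c("J1", "J3", "J4", "J4", "J1")
)
Freq <- with(toy, table(Variable, Diversity, Joining))

# Long format: one row per Variable/Diversity/Joining combination,
# with the count in a column named Freq
long <- as.data.frame(Freq, stringsAsFactors = FALSE)

# Build the merged Family label (separator "_" is an assumption)
long$Family <- paste(long$Variable, long$Diversity, long$Joining, sep = "_")

# Hypothetical combinations of interest
wanted <- c("V8-30_D4_J1", "V4D-60_D2_J4")

# Keep only the rows for those combinations
hits <- long[long$Family %in% wanted, c("Family", "Freq")]
hits
```

The hits data frame can then be passed to write.csv() if you still want a file per sample.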