I have a dataframe named 'res', where the row names are numbers corresponding to genes.

>res

        baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

I am wondering how to give the row names of my dataframe the heading 'gene_id' so that my data frame ends up looking like this.


>res
gene_id baseMean log2FoldChange     lfcSE      stat      pvalue        padj
       <numeric>      <numeric> <numeric> <numeric>   <numeric>   <numeric>
27395    1268.40       0.100013  0.164840  0.606731 5.44029e-01 0.737925231
18777    1413.56      -0.266365  0.175847 -1.514758 1.29834e-01 0.312449929
21399    3376.09      -0.243707  0.132616 -1.837687 6.61086e-02 0.196027163

I am planning to join this dataframe with another dataframe (anno), which contains information about the actual genes, on the 'gene_id' column using the left_join function.

>anno
   gene_id  SYMBOL                                                                     GENENAME
1    27395  Mrpl15                                          mitochondrial ribosomal protein L15
2    18777  Lypla1                                                          lysophospholipase 1
3    21399   Tcea1                                    transcription elongation factor A (SII) 1

res_anno <- left_join(res, anno,by="gene_id")

aynber
claraina
  • You can create a new column with `res$gene_id <- rownames(res)`, then left join – jpsmith Jan 25 '23 at 09:57
  • Thanks, I think this would work. I just wanted to see if there was a way to join them without adding another column to my original df – claraina Jan 26 '23 at 01:36

1 Answer

Is this what you're looking for?

Creating two dataframes that represent the example:

library(tidyverse)

# creating the res dataframe
res = tibble(
  baseMean = c(1268.40,1413.56,3376.09),
  log2FoldChange = c(0.100013,-0.266365,-0.243707)
)

# A tibble: 3 × 2
  baseMean log2FoldChange
     <dbl>          <dbl>
1    1268.          0.100
2    1414.         -0.266
3    3376.         -0.244


# creating the anno dataframe
anno = tibble(
  gene_id = c(1,2,3),
  SYMBOL = c('Mrpl15', 'Lypla1', 'Tcea1')
)

# A tibble: 3 × 2
  gene_id SYMBOL
    <dbl> <chr> 
1       1 Mrpl15
2       2 Lypla1
3       3 Tcea1


Then you can apply this to your dataset:

# extracting the rownames and putting them in a column
res = res %>% 
  rownames_to_column('gene_id') %>% 
  mutate(gene_id = as.numeric(gene_id))

# A tibble: 3 × 3
  gene_id baseMean log2FoldChange
    <dbl>    <dbl>          <dbl>
1       1    1268.          0.100
2       2    1414.         -0.266
3       3    3376.         -0.244

And finally left_join them:

# left joining both datasets
res_anno = res %>% 
  left_join(anno, by = 'gene_id')

# A tibble: 3 × 4
  gene_id baseMean log2FoldChange SYMBOL
    <dbl>    <dbl>          <dbl> <chr> 
1       1    1268.          0.100 Mrpl15
2       2    1414.         -0.266 Lypla1
3       3    3376.         -0.244 Tcea1 

As per your comment, if you don't want to add a column to your original dataframe, you can create the column inside the pipe and pass the result straight to left_join, so that it only exists in your new dataframe (run this on the original res, not on the version reassigned above):

res_anno = res %>% 
  rownames_to_column('gene_id') %>% 
  mutate(gene_id = as.numeric(gene_id)) %>% 
  left_join(anno, by = 'gene_id')


# A tibble: 3 × 4
  gene_id baseMean log2FoldChange SYMBOL
    <dbl>    <dbl>          <dbl> <chr> 
1       1    1268.          0.100 Mrpl15
2       2    1414.         -0.266 Lypla1
3       3    3376.         -0.244 Tcea1 
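Note that the toy `res` tibble above has no real row names, so `rownames_to_column` simply yields 1, 2, 3. A minimal sketch with the gene IDs stored as row names, as in the question, might look like the following. The two data frames are illustrative stand-ins built from the question's first three rows, and the sketch assumes that `anno$gene_id` is numeric. If `res` is a Bioconductor object such as a DESeq2 results table (which the `<numeric>` type row in the question's printout suggests), it will probably need `res <- as.data.frame(res)` first.

```r
library(dplyr)
library(tibble)

# Stand-in for `res`: gene IDs live in the row names, as in the question
res <- data.frame(
  baseMean       = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707),
  row.names      = c("27395", "18777", "21399")
)

# Stand-in for `anno`: gene_id assumed to be numeric here
anno <- data.frame(
  gene_id = c(27395, 18777, 21399),
  SYMBOL  = c("Mrpl15", "Lypla1", "Tcea1")
)

# Row names are always character, so convert them to match anno$gene_id;
# `res` itself is left unchanged because the result goes to `res_anno`
res_anno <- res %>%
  rownames_to_column("gene_id") %>%
  mutate(gene_id = as.numeric(gene_id)) %>%
  left_join(anno, by = "gene_id")

res_anno
```

If `anno$gene_id` turns out to be character instead, the `mutate()` step can be dropped, since the join keys only need to share a type.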
James
  • Am I still able to apply this if my res df has about 12,200 rows? (The example I put up is just for the first three genes only) – claraina Jan 26 '23 at 01:32
  • Yes, you should be able to apply this to a dataset that has a million rows. The only difference will be the run time, which grows with the number of rows. I edited my answer so that it has more structure to it, and I added the outputs. If it's what you want, then it will work with any number of rows in your data. – James Jan 26 '23 at 14:44
  • I also added a suggestion to pipe the new column and `left_join` directly into the new dataframe, leaving your original dataframe (res) unchanged. – James Jan 26 '23 at 14:47
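
For the goal raised in the comments (joining without ever adding a column to `res`), base R's `merge()` can match on row names directly. This is a different technique from the dplyr pipeline in the answer, shown here only as a sketch. The data frames are illustrative stand-ins for the question's data, and `anno_chr` is a hypothetical helper name.

```r
# Illustrative stand-ins for the question's data (first three genes only)
res <- data.frame(
  baseMean       = c(1268.40, 1413.56, 3376.09),
  log2FoldChange = c(0.100013, -0.266365, -0.243707),
  row.names      = c("27395", "18777", "21399")
)
anno <- data.frame(
  gene_id = c(27395, 18777, 21399),
  SYMBOL  = c("Mrpl15", "Lypla1", "Tcea1")
)

# Row names are character, so make the key in `anno` character as well;
# going through as.integer() avoids strings such as "1e+05"
anno_chr <- transform(anno, gene_id = as.character(as.integer(gene_id)))

# by.x = "row.names" matches on the row names of `res`;
# all.x = TRUE keeps every row of `res`, like a left join
res_anno <- merge(res, anno_chr,
                  by.x = "row.names", by.y = "gene_id",
                  all.x = TRUE, sort = FALSE)

# merge() stores the key in a column called "Row.names"; rename it
names(res_anno)[names(res_anno) == "Row.names"] <- "gene_id"

# Genes without an annotation end up with NA in SYMBOL
sum(is.na(res_anno$SYMBOL))
```

The same call should work unchanged on the full table of about 12,200 rows. The final line is a quick way to see how many genes found no match in `anno`.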