How to filter data frame columns based on values that intersect with a column in another data frame

Question

I have two data frames. I want to filter the gene IDs in the expr_df data frame based on the values they share with another data frame, gene_annot. Basically, I want to keep only the genes in expr_df that intersect with gene_id from gene_annot. This is what my datasets look like:

I tried this command in R:

expr_df <- expr_df %>% select(one_of(intersect(gene_annot$gene_id, colnames(expr_df))))

But it gives me 0 or NA values for all the IDs.

It will be easier for people to help you if you can share your excerpts as code we can load. For instance, run `dput(expr_df[1:4,1:4])` and `dput(gene_annot[1:4,1:6])` and paste the code output into the body of your question. That will create code recipes to recreate exact copies of those excerpts, data formats / structures and all. — Jon Spring, Jul 18 '22 at 03:51

Dan Adams · Accepted Answer · 2022-07-21T18:07:18.143

The code you provided should work. However, the call to intersect() is superfluous, because any_of() simply ignores names that are not columns in the data, so you can remove it. Also, dplyr::one_of() has been superseded; use dplyr::any_of() instead.

library(tidyverse)

d1 <- tibble(sample_id = paste0("GTEX", 1:4), 
             ENSG1 = rnorm(4, 5, 1),
             ENSG2 = rnorm(4, 50, 3),
             ENSG3 = rnorm(4, 20, 7),
             ENSG4 = rnorm(4, 3, 0.5))

d2 <- tibble(chr = 1:3, gene_id = c("ENSG1", "ENSG3", "ENSG4"))

d1 %>% 
  select(sample_id, any_of(d2$gene_id))
#> # A tibble: 4 × 4
#>   sample_id ENSG1 ENSG3 ENSG4
#>   <chr>     <dbl> <dbl> <dbl>
#> 1 GTEX1      5.22 25.0   2.62
#> 2 GTEX2      5.99 -2.46  2.18
#> 3 GTEX3      4.56 22.0   3.29
#> 4 GTEX4      3.40 26.7   3.99

Created on 2022-07-18 by the reprex package (v2.0.1)
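
To see why intersect() adds nothing here, the following minimal sketch may help. It is not part of the original answer: the toy data frame `expr`, the vector `wanted`, and the ID "ENSG9" are hypothetical stand-ins for expr_df and gene_annot$gene_id. It compares any_of() with an explicit intersect(), and shows that all_of() is the strict variant that errors on a missing name.

```r
library(dplyr)

# Hypothetical toy data: one ID column plus three gene columns
expr <- tibble::tibble(
  sample_id = c("S1", "S2"),
  ENSG1 = c(1.0, 2.0),
  ENSG2 = c(3.0, 4.0),
  ENSG3 = c(5.0, 6.0)
)

# Annotation IDs, including one ("ENSG9") that has no column in `expr`
wanted <- c("ENSG1", "ENSG3", "ENSG9")

# any_of() silently skips "ENSG9", so no intersect() is needed
kept <- expr %>% select(sample_id, any_of(wanted))
names(kept)
#> [1] "sample_id" "ENSG1"     "ENSG3"

# The explicit intersect() version selects the same columns
same <- expr %>% select(sample_id, all_of(intersect(wanted, names(expr))))

# all_of() on its own is strict: it raises an error for the missing "ENSG9"
strict <- tryCatch(
  expr %>% select(all_of(wanted)),
  error = function(e) "error"
)
```

If a missing gene ID should be treated as a mistake rather than skipped, all_of() is probably the better choice; otherwise any_of() keeps the call short.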
