1

I am trying to filter a data frame based on user input via a JSON file parsed into the script.

Given the following inputs:

#tibble to be filtered:

> database
# A tibble: 11 x 6
   strain genotype floodmedia inductionDelay treatment timePoint
   <lgl>  <chr>    <chr>               <dbl> <chr>     <chr>    
 1 NA     CBF1     SCGal                  15 EtoH      -1       
 2 NA     CBF1     SCGal                  15 Estradiol -1       
 3 NA     CBF1     SCGal                  15 EtoH      15       
 4 NA     CBF1     SCGal                  15 Estradiol 15       
 5 NA     CBF1     SCGal                  15 EtoH      90       
 6 NA     CBF1     SCGal                  15 Estradiol 90       
 7 NA     CBF1     SCGal                  15 EtoH      -1       
 8 NA     CBF1     SCGal                  15 Estradiol -1       
 9 NA     CBF1     SCGal                  15 EtoH      15       
10 NA     CBF1     SCGal                  15 Estradiol 15       
11 NA     CBF1     SCGal                  15 EtoH      90   

#JSON Input:

{
    "timePoint":["15", "-1"],
    "treatment":["EtoH"]
}

I would like to return:

# A tibble: 4 x 6
  strain genotype floodmedia inductionDelay treatment timePoint
  <lgl>  <chr>    <chr>               <dbl> <chr>     <chr>    
1 NA     CBF1     SCGal                  15 EtoH      -1       
2 NA     CBF1     SCGal                  15 EtoH      15       
3 NA     CBF1     SCGal                  15 EtoH      -1       
4 NA     CBF1     SCGal                  15 EtoH      15     

I need the solution to be general in terms of the variable names so that given a tibble with different variable names and a JSON matching those new names, the code would still properly filter the new tibble. As such, I have tried to use a for loop to iterate through the parsed json object, but have been unsuccessful thus far.

#code I have tried

json <- read_json("filepath_to_json_file_shown_above")
query <- database
for(i in names(json)){
  query <- query %>% filter(i %in% json$i)
}

I am using the packages tidyverse/dplyr and jsonlite to do the above tasks, though I do not mind using a different package if that is more useful.

Thank you for any help and I apologize if I have made mistakes in my question. This is my first time using StackOverflow. Please let me know if anything is unclear or incorrect.

sbhavsar
  • 13
  • 4

1 Answers1

0

I cannot say I am confident in this, but it seems like it might work. I recently came across another post that described a function for filtering on multiple conditions with tidyverse. Perhaps you could adapt something like this below. Of note, this ignores variable types (numeric vs. character).

library(jsonlite)
library(tidyverse)

json_df <- fromJSON(json_file)

df <- unnest(json_df, cols = everything())

my_filter <- function(df, cols, conds){     
  fp <- map2(cols, conds, function(x, y) quo((!!(as.name(x))) %in% !!y))
  filter(df, !!!fp)
}

my_filter(database, cols = as.list(names(df)), conds = as.list(df))

Output

  strain genotype floodmedia inductionDelay treatment timePoint
1     NA     CBF1      SCGal             15      EtoH        -1
2     NA     CBF1      SCGal             15      EtoH        15
3     NA     CBF1      SCGal             15      EtoH        -1
4     NA     CBF1      SCGal             15      EtoH        15

Data

database <- read.table(text = 
"strain genotype floodmedia inductionDelay treatment timePoint
 NA     CBF1     SCGal                  15 EtoH      -1       
 NA     CBF1     SCGal                  15 Estradiol -1       
 NA     CBF1     SCGal                  15 EtoH      15       
 NA     CBF1     SCGal                  15 Estradiol 15       
 NA     CBF1     SCGal                  15 EtoH      90       
 NA     CBF1     SCGal                  15 Estradiol 90       
 NA     CBF1     SCGal                  15 EtoH      -1       
 NA     CBF1     SCGal                  15 Estradiol -1       
 NA     CBF1     SCGal                  15 EtoH      15       
 NA     CBF1     SCGal                  15 Estradiol 15       
 NA     CBF1     SCGal                  15 EtoH      90", header = T)

json_file <- 
'[{
  "timePoint":["15", "-1"],
  "treatment":["EtoH"]
}]'
Ben
  • 28,684
  • 5
  • 23
  • 45
  • Thanks for your help, Ben. When I tried your code, the line `df <- unnest(json_df, cols = everything())` gave me the following error: **Error in UseMethod("unnest") : no applicable method for 'unnest' applied to an object of class "list"**. Basically, it is reading the JSON file in as a list instead of a data frame, and unnest can only take in data frames as arguments is what I concluded. I looked around for an answer but wasn't successful. Do you have thoughts? – sbhavsar Dec 06 '19 at 16:57
  • Nevermind, I figured it out and your method seems to work. Thanks for your help! – sbhavsar Dec 06 '19 at 17:14