
I'm trying to pull some data from a REST API, but I'm unable to correctly pass a date parameter into the query string. Using sprintf I was able to pass in the search term and the website, but I had no luck with discoverDate.

The API in question is https://newsriver.io.

Here is my function to grab data by one search term and one website:

library(httr)
library(jsonlite)

get_newsriver_content <- function(searcht, website, api_key) {
  # Every literal % in the URL template is doubled (%%) so that sprintf() leaves it alone
  url <- sprintf('https://api.newsriver.io/v2/search?query=text%%3A%s%%20OR%%20website.domainName%%3A%s%%20OR%%20language%%3AEN&sortBy=_score&sortOrder=DESC&limit=100', searcht, website)
  news_get <- GET(url, add_headers(Authorization = api_key))
  news_txt <- content(news_get, as = "text", encoding = "UTF-8")
  news_df <- fromJSON(news_txt)
  news_df$discoverDate <- as.Date(news_df$discoverDate)
  news_df
}
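Much of the difficulty above comes from sprintf() treating every % as a format directive, so a hand-encoded template needs %% everywhere. One way around it, sketched here under assumptions rather than as the approach the answer below settled on, is to build the raw, unencoded query with sprintf() first and only then percent-encode it with base R's utils::URLencode(). The helper name build_newsriver_url() is hypothetical, the clauses are joined with AND (as in the answer's query, not the OR used above), and the discoverDate:[from TO to] range syntax is borrowed from the answer.

```r
# Hypothetical helper: builds a Newsriver search URL for one search term,
# one website and one day. Assumes the Lucene-style query syntax used in
# the answer (AND-joined clauses, discoverDate:[from TO to] ranges).
build_newsriver_url <- function(searcht, website, day) {
  day <- format(as.Date(day), "%Y-%m-%d")
  # Step 1: build the raw query; no %% escaping is needed in this template
  raw_query <- sprintf(
    'text:"%s" AND website.domainName:%s AND language:EN AND discoverDate:[%s TO %s]',
    searcht, website, day, day
  )
  # Step 2: percent-encode it; reserved = TRUE also encodes characters such as : [ ]
  encoded <- utils::URLencode(raw_query, reserved = TRUE)
  paste0("https://api.newsriver.io/v2/search?query=", encoded,
         "&sortBy=_score&sortOrder=DESC&limit=100")
}

build_newsriver_url("Canada", "cbc.ca", "2017-09-01")
```

Keeping the query readable until the last step makes it easy to check against the API documentation before encoding.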

Update: I would also like to make multiple API calls based on a vector of dates.

mannym
  • I'm not sure what the problem is here. If you look at the [query builder](https://console.newsriver.io/river/0), you can query by text, title, website name and language, but not by discoverDate (it is only available to sort the results). If I may suggest something else, take a look at the `urltools` package, in particular the `param_set` function. You can build your query in a much cleaner way: `url_base %>% param_set("query", "...") %>% param_set("sortBy", "_score") %>% param_set("sortOrder", "DESC") %>% param_set("limit", "100")` – quartin Oct 06 '17 at 17:07
  • @quartin Great advice. I also got the API creator to help me out with the URL encoding. I will post the answer shortly. – mannym Oct 06 '17 at 21:45
  • @quartin updated – mannym Oct 20 '17 at 18:15

1 Answer


Here is how I solved my problem.

It was really a two-step problem:

  1. Figuring out how to properly encode my query for insertion into the curl call.
  2. Creating a function that makes an API call for each date in a vector and appends the results to a data frame.

Here is how I did it.

library(tidyverse)
library(jsonlite)
library(urltools)
library(httr)

# Function for pulling results for one (date-specific) query.
# Note: it relies on the global objects `api_key` and `pb` defined below.
get_newsriver_bydate <- function(query) {

  # Being kind to the free API - shout out to Elia at Newsriver, who has been ever patient
  pb$tick()$print()
  Sys.sleep(sample(seq(0.5, 2.5, 0.5), 1))

  # This is where I used the urltools package, as suggested by quartin;
  # the query is URL-encoded before being set as a parameter
  url_base <- "https://api.newsriver.io/v2/search"
  create_curl_call <- url_base %>%
    param_set("query", url_encode(query)) %>%
    param_set("sortBy", "_score") %>%
    param_set("sortOrder", "DESC") %>%
    param_set("limit", "100")

  # I had most of this before, but I changed my output to a tibble,
  # which is more versatile to work with
  get_curl <- GET(create_curl_call, add_headers(Authorization = api_key))
  curl_to_json <- content(get_curl, as = "text", encoding = "UTF-8")
  news_df <- fromJSON(curl_to_json, flatten = TRUE)
  news_df$discoverDate <- as.Date(news_df$discoverDate)
  as_tibble(news_df)
}

# Set configuration and API key
# (ssl_verifypeer = 0L turns off SSL certificate verification)
set_config(config(ssl_verifypeer = 0L))
api_key <- "mykey"

# Set my vector of dates
dates1 <- seq(as.Date("2017-09-01"), as.Date("2017-10-01"), by = "days")

# Set up my progress bar (progress_estimated() is deprecated in newer dplyr releases)
pb <- progress_estimated(length(dates1))

# sprintf my query into a vector of queries, one per date
query <- sprintf('text:"Canada" AND text:"Rocks" AND language:EN AND discoverDate:[%s TO %s]', dates1, dates1)

# Run the queries and be patient; .id records each query's position in `query`
news_df <- map_df(query, get_newsriver_bydate, .id = "query")
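Because `query` is an unnamed vector, the `.id` column that map_df() adds only holds each query's position (1, 2, 3, ...). A small variation, sketched here in base R so it runs without tidyverse or network access, is to name each query by its date and carry that date into the combined result. fetch_stub() is a hypothetical stand-in for get_newsriver_bydate() that returns a one-row data frame instead of calling the API.

```r
dates1 <- seq(as.Date("2017-09-01"), as.Date("2017-09-03"), by = "days")
day_strings <- format(dates1, "%Y-%m-%d")

# One query per day, named by that day
queries <- sprintf(
  'text:"Canada" AND text:"Rocks" AND language:EN AND discoverDate:[%s TO %s]',
  day_strings, day_strings
)
names(queries) <- day_strings

# Hypothetical stand-in for get_newsriver_bydate(): no HTTP request is made
fetch_stub <- function(q) {
  data.frame(query_text = q, n_chars = nchar(q), stringsAsFactors = FALSE)
}

# Call the fetcher once per query, tag each result with its date, then stack
per_day <- lapply(names(queries), function(d) {
  res <- fetch_stub(queries[[d]])
  res$date <- rep(as.Date(d), nrow(res))  # also works for zero-row results
  res
})
news_df <- do.call(rbind, per_day)
news_df
```

With tidyverse loaded, passing the named `queries` vector to map_df() with `.id = "date"` should give a similar column, since `.id` stores the names when the input is named (as character rather than Date).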

As for my research method and how I came to solve these two problems:

  1. Quartin suggested I look at the urltools package (https://cran.rstudio.com/web/packages/urltools/index.html). This package helps you encode and decode URLs and offers various other functions that are fast and vectorised. Next, to get my query right, I simply read the API documentation, which I suggest anyone trying to pull from an API do. It may sound like a no-brainer, but I didn't give it a full read before posting my question.

  2. To create the function I drew on a number of previous answers, but the post below helped the most:

API Query for loop: this post helped me with the progress bar and with using the map function to get everything into one data frame.
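The progress bar in the answer comes from dplyr's progress_estimated(), which newer dplyr releases deprecate. A base R alternative, sketched here as an assumption rather than what the answer used, is utils::txtProgressBar() inside a plain loop that also pauses between requests. polite_map() is a hypothetical helper, and it draws each pause from a continuous uniform range instead of the answer's 0.5-second steps.

```r
# Hypothetical helper: applies `fetch` to each query in turn, pausing a random
# number of seconds between calls and showing a base R text progress bar.
polite_map <- function(queries, fetch, delay = c(0.5, 2.5)) {
  n <- length(queries)
  if (n == 0) return(list())  # txtProgressBar() requires max > min
  pb <- utils::txtProgressBar(min = 0, max = n, style = 3)
  on.exit(close(pb))
  out <- vector("list", n)
  for (i in seq_len(n)) {
    out[[i]] <- fetch(queries[[i]])
    utils::setTxtProgressBar(pb, i)
    # Be kind to the API: wait before the next request (not after the last one)
    if (i < n) Sys.sleep(stats::runif(1, delay[1], delay[2]))
  }
  names(out) <- names(queries)
  out
}

# Usage with a stand-in fetcher (nchar) and no delay
res <- polite_map(c(a = "x", b = "yy"), nchar, delay = c(0, 0))
```

If `fetch` returns data frames, `do.call(rbind, res)` stacks them into one.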

There may very well be a better answer but this works for me so far.

mannym