
I have a problem reading data from a JSON file on a forum that requires login to access.

I am using the rvest package, and the code below works well for reading an HTML page after logging in. What I don't know is how to read a JSON file using the same session, in which the user is already logged in.

library(rvest)
library(httr)

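  # Open a session on the login page and take the first form on it
  url       <- "https://forum.com/"
  pgsession <- html_session(url)
  pgform    <- html_form(pgsession)[[1]]

  # Fill in the credentials (placeholders)
  filled_form <- set_values(pgform,
                            "username" = "username",
                            "password" = "password")

  # Submit the login form and keep the returned session
  pgsession <- submit_form(pgsession, filled_form)

  # Pages requested through the session are fetched as the logged-in user
  events <- jump_to(pgsession, "https://forum.com/events.php")
  page <- read_html(events)  # html() is deprecated in favour of read_html()
  data_usernames <- html_text(page, trim = FALSE)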
  

Is there any way to read JSON using the session? How can I make the code below work?

  urlJson <- "https://forum.com/events.json"
  
  data = jsonlite::fromJSON(urlJson, simplifyDataFrame = TRUE) 
  df <- as.data.frame(data$data)
Bruno G
  • Every website is different. There isn't much that will work in all cases unless you are using something like RSelenium to control a web browser. It's impossible to say what might work without a reproducible example. Also be sure to consult the terms of service for the website you are interacting with. Sometimes scraping password-protected forums is against the terms of service. – MrFlick May 27 '21 at 05:25

1 Answer


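Well, I figured it out. Using jump_to() with the session, and parse_json() instead of fromJSON(), everything works as I wanted. Passing the URL straight to fromJSON() makes a separate request that does not carry the session's login cookies, whereas jump_to() fetches the URL through the logged-in session.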

  jsonSession <- pgsession %>% jump_to(urlJson) 
  data <- jsonlite::parse_json(jsonSession$response, simplifyVector = TRUE)
  df <- as.data.frame(data$data)
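
A variant that may be more robust is to pull the body text out of the response explicitly and parse that string, rather than relying on how parse_json() handles the response object. The sketch below is only an illustration: `events_to_df` is a hypothetical helper name, the sample payload is made up, and it assumes the JSON has a top-level `data` field as in the question.

```r
library(jsonlite)

# Hypothetical helper (not part of rvest or jsonlite): parse the JSON text
# of a response body and return its top-level "data" field as a data frame.
events_to_df <- function(json_text) {
  parsed <- jsonlite::fromJSON(json_text, simplifyDataFrame = TRUE)
  as.data.frame(parsed$data)
}

# With the logged-in session from the question (rvest < 1.0 API), the JSON
# text would be obtained like this (not run here, it needs a real login):
#   jsonSession <- jump_to(pgsession, "https://forum.com/events.json")
#   json_text   <- httr::content(jsonSession$response,
#                                as = "text", encoding = "UTF-8")
#   df          <- events_to_df(json_text)

# Offline check with a made-up payload of the assumed shape
sample_json <- '{"data": [{"id": 1, "title": "Meetup"}, {"id": 2, "title": "Workshop"}]}'
df <- events_to_df(sample_json)
```

In rvest 1.0 and later, jump_to() has been renamed session_jump_to(); the response is still available as the `response` element of the session.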

Bruno G