I am trying to extract the option table for the security IOC from the website https://www.nseindia.com/option-chain

The JSON link containing the data is https://www.nseindia.com/api/option-chain-equities?symbol=IOC

When I try to import the data into R, I get an error:

library(jsonlite)
dat=fromJSON("https://www.nseindia.com/api/option-chain-equities?symbol=IOC")
Error in open.connection(con, "rb") : HTTP error 401.

Surprisingly, when I open the website https://www.nseindia.com/option-chain in Chrome/Firefox, select the IOC stock, and then call fromJSON again, it works.

Why does this happen, and how can I get the data without opening a web browser?

Nad Pat
  • The behavior you see is most likely an attempt to stop scraping. Looking at their [robots.txt](https://www.nseindia.com/robots.txt), you can also see that they don't want people to scrape anything in the "/api/" path. They do have a CSV download option; maybe that works for your purposes? – Till Sep 02 '21 at 15:46
  • Thanks, sure, the CSV file works, but can I get it through `rvest` or `httr`? – Nad Pat Sep 02 '21 at 16:05
  • The download link for the CSV is "protected" by Javascript. – Till Sep 02 '21 at 16:27

1 Answer

They seem to check the "user-agent" header and block requests that don't look like they come from a manually controlled browser. To get around this, you can set your "user-agent" in an httr::GET() request.

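library(httr)
library(rvest)

# Sanity check: httpbin.org echoes back the user agent the request was sent with.
httr::GET("https://httpbin.org/user-agent",
          config = add_headers("user-agent" = "Mozilla/5.0"))

# Request the API with a browser-like user agent. read_html() treats the
# JSON body as HTML, so the text ends up inside a <p> node.
raw <-
  GET(
    "https://www.nseindia.com/api/option-chain-equities?symbol=IOC",
    config = add_headers("user-agent" = "Mozilla/5.0")
  ) |>
  read_html()

# Pull the text back out of the <p> node and parse it as JSON.
raw |>
  html_nodes("p") |>
  html_text() |>
  jsonlite::fromJSON()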
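
As a possible variation on the code above, the JSON body can be read straight from the response with httr::content() instead of going through an HTML parser. This is only a sketch: the helper names parse_option_chain and fetch_option_chain are made up for illustration, and it assumes the site still accepts a plain "Mozilla/5.0" user agent, which may change at any time.

```r
# Hypothetical helper: turn a JSON response body (a character string)
# into an R object.
parse_option_chain <- function(body_text) {
  jsonlite::fromJSON(body_text)
}

# Hypothetical helper: request the endpoint with a browser-like user agent
# and parse the JSON body. The default user agent is an assumption.
fetch_option_chain <- function(symbol, ua = "Mozilla/5.0") {
  resp <- httr::GET(
    "https://www.nseindia.com/api/option-chain-equities",
    query = list(symbol = symbol),
    httr::user_agent(ua)
  )
  # Fail loudly on 401/403 instead of trying to parse an error page.
  httr::stop_for_status(resp)
  parse_option_chain(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Usage (needs network access and may still be blocked by the site):
# dat <- fetch_option_chain("IOC")
```

Calling stop_for_status() before parsing means a blocked request surfaces as an R error with the HTTP status, rather than as a confusing JSON parse failure.
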
Till