Error in open.connection(con, "rb") : HTTP error 401

Question

I am trying to extract the option chain table for the security IOC from the website https://www.nseindia.com/option-chain

The JSON link containing the data is https://www.nseindia.com/api/option-chain-equities?symbol=IOC

When I try to import the data into R, I get an error:

library(jsonlite)
dat=fromJSON("https://www.nseindia.com/api/option-chain-equities?symbol=IOC")
Error in open.connection(con, "rb") : HTTP error 401.

Surprisingly, when I open the website https://www.nseindia.com/option-chain in Chrome/Firefox, select the IOC stock, and then call fromJSON, it works.

Why does this happen, and how can I get the data without opening a web browser?

The behavior you see is most likely an attempt to stop scraping. Looking at their [robots.txt](https://www.nseindia.com/robots.txt), you can also see that they don't want people to scrape anything under the "/api/" path. They do have a CSV download option; maybe that works for your purposes? — Till, Sep 02 '21 at 15:46
Thanks, the CSV file works, but can I get the data through `rvest` or `httr`? — Nad Pat, Sep 02 '21 at 16:05

score 0 · Answer 1 · answered Sep 02 '21 at 16:26

They seem to check the "user-agent" value and block requests from anything that doesn't look like a manually controlled browser. To get around this, you can set the "user-agent" header in an httr::GET() request.

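library(httr)
library(rvest)

# Sanity check: httpbin echoes back the user-agent header it receives
httr::GET("https://httpbin.org/user-agent",
          config = add_headers("user-agent" = "Mozilla/5.0"))

# Request the API endpoint with a browser-like user-agent
raw <-
  GET(
    "https://www.nseindia.com/api/option-chain-equities?symbol=IOC",
    config = add_headers("user-agent" = "Mozilla/5.0")
  ) |>
  read_html()

# After read_html(), the JSON body ends up as text inside a <p> node;
# extract that text and parse it as JSON
raw |>
  html_nodes("p") |>
  html_text() |>
  jsonlite::fromJSON()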
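
A possible variant of the code above, as a minimal sketch only: since the question reports that the request succeeds after the option-chain page has been opened, the site may also rely on cookies set by that page. The function name `fetch_option_chain`, the `accept` header, and the idea that one prior page visit is sufficient are all assumptions, not confirmed behavior of the site. The sketch also parses the response body as JSON directly instead of passing it through read_html().

```r
library(httr)
library(jsonlite)

# Assumed header values; the site may require different or additional headers.
browser_headers <- add_headers(
  "user-agent" = "Mozilla/5.0",
  "accept" = "application/json"
)

# Hypothetical helper: fetch the option chain for one symbol.
fetch_option_chain <- function(symbol) {
  # Visit the regular page first so that any cookies the site sets are stored.
  # httr keeps cookies across requests to the same host within an R session.
  GET("https://www.nseindia.com/option-chain", config = browser_headers)

  resp <- GET(
    "https://www.nseindia.com/api/option-chain-equities",
    query = list(symbol = symbol),
    config = browser_headers
  )
  stop_for_status(resp)  # turn a 401 or other HTTP error into an R error

  # Parse the body as JSON directly
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# Example call (performs network requests):
# dat <- fetch_option_chain("IOC")
```

If the request still returns 401, stop_for_status() raises an error instead of returning an empty result, which makes the failure easier to spot.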

`read_html` is not working for this link. There is no output. — Nad Pat, Oct 09 '21 at 07:59
