timeout error when scraping web with GET() function in R

Question

I was trying to download a file from https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download in R with the download.file() function, but the request was rejected by the host. Then I tried

httr::GET(url = url, httr::add_headers("User-Agent" = "Mozilla/5.0"))

to fake the headers, but it still did not work. This is odd, because the same trick worked in Python and returned a status code of 200.

How can I resolve this? Thank you.

score 0 · Answer 1 · answered Sep 24 '20 at 15:13


Welcome to Stack Overflow, Yu Bai.

Instead of using download.file(), you can simply pass the file URL as an argument to read.csv(), as follows:

file_url = 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download'
df = read.csv(file_url, header = FALSE)

If for any reason you want to download the file, you can do:

file_url = 'https://old.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download'
file_path = '~/Downloads/companylist.csv'
download.file(file_url, file_path)
read.csv(file_path, header = FALSE)

Let us know if your problem was solved.


rodolfoksveiga


Thank you for your answer, but I think the problem is that the request was rejected by the host, which uses a firewall to block scraping. I also tried the GET() function in httr, and it could not get the content either. – yu bai Sep 24 '20 at 15:18
Could you download and access the file the way I proposed? – rodolfoksveiga Sep 24 '20 at 17:13
No luck, unfortunately. I fixed it with faked headers, thank you anyway. – yu bai Sep 24 '20 at 19:24
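
The thread does not show which headers ended up working, so the following is only a sketch of what a fuller set of faked headers might look like with httr. The header values, the `browser_headers` vector, the `fetch_csv()` helper, and the 30-second timeout are all illustrative assumptions, not the asker's actual fix, and there is no guarantee that this particular host accepts them.

```r
# Hypothetical browser-like headers; the exact values are assumptions
# and may need adjusting for a given host.
browser_headers <- c(
  "User-Agent"      = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
  "Accept"          = "text/html,application/xhtml+xml,*/*",
  "Accept-Language" = "en-US,en;q=0.9"
)

# Hypothetical helper: request `url` with the given headers and an
# explicit timeout, stop on an HTTP error status, and parse the body
# as CSV. Extra arguments in `...` are forwarded to read.csv().
fetch_csv <- function(url, headers = browser_headers, seconds = 30, ...) {
  resp <- httr::GET(
    url,
    httr::add_headers(.headers = headers),
    httr::timeout(seconds)
  )
  httr::stop_for_status(resp)
  body <- httr::content(resp, as = "text", encoding = "UTF-8")
  read.csv(text = body, ...)
}
```

With the URL from the question stored in `file_url`, a call such as `df <- fetch_csv(file_url, header = FALSE)` would mirror the read.csv() call in the answer. If the request still times out, comparing these headers against the ones sent by the working Python request is a reasonable next step.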
