
I am trying to scrape content from LinkedIn using R, but I keep on getting an error when trying to read the HTML content.

This is my code:

install.packages("curl")
library("selectr")
library("xml2")
library("rvest")
library("stringr")
library("curl")
url <- "https://www.linkedin.com/in/hollyhynes/"
webpage <- read_html(url)

The error is:

Error in open.connection(x, "rb") : HTTP error 999

I also tried using the curl package with a custom user agent:

webpage <- read_html(curl("https://www.linkedin.com/in/hollyhynes/",
  handle = new_handle("useragent" = "Mozilla/5.0")))

but it also returned an error:

Error in open.connection(x, "rb") : HTTP error 999.
In addition: Warning messages:
1: In if (!is.character(x)) x <- structure(as.character(x), names = names(x)) :
closing unused connection 5 (https://www.linkedin.com/in/hollyhynes/)
2: In if (!is.character(x)) x <- structure(as.character(x), names = names(x)) :
closing unused connection 4 (https://www.linkedin.com/in/hollyhynes/)
3: In if (!is.character(x)) x <- structure(as.character(x), names = names(x)) :
closing unused connection 3 (https://www.linkedin.com/in/hollyhynes/)

I would appreciate some help on how I can fix this. Thanks :)

yunzen
Eliza R
  • Have you tried logging in with your browser and copying the session cookie to your curl session? – GWD Mar 26 '19 at 10:31
  • HTTP error 999 is LinkedIn telling you that it thinks you're a robot and denying your connection. LinkedIn's ToS do not permit scraping. You can try using its API instead, even though it's quite restricted. – A. Stam Mar 26 '19 at 10:39
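
Following on from the comment above, here is a minimal sketch of how the status code could be inspected before parsing, so that a 999 response is reported instead of stopping the script with an error. It does not get around the block. The helper names describe_status and fetch_page are hypothetical, and the sketch assumes that curl_fetch_memory() returns the response (including status_code) rather than raising an error on HTTP failures.

```r
library(curl)
library(xml2)

# Hypothetical helper: turn a status code into a short explanation.
describe_status <- function(status) {
  if (status == 999) {
    # Per the comment above: LinkedIn denies the request as automated traffic.
    "request denied (LinkedIn's non-standard 999 status)"
  } else if (status >= 200 && status < 300) {
    "ok"
  } else {
    paste("HTTP error", status)
  }
}

# Hypothetical helper: fetch a page and return NULL (with a message)
# instead of raising an error when the request is not successful.
fetch_page <- function(url) {
  handle <- new_handle(useragent = "Mozilla/5.0")
  res <- curl_fetch_memory(url, handle = handle)
  outcome <- describe_status(res$status_code)
  if (outcome != "ok") {
    message(url, ": ", outcome)
    return(NULL)
  }
  # res$content is a raw vector, which read_html() can parse directly.
  read_html(res$content)
}
```

With this, webpage <- fetch_page("https://www.linkedin.com/in/hollyhynes/") would be expected to yield NULL and print the reason while LinkedIn keeps answering with 999, which makes it easier to branch to another data source such as the official API.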
