I used getURL and htmlTreeParse for web scraping with the following code:

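library(XML)    # htmlTreeParse, xpathApply, xmlGetAttr
library(RCurl)  # getURL
library(rvest)  # read_html, html_nodes
library(httr)
url <- "https://www.restaurants.mcdonalds.fr/"

# Download the page source and parse it into an internal document
page <- htmlTreeParse(getURL(url), useInternalNodes = TRUE, encoding = "UTF-8")
# Collect the href attribute of every department link
locs <- unlist(xpathApply(page, '//div[@class="department-part"]/ul/li/a',
   xmlGetAttr, "href"))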

However, for some reason, this code no longer works, even though getURL(url) seems to return the whole source code.

url <- "https://www.restaurants.mcdonalds.fr/"
read_html(url) %>%
  html_nodes(xpath = '//div[@class="department-part"]/ul/li/a') %>%
  html_text()

I also tried rvest, as shown above, and read_html doesn't seem to work either, even though I can view the page source in a browser such as Chrome.
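To separate the download step from the parsing step, here is a minimal sketch that runs the same XPath against a small inline HTML string instead of the live site. The markup and the `/paris` and `/lyon` links are made up for illustration and are only an assumption about the page structure, not the site's real source.

```r
library(rvest)

# Hypothetical markup mimicking the structure the XPath expects
# (not the real source of the McDonald's page)
html <- paste0(
  '<html><body><div class="department-part"><ul>',
  '<li><a href="/paris">Paris</a></li>',
  '<li><a href="/lyon">Lyon</a></li>',
  '</ul></div></body></html>'
)

# Parse the string directly, so no network request is involved
page <- read_html(html)

# Same XPath as in the question, extracting the href attributes
links <- html_nodes(page, xpath = '//div[@class="department-part"]/ul/li/a')
hrefs <- html_attr(links, "href")
print(hrefs)
```

If this returns the two links, the XPath and the parser behave as expected on markup of this shape, which would point to the downloaded content (or the download itself) as the part that changed.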

I also tested another link.

url <- "https://restaurant.hippopotamus.fr/"
read_html(url) # works
getURL(url)    # fails now, although it worked before

How can I try to find a solution?

John Smith
  • I get that the website isn't available from my location (UK). Any other examples you can give? – Chris Aug 25 '18 at 12:06
  • @Chris, too bad, you can't look for a McDonald's restaurant in France then. :P Maybe `getURL("https://restaurant.hippopotamus.fr/")` ? – John Smith Aug 25 '18 at 12:17
  • And `read_html("https://restaurant.hippopotamus.fr/")` from `rvest` works fine. – John Smith Aug 25 '18 at 12:19
  • _"…As such, any reproduction, representation, use, adaptation, modification, incorporation, translation, commercialization, partial or complete, without the prior written authorization of GIE McDONALD'S FORCE, are prohibited;"_ / https://www.restaurants.mcdonalds.fr/mentions-legales?restaurantId= – hrbrmstr Nov 17 '18 at 09:58
