I'm trying to scrape a picture using rvest, with this code:

url <- "https://fr.wikipedia.org/wiki/Robert_Jardillier"
webpage <- html_session(url)
link.titles <- webpage %>% html_nodes(".noarchive .image img")

img.url <- link.titles %>% html_attr("src")

download.file(img.url, "test.png", mode = "wb")

But when I try to download the image, I get the following error:

trying URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg'
Error in download.file(img.url, "test.png", mode = "wb") : 
  cannot open URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg'
In addition: Warning message:
In download.file(img.url, "test.png", mode = "wb") :
  URL '//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg': status was 'URL using bad/illegal format or missing URL'
TylerH
boredgirl
  • From the `download.file` help: "The url must start with a scheme such as ‘http://’, ‘https://’, ‘ftp://’ or ‘file://’. Which methods support which schemes varies by R version, but method = "auto" will try to find a method which supports the scheme." – Ric Oct 18 '22 at 19:45

2 Answers

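The `src` attribute holds a protocol-relative URL (it starts with `//`), so `download.file` has no scheme to work with. Prepend one: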

download.file(paste0("http:",img.url), "test.png", mode = "wb")
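If the scraped `src` values are a mix of protocol-relative and already complete URLs, blindly pasting a scheme in front would break the complete ones. A minimal sketch of a guard, assuming a hypothetical helper named `add_scheme` (it is not part of rvest or base R) and assuming `https` is an acceptable default scheme:

```r
# Hypothetical helper: prepend a scheme only to protocol-relative URLs
# (those starting with "//"); all other entries are returned unchanged.
add_scheme <- function(urls, scheme = "https") {
  # Missing values (e.g. <img> tags without a src) are left as NA.
  is_relative <- !is.na(urls) & startsWith(urls, "//")
  urls[is_relative] <- paste0(scheme, ":", urls[is_relative])
  urls
}

# The value that html_attr("src") returned in the question.
img.url <- "//upload.wikimedia.org/wikipedia/commons/thumb/3/38/Robert_Jardillier_1932.jpg/220px-Robert_Jardillier_1932.jpg"
full.url <- add_scheme(img.url)
print(full.url)

# The download itself needs network access, so it is commented out here:
# download.file(full.url, "test.jpg", mode = "wb")
```

The image is a JPEG, so a `.jpg` destination name is probably a better fit than `test.png`. For resolving arbitrary relative links against the page address, `xml2::url_absolute()` may be a more general option.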
Ric

This worked for me.

suppressPackageStartupMessages({
  library(rvest)
  library(dplyr)
})

url <- "https://fr.wikipedia.org/wiki/Robert_Jardillier"
page <- read_html(url)

page %>%
  html_elements("a") %>%
  html_attr("href") %>%
  grep("Robert_Jardillier.*\\.jpg", ., value = TRUE) %>%
  unique() %>%
  basename() %>%
  paste0(url, "#/media/", .) %>%
  download.file(destfile = "test.jpg", mode = "wb")  # binary mode so the file is not altered on Windows
Rui Barradas