
I'm working on a project to list all the R packages available on the CRAN website to support our data science projects. Each listed package has its own sub-page of the main CRAN site. Is there a "simple" way to extract information for the 1500+ packages available?

My initial research used the rvest and dplyr libraries, and that has worked for the main CRAN page, but I don't know how to handle the sub-pages. I'm relatively new to R and still getting my feet wet.

Site: https://cran.r-project.org/web/packages/available_packages_by_name.html

Example sub-site (one of 1500+): https://cran.r-project.org/web/packages/tidyverse/index.html

Elements to extract: Package Name; Package Description; Version Number; Imports; Suggests; License; Package Source (Name); and Linking (URL)

Thank You
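
For reference, a minimal sketch of the two-step approach (scrape the index page for the package links, then visit each package's sub-page for its fields) might look like the following. The CSS selectors (`td a`, `table`, `h2`, `h2 + p`) and the row labels (`Version:`, `Imports:`, etc.) are assumptions based on the current CRAN page layout and may need adjusting.

```r
# A minimal sketch, not a polished solution. It assumes the current CRAN page
# layout: the index page lists each package as a link inside a table cell, and
# each package page keeps Version/Imports/Suggests/License in its first <table>
# as "Field:" / "value" rows.
library(rvest)
library(dplyr)

index_url  <- "https://cran.r-project.org/web/packages/available_packages_by_name.html"
index_page <- read_html(index_url)

# Links are relative (e.g. "../../web/packages/tidyverse/index.html");
# keep only the package links and resolve them to absolute URLs.
hrefs <- index_page %>%
  html_nodes("td a") %>%
  html_attr("href")
pkg_urls <- xml2::url_absolute(hrefs[grepl("index.html", hrefs, fixed = TRUE)], index_url)

# Scrape one package page: the first table holds the field/value pairs.
scrape_pkg <- function(url) {
  page    <- read_html(url)
  details <- page %>% html_node("table") %>% html_table(header = FALSE)
  field   <- function(name) details$X2[details$X1 == name][1]  # NA if the row is absent
  tibble(
    package     = page %>% html_node("h2") %>% html_text(trim = TRUE),
    description = page %>% html_node("h2 + p") %>% html_text(trim = TRUE),
    version     = field("Version:"),
    imports     = field("Imports:"),
    suggests    = field("Suggests:"),
    license     = field("License:"),
    url         = url
  )
}

# Test on a handful of pages first; keep a pause between requests so you
# don't hammer the CRAN server.
pkg_info <- lapply(head(pkg_urls, 5), function(u) {
  Sys.sleep(1)
  scrape_pkg(u)
}) %>% bind_rows()

pkg_info
```

The `head(pkg_urls, 5)` limit keeps the test run small; dropping it loops over the full 1500+ package list, which will take a while even with a one-second pause per page.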

  • Hello Jignasu, welcome to SO. The philosophy of SO is to help you when you are stuck. So, what have you tried so far that you can show us? I am facing a similar task and I would start using a package like rvest (see [Web scraping with rvest](https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/)). – Jan Jun 01 '20 at 19:17
  • Jan, thank you for the follow up. I looked up the following: `library(rvest); library(dplyr); google <- html("https://cran.r-project.org/web/packages/available_packages_by_name.html"); google %>% html_nodes()` – Jignasu M. Desai Jun 01 '20 at 19:38

0 Answers