
I'm working on a project to list all the R packages available on the CRAN website to support our data science projects. Each listed package has its own sub-page of the main CRAN site. Is there a "simple" way to extract information for the 1500+ packages available?

My initial research used the rvest and dplyr libraries, and that has worked for the main CRAN page, but I don't know how to handle the sub-pages. I'm relatively new to R and still getting my feet wet.

Site: https://cran.r-project.org/web/packages/available_packages_by_name.html

Example sub-site (one of 1500+): https://cran.r-project.org/web/packages/tidyverse/index.html

Elements to extract: Package Name; Package Description; Version Number; Imports; Suggests; License; Package Source (Name); and Linking (URL)

Thank You
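
For reference, a minimal sketch of the two-step approach (scrape the index page for the package links, then visit each package's sub-page for its fields) might look like the following. The CSS selectors (`td a`, `table`, `h2`, `h2 + p`) and the row labels (`Version:`, `Imports:`, etc.) are assumptions based on the current CRAN page layout and may need adjusting.

```r
# A minimal sketch, not a polished solution. It assumes the current CRAN page
# layout: the index page lists each package as a link inside a table cell, and
# each package page keeps Version/Imports/Suggests/License in its first <table>
# as "Field:" / "value" rows.
library(rvest)
library(dplyr)

index_url  <- "https://cran.r-project.org/web/packages/available_packages_by_name.html"
index_page <- read_html(index_url)

# Links are relative (e.g. "../../web/packages/tidyverse/index.html");
# keep only the package links and resolve them to absolute URLs.
hrefs <- index_page %>%
  html_nodes("td a") %>%
  html_attr("href")
pkg_urls <- xml2::url_absolute(hrefs[grepl("index.html", hrefs, fixed = TRUE)], index_url)

# Scrape one package page: the first table holds the field/value pairs.
scrape_pkg <- function(url) {
  page    <- read_html(url)
  details <- page %>% html_node("table") %>% html_table(header = FALSE)
  field   <- function(name) details$X2[details$X1 == name][1]  # NA if the row is absent
  tibble(
    package     = page %>% html_node("h2") %>% html_text(trim = TRUE),
    description = page %>% html_node("h2 + p") %>% html_text(trim = TRUE),
    version     = field("Version:"),
    imports     = field("Imports:"),
    suggests    = field("Suggests:"),
    license     = field("License:"),
    url         = url
  )
}

# Test on a handful of pages first; keep a pause between requests so you
# don't hammer the CRAN server.
pkg_info <- lapply(head(pkg_urls, 5), function(u) {
  Sys.sleep(1)
  scrape_pkg(u)
}) %>% bind_rows()

pkg_info
```

The `head(pkg_urls, 5)` limit keeps the test run small; dropping it loops over the full 1500+ package list, which will take a while even with a one-second pause per page.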

  • Hello Jignasu, welcome to SO. The philosophy of SO is to help you when you are stuck. So, what have you tried so far that you can show us? I am facing a similar task and I would start using a package like rvest (see [Web scraping with rvest](https://blog.rstudio.com/2014/11/24/rvest-easy-web-scraping-with-r/)). – Jan Jun 01 '20 at 19:17
  • Jan, thank you for the follow up. I looked up the following: `library(rvest); library(dplyr); google <- html("https://cran.r-project.org/web/packages/available_packages_by_name.html"); google %>% html_nodes()` – Jignasu M. Desai Jun 01 '20 at 19:38

0 Answers