I am trying to collect data from 75,000 articles on Web of Knowledge. All of the data can be viewed on each article's webpage. As an absolute beginner in programming, I am unsure how this could be done other than manually. Is there any code I could use in R, or on any other platform, to extract the data from the webpages directly, without having to download all of the articles?

David Arenburg

1 Answer

rvest is a very good R package for scraping general web data. It can do almost everything that the Python libraries Beautiful Soup and Scrapy do.
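As a rough illustration, here is a minimal sketch of pulling a few fields out of one page with rvest. The HTML string, the CSS selectors (`h1.title`, `span.author`, `span.year`) and the helper `extract_article()` are all made up for this example; a real Web of Knowledge page will have a different structure, so you would have to inspect it in your browser and adjust the selectors.

```r
library(rvest)

# Hypothetical stand-in for one article page. For a live page you would
# pass the article's URL to read_html() instead of an HTML string.
html <- paste0(
  "<html><body>",
  "<h1 class='title'>An example article</h1>",
  "<span class='author'>A. Author</span>",
  "<span class='author'>B. Author</span>",
  "<span class='year'>2015</span>",
  "</body></html>"
)
page <- read_html(html)

# Hypothetical helper: returns one row of data for one parsed page.
extract_article <- function(page) {
  data.frame(
    title   = html_text(html_node(page, "h1.title")),
    # Several authors are collapsed into a single string.
    authors = paste(html_text(html_nodes(page, "span.author")), collapse = "; "),
    year    = as.integer(html_text(html_node(page, "span.year"))),
    stringsAsFactors = FALSE
  )
}

record <- extract_article(page)
print(record)
```

For many articles, you could call `read_html()` on each URL in a loop, apply the same helper, and combine the rows with `rbind()`, pausing between requests with `Sys.sleep()` so that you do not overload the server.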

XML is another package that you can use for web scraping.
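A comparable sketch with the XML package, which selects nodes with XPath expressions instead of CSS selectors. Again, the HTML string and the class names are invented for illustration and are not taken from any real page.

```r
library(XML)

# Hypothetical stand-in for one article page.
html <- paste0(
  "<html><body>",
  "<h1 class='title'>An example article</h1>",
  "<span class='author'>A. Author</span>",
  "<span class='author'>B. Author</span>",
  "</body></html>"
)

# asText = TRUE tells htmlParse() that the input is HTML content,
# not a file name or URL.
doc <- htmlParse(html, asText = TRUE)

# xpathSApply() applies xmlValue to every node matching the XPath query.
title   <- xpathSApply(doc, "//h1[@class='title']", xmlValue)
authors <- xpathSApply(doc, "//span[@class='author']", xmlValue)

print(title)
print(authors)
```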

For scraping Twitter, you can use the twitteR package, and for Facebook, the Rfacebook package.

Use the RTidyHTML package to correct errors in malformed HTML.

narendra-choudhary