I am attempting to collect data from 75,000 articles on Web of Knowledge. All the data can be viewed on each article's webpage. Being an absolute beginner in programming, I am unsure how this could be done other than manually. Is there any code I could use in R, or on any other platform, to extract data from the webpages directly without having to download all of the articles?
1 Answer
rvest is a really good R package for scraping general web data. It can do almost everything that the Python libraries Beautiful Soup or Scrapy can do; a minimal sketch is shown below.
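For illustration, here is a minimal rvest sketch. The URL and CSS selectors are placeholders for illustration only; you would need to inspect the actual article pages in your browser to find the right selectors.

```r
# A minimal sketch, assuming the article pages are plain HTML and publicly
# reachable. The URL and selectors below are assumptions, not the real
# Web of Knowledge markup; inspect the pages to find the right ones.
library(rvest)

url  <- "https://www.example.com/article/12345"   # hypothetical article URL
page <- read_html(url)

# Pull out individual fields with CSS selectors (selectors are assumed)
title    <- page %>% html_element("h1") %>% html_text2()
abstract <- page %>% html_element("div.abstract") %>% html_text2()

title
abstract
```

To cover many articles you would build a vector of article URLs and loop (or `lapply`) over it, ideally pausing with `Sys.sleep()` between requests.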
The XML package is another option for web scraping.

For scraping Twitter you can use the twitteR package, and for Facebook the Rfacebook package.

Use the RTidyHTML package for correcting errors in malformed HTML.
