I am attempting to collect data from 75,000 articles on Web of Knowledge. All the data can be viewed on each article's webpage. Being an absolute beginner in programming, I am unsure how this could be done other than manually. Is there any code I could use in R, or on any other platform, to extract data from the webpages directly without having to download all of the articles?
1 Answer
rvest is a really good R package for scraping general web data. It can do almost everything that the Python libraries Beautiful Soup or Scrapy can do; a minimal sketch is shown below.
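For illustration, here is a minimal rvest sketch. The URL and CSS selectors are placeholders for illustration only; you would need to inspect the actual article pages in your browser to find the right selectors.

```r
# A minimal sketch, assuming the article pages are plain HTML and publicly
# reachable. The URL and selectors below are assumptions, not the real
# Web of Knowledge markup; inspect the pages to find the right ones.
library(rvest)

url  <- "https://www.example.com/article/12345"   # hypothetical article URL
page <- read_html(url)

# Pull out individual fields with CSS selectors (selectors are assumed)
title    <- page %>% html_element("h1") %>% html_text2()
abstract <- page %>% html_element("div.abstract") %>% html_text2()

title
abstract
```

To cover many articles you would build a vector of article URLs and loop (or `lapply`) over it, ideally pausing with `Sys.sleep()` between requests.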
The XML package is another option for web scraping.

For scraping Twitter you can use the twitteR package, and for Facebook the Rfacebook package.

Use the RTidyHTML package for correcting errors in malformed HTML.
