Questions tagged [rcrawler]

R package that performs parallel web crawling and web scraping. It is designed to crawl, parse, and store web pages to produce data that can be used directly in analysis applications.

28 questions
0 votes, 1 answer

Scraping Google News with Rvest for Keywords

I want to compare news articles from different countries for the usage of a specific keyword. My idea is to scrape Google News using Rcrawler: Rcrawler(website =…
schneebii • 1 • 1
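
For a keyword-driven crawl like this, a minimal sketch using Rcrawler's KeywordsFilter argument follows; the Google News URL, the keyword, and the accuracy threshold are placeholder assumptions, and note that Google News is heavily JavaScript-driven and may block crawlers:

    library(Rcrawler)

    # Crawl and index only pages containing the given keywords;
    # results land in the INDEX data frame that Rcrawler creates.
    Rcrawler(Website = "https://news.google.com/",  # placeholder target
             KeywordsFilter = c("climate"),         # assumed keyword of interest
             KeywordsAccuracy = 50,                 # minimum match score, 0-100
             no_cores = 4, no_conn = 4,
             MaxDepth = 2)
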
0 votes, 0 answers

Error while using ContentScraper in Rcrawler package

I am trying to extract the tables from these pages (https://spactrack.net/activespacs/ & https://warrants.tech/). I am using the Rcrawler package to extract them, but it's throwing an error when I run the below…
Adarsh KP • 1 • 2
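
For reference, a hedged sketch of pulling table nodes with ContentScraper; the //table XPath is an assumption, and if the tables on these sites are rendered by JavaScript the call will return nothing:

    library(Rcrawler)

    # Grab every <table> node as text; adjust the XPath to the real table
    tbl <- ContentScraper(Url = "https://spactrack.net/activespacs/",
                          XpathPatterns = "//table",
                          ManyPerPattern = TRUE,
                          astext = TRUE)

For structured tables served as static HTML, rvest::html_table() is usually the simpler route.
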
0 votes, 1 answer

Website crawling: responses are different for postman and browser

I want to crawl the site https://www.ups.com/de/de/shipping/surcharges/fuel-surcharges.page. There, the company lists all fuel surcharges it adds to invoice amounts. I need this information to calculate some costs correctly.…
Tarek Salha • 307 • 3 • 12
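
When Postman and a browser get different responses, the server is usually keying on request headers or cookies. A sketch with httr, assuming the block is header-based (the header values are just typical browser strings):

    library(httr)

    res <- GET("https://www.ups.com/de/de/shipping/surcharges/fuel-surcharges.page",
               add_headers(
                 `User-Agent` = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
                 `Accept` = "text/html,application/xhtml+xml",
                 `Accept-Language` = "de-DE,de;q=0.9"))
    status_code(res)  # compare against the browser's 200
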
0 votes, 1 answer

How can I extract multiple items from one HTML page using Rcrawler's ExtractXpathPat?

I'm trying to get both the label and the data of items in a museum collection using Rcrawler. I think I made a mistake using the ExtractXpathPat argument, but I can't figure out how to fix it. I expect an output like this: 1;"Titel(s)";"De…
Friso • 2,328 • 9 • 36 • 72
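
Rcrawler accepts a vector of XPath patterns with matching names; a sketch under assumed XPaths (the museum URL and the label/value node paths are placeholders):

    library(Rcrawler)

    Rcrawler(Website = "https://museum.example/collection/",  # placeholder
             ExtractXpathPat = c("//dt", "//dd"),             # assumed label/value nodes
             PatternsNames = c("label", "value"),
             ManyPerPattern = TRUE)  # keep every match per page, not only the first
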
0 votes, 1 answer

Is there a way to run Rcrawler without downloading all the HTMLs?

I'm running Rcrawler on a very large website, so it takes a very long time (3+ days with the default page depth). Is there a way to not download all the HTML files, to make the process faster? I only need the URLs that are stored in the INDEX. Or can anyone…
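
Recent Rcrawler versions expose a saveOnDisk argument that skips writing the HTML files while still building INDEX; a sketch, assuming your installed version has it (lowering MaxDepth is the bigger time saver):

    library(Rcrawler)

    Rcrawler(Website = "https://www.example.com/",  # placeholder
             no_cores = 4, no_conn = 4,
             MaxDepth = 2,
             saveOnDisk = FALSE)  # keep only the INDEX data frame, no HTML files
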
0 votes, 2 answers

How to avoid 'HTTP error code:429' while web scraping?

I'm trying to scrape information from Google and they aren't liking it. The vector contains 2487 Google sites, and from each of them I want to get the text of the first result. I tried to create a loop to slow down the process but I'm very…
Rodf • 11 • 1
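
HTTP 429 means "too many requests", so the usual fix is to slow down and retry with exponential backoff; a minimal sketch, where the base delay and retry count are arbitrary choices:

    library(httr)

    get_with_backoff <- function(url, tries = 5, base_delay = 2) {
      for (i in seq_len(tries)) {
        res <- GET(url)
        if (status_code(res) != 429) return(res)
        Sys.sleep(base_delay * 2^(i - 1))  # wait 2, 4, 8, ... seconds
      }
      res  # still 429 after all tries
    }
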
0 votes, 0 answers

'NULL' and 'NA' issue when scraping websites with ContentScraper in R?

I have a very long list of websites that I'd like to scrape for their titles, descriptions, and keywords. I'm using ContentScraper from the Rcrawler package, and I know it works, but there are certain URLs it can't handle, and it just generates the error…
cheklapkok • 439 • 1 • 5 • 11
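
Wrapping each call in tryCatch lets a failing URL return NA instead of aborting the whole loop; a sketch, with a placeholder URL vector and //title standing in for the real patterns:

    library(Rcrawler)

    urls <- c("https://a.example", "https://b.example")  # placeholder for your list

    safe_scrape <- function(u) {
      tryCatch(
        ContentScraper(Url = u, XpathPatterns = "//title", PatternsName = "title"),
        error = function(e) NA)  # bad URL -> NA instead of stopping
    }
    results <- lapply(urls, safe_scrape)
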
0 votes, 2 answers

How to scrape multiple websites using Rcrawler in R?

I've noticed we don't have many questions here about Rcrawler, and I think it's a great tool for scraping websites. However, I have a problem telling it to scrape multiple websites, as it can currently only do 3. Please let me know if anyone has…
cheklapkok • 439 • 1 • 5 • 11
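
Since each Rcrawler run overwrites the INDEX data frame, one workable pattern is simply to loop over the sites and copy INDEX between runs; a sketch with placeholder URLs:

    library(Rcrawler)

    sites <- c("https://site1.example", "https://site2.example")  # placeholders
    all_index <- list()
    for (s in sites) {
      Rcrawler(Website = s, no_cores = 2, no_conn = 2, MaxDepth = 1)
      all_index[[s]] <- INDEX  # INDEX is rewritten on every run, so copy it now
    }
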
0 votes, 1 answer

How to scrape all data by automatically clicking 'Load More' using rvest

I was using rvest to scrape a website for a couple of pieces of information I'm interested in. An example page is https://www.edsurge.com/product-reviews/mr-elmer-product/educator-reviews, and I wrote a function like this: PRODUCT_NAME2 <-…
Edward Lin • 609 • 1 • 9 • 16
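
rvest alone can't click a button; a 'Load More' button usually fires a paged request that you can call directly. A sketch, where the ?page= parameter and the .review selector are hypothetical — the real endpoint has to be read from the browser's network tab:

    library(rvest)

    base <- "https://www.edsurge.com/product-reviews/mr-elmer-product/educator-reviews"
    reviews <- list()
    for (p in 1:5) {                              # assumed number of pages
      pg <- read_html(paste0(base, "?page=", p))  # hypothetical pagination
      reviews[[p]] <- html_text(html_elements(pg, ".review"))  # hypothetical selector
    }

If no such endpoint exists, a headless browser (e.g. RSelenium) is the usual fallback.
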
0 votes, 1 answer

R: How can I use the package Rcrawler to do JSON parsing in parallel?

I just came across this powerful R package but unfortunately haven't been able to find out how to parse a list of URLs in parallel when the response is JSON. As a simple example, suppose I have a list of cities (in Switzerland): list_cities <-…
Patrick Balada • 1,330 • 1 • 18 • 37
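
Rcrawler is built around HTML pages, so for JSON endpoints a plain parallel map with jsonlite is usually simpler; a sketch with a placeholder API URL (mclapply forks, so on Windows use parLapply instead):

    library(jsonlite)
    library(parallel)

    list_cities <- c("Zurich", "Geneva", "Basel")
    urls <- paste0("https://api.example.com/weather?city=", list_cities)  # placeholder
    results <- mclapply(urls, fromJSON, mc.cores = 2)
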
0 votes, 1 answer

R data scraping / crawling with dynamic/multiple URLs

I am trying to get all decrees of the Federal Supreme Court of Switzerland, available at:…
captcoma • 1,768 • 13 • 29
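
When results are spread over numbered pages, generating the URLs and looping is usually enough; a sketch, with an entirely hypothetical URL template standing in for the court archive's real parameters:

    library(rvest)

    urls <- sprintf("https://decrees.example.ch/list?page=%d", 1:10)  # hypothetical
    pages <- lapply(urls, read_html)
    texts <- lapply(pages, html_text)
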
-2 votes, 2 answers

How to make my crawler (made in R) automatic?

I've been working on a RStudio to crawl some websites. I wanted to be able to run my code automatically at a particular instances during the day. I've been using Rcrawler and Rvest to crawl. The point is to do news aggregation from several sites…
Megh • 81 • 5
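
Scheduling is handled outside R: cronR on Linux/macOS or taskscheduleR on Windows. A cronR sketch, assuming the crawler lives in a script called crawl_news.R (the path is a placeholder):

    library(cronR)

    cmd <- cron_rscript("/home/me/crawl_news.R")  # placeholder path
    cron_add(cmd, frequency = "daily", at = "07:00", id = "news_crawler")
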
-2 votes, 1 answer

Web Crawler using R

I want to build a web crawler in R for the website "https://www.latlong.net/convert-address-to-lat-long.html" that can visit the site with an address parameter and then fetch the generated latitude and longitude. And…
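
The page is a form, so rvest's session tools can submit it; a sketch where the form field name and the result selector are assumptions read off the current page and may change (if the conversion runs via JavaScript, a geocoding API is the more robust route):

    library(rvest)

    s <- session("https://www.latlong.net/convert-address-to-lat-long.html")
    f <- html_form(s)[[1]]                                     # assumes the address form is first
    f <- html_form_set(f, place = "Brandenburg Gate, Berlin")  # 'place' is assumed
    res <- session_submit(s, f)
    html_text(html_element(read_html(res), "#latlngspan"))     # hypothetical selector
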