Questions tagged [scrape]

DO NOT USE THIS TAG. It is under active cleanup (https://meta.stackoverflow.com/q/305314). Use [web-scraping] if your question is about scraping information from web resources (there is also [screen-scraping]), or use [pdf-scraping] if your question is about scraping information from PDF files. Use [data-extraction] if you need to extract data from other resources.

1204 questions
5 votes, 2 answers

HTML Agility Pack - Accessing Siblings?

Using the HTML Agility Pack is great for getting descendants, whole tables, etc., but how can you use it in the situation below? ...HTML code above...
Location:
City, London
Jay
5 votes, 2 answers

Using SoupStrainer to parse selectively

I'm trying to parse a list of video game titles from a shopping site. However, the item list is all stored inside a tag. This section of the documentation supposedly explains how to parse only part of the document, but I can't work it out. My…
Scraper
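In BeautifulSoup, selective parsing is done by passing a strainer as `parse_only=SoupStrainer(...)`. As a stdlib-only stand-in (not SoupStrainer itself), the sketch below uses `html.parser` to keep text only from elements matching a tag name and CSS class. The tag `li` and class `title` in the usage lines are assumptions, since the excerpt does not say which tag holds the list.

```python
from html.parser import HTMLParser

class SelectiveTextParser(HTMLParser):
    """Collect text only from elements matching a tag name and CSS class.

    A stdlib stand-in for BeautifulSoup's SoupStrainer, not its API.
    """

    def __init__(self, target_tag, target_class):
        super().__init__()
        self.target_tag = target_tag
        self.target_class = target_class
        self.depth = 0        # > 0 while inside a matching element
        self.results = []
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Track nested tags of the same name so the right end tag closes the match.
            if tag == self.target_tag:
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.target_tag and self.target_class in classes:
            self.depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.target_tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._parts).strip())

    def handle_data(self, data):
        if self.depth:
            self._parts.append(data)

# Hypothetical markup: titles in <li class="title">, prices in <li class="price">.
html = '<ul><li class="title">Halo 3</li><li class="price">£9</li><li class="title">Portal 2</li></ul>'
parser = SelectiveTextParser("li", "title")
parser.feed(html)
print(parser.results)  # ['Halo 3', 'Portal 2']
```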
5 votes, 2 answers

How to scrape dynamic webpages with Python

[What I'm trying to do] Scrape the webpage below for used car data. http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1 [Issue] Scraping all of the pages. In the URL above, only the first 30 items are…
dixhom
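When a listing shows only the first N results, a common approach is to request each results page by changing a page-number query parameter. This sketch only builds the URLs; the parameter name `page` is an assumption, so check the site's own "next page" links for the real one. If the extra items are loaded by JavaScript, the browser's network tab usually shows the underlying request to imitate instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def page_urls(base_url, pages, param="page"):
    """Yield base_url with a page-number query parameter set to 1..pages.

    `param` is a hypothetical name; the real site may use a different one.
    Existing query parameters are kept; an existing `param` is replaced.
    """
    parts = urlsplit(base_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    for n in range(1, pages + 1):
        new_query = urlencode(query + [(param, str(n))])
        yield urlunsplit(parts._replace(query=new_query))

for url in page_urls("http://example.com/search?q=car", 3):
    print(url)
```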
5 votes, 1 answer

How to scrape multiple pages with Import.io

I am trying to scrape a list of events from a site, http://www.cityoflondon.gov.uk/events/, but when scraping it with import.io I am able to extract only the first page. How can I extract all pages at once?
Huander
5 votes, 1 answer

Use rvest to scrape all p after h? (or other R package)

I am new to the world of html scraping and am having difficulty pulling in paragraphs under particular headings, using rvest in R. I want to scrape info from multiple sites that all have a relatively similar set up. They all have the same headings…
Adam
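The rvest question asks for the paragraphs that sit under each heading. The sketch below shows the same grouping idea in Python's standard `html.parser` rather than R: each `<p>` is assigned to the most recent `<h1>`..`<h6>`. It assumes paragraphs are explicitly closed with `</p>`, because `html.parser` does not infer omitted end tags.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class SectionParser(HTMLParser):
    """Map each heading's text to the texts of the <p> elements that follow it."""

    def __init__(self):
        super().__init__()
        self.sections = {}
        self._current = None   # text of the most recent heading
        self._mode = None      # "h" inside a heading, "p" inside a paragraph
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS or tag == "p":
            self._mode = "h" if tag in HEADINGS else "p"
            self._parts = []

    def handle_endtag(self, tag):
        if self._mode == "h" and tag in HEADINGS:
            self._current = "".join(self._parts).strip()
            self.sections.setdefault(self._current, [])
        elif self._mode == "p" and tag == "p" and self._current is not None:
            # Paragraphs before the first heading are ignored.
            self.sections[self._current].append("".join(self._parts).strip())
        if tag in HEADINGS or tag == "p":
            self._mode = None

    def handle_data(self, data):
        if self._mode:
            self._parts.append(data)

parser = SectionParser()
parser.feed("<h2>Hours</h2><p>9 to 5</p><p>Closed <b>Sundays</b></p><h2>Contact</h2><p>Call us</p>")
print(parser.sections)
```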
5 votes, 1 answer

nodejs web scraper for password protected website

I am trying to scrape a website using nodejs and it works perfectly on sites that do not require any authentication. But whenever I try to scrape a site with a form that requires username and password I only get the HTML from the authentication page…
gthb7
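A frequent cause of getting only the login page is that the scraper does not keep the session cookie set by the login response. The sketch below shows that pattern with Python's standard library rather than Node.js: POST the login form through an opener that has a cookie jar, then reuse the same opener for protected pages. The login URL and the `username`/`password` field names are assumptions; take the real names from the form's `<input>` elements.

```python
import http.cookiejar
import urllib.parse
import urllib.request

def make_session():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def login_request(login_url, username, password):
    """Build a form-encoded POST for the login form.

    The field names "username" and "password" are hypothetical.
    """
    data = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(login_url, data=data, method="POST")

# Usage (performs network requests, so shown commented out):
# opener, jar = make_session()
# opener.open(login_request("https://example.com/login", "me", "secret"))
# html = opener.open("https://example.com/members").read().decode()
```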
5 votes, 1 answer

BeautifulSoup: How to extract data after specific html tag

I have the following HTML, and I am trying to figure out how exactly I can tell BeautifulSoup to extract the td after a certain HTML element. In this case I want to get the data in the td after Color Digest. Color Digest …
add-semi-colons
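In BeautifulSoup this is typically a matter of finding the label cell and calling `find_next_sibling("td")` on it. For illustration without third-party packages, the sketch below does the equivalent with `html.parser`: it reads each table row into a list of cell texts and returns the cell that follows the one whose text equals the label. The label "Color Digest" comes from the question; the sample table in the usage line is made up.

```python
from html.parser import HTMLParser

class RowCells(HTMLParser):
    """Collect the text of every table row as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self.rows:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

def value_after(html, label):
    """Return the text of the cell right after the cell whose text is `label`."""
    parser = RowCells()
    parser.feed(html)
    for row in parser.rows:
        for i, cell in enumerate(row[:-1]):
            if cell == label:
                return row[i + 1]
    return None

print(value_after("<table><tr><td>Color Digest</td><td>AB12</td></tr></table>", "Color Digest"))  # AB12
```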
5 votes, 1 answer

Remotely Scrape Page and Get most Relevant title or Description for Images with XPath

What I'm looking at doing is essentially the same thing a Tweet button or Facebook Share / Like button does, and that is to scrape a page for the most relevant title for a piece of data. The best example I can think of is when you're on the front…
stwhite
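Facebook's share previews read Open Graph meta tags (`og:title`, `og:description`, `og:image`) when a page provides them. A reasonable heuristic, sketched below with the standard `html.parser` rather than XPath, is to prefer those tags and fall back to `<title>` and `<meta name="description">`. The fallback order is an assumption, not a documented rule of any particular share button.

```python
from html.parser import HTMLParser

class ShareMetadata(HTMLParser):
    """Collect <title> text and <meta name=...> / <meta property=...> values."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            key = a.get("property") or a.get("name")
            if key and a.get("content") is not None:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), a["content"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def share_preview(html):
    """Prefer Open Graph tags, then fall back to <title> and meta description."""
    p = ShareMetadata()
    p.feed(html)
    return {
        "title": p.meta.get("og:title") or p.title.strip() or None,
        "description": p.meta.get("og:description") or p.meta.get("description"),
        "image": p.meta.get("og:image"),
    }
```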
5 votes, 2 answers

Python / web scrape / aspx -- is it humanly possible when there are no forms?

Total noob, obviously. I'm teaching myself Python for web scraping in the interest of open records, government transparency, reporting, etc. There's an .aspx page I want to scrape, a week-by-week calendar for January to March 2012, but it has no forms…
greencracker
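ASP.NET WebForms pages usually wrap the page in a single server-side form and drive navigation (such as a calendar's previous/next links) through JavaScript `__doPostBack(target, argument)` calls, which submit hidden fields like `__VIEWSTATE` together with `__EVENTTARGET` and `__EVENTARGUMENT`. A scraper can imitate that by collecting the hidden inputs, setting the two event fields, and POSTing the result back to the same URL. The sketch below builds that body; the target and argument values are placeholders to copy from the page's own `__doPostBack(...)` links.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

class HiddenInputs(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""

def postback_body(html, event_target, event_argument=""):
    """Build the form body a __doPostBack(event_target, event_argument) call would send.

    event_target and event_argument are placeholders: copy them from the
    page's javascript:__doPostBack(...) links.
    """
    parser = HiddenInputs()
    parser.feed(html)
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields).encode()
```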
4 votes, 1 answer

URI Extract escaping at colons, any way to avoid this?

I have the function below that normally spits out a URL such as path.com/p/12345. Sometimes, when a tweet contains a colon before the link, such as RT: Something path.com/p/123, the function will…
Zack Shapiro
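A generic URI extractor may treat text such as `RT:` as a URI scheme, so one workaround is a regular expression that matches only links of the known shape. The sketch below is in Python rather than the question's language, and the pattern is tailored to the `path.com/p/<digits>` example from the question; other link shapes would need their own pattern.

```python
import re

# Matches links shaped like path.com/p/12345, with or without http(s):// or www.
# \b stops it from matching inside a longer host name such as "xpath.com".
PATH_LINK = re.compile(r"\b(?:https?://)?(?:www\.)?path\.com/p/\d+")

def extract_path_links(text):
    """Return every path.com/p/<digits> link in the text, in order."""
    return PATH_LINK.findall(text)

print(extract_path_links("RT: Something path.com/p/123"))  # ['path.com/p/123']
```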
4 votes, 1 answer

C#: Can I scrape a WebBrowser control for links?

I'm currently learning C# and it's fun so far, but I have hit a roadblock. I have a program that can scrape a webpage inside the web browser control for information. So far I can get the HTML: HtmlWindow window = webBrowser1.Document.Window; string str…
Gates
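In WinForms, the WebBrowser's `Document` exposes a `Links` collection of the page's anchors. For raw HTML, the same idea looks like the stdlib Python sketch below (not C#): collect the `href` of each `<a>` element and resolve relative links against the page URL with `urljoin`. The URLs in the usage line are placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links such as "/about" against the page URL.
                self.links.append(urljoin(self.base_url, href))

def extract_links(html, base_url):
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links

print(extract_links('<a href="/about">About</a>', "http://example.com/index.html"))  # ['http://example.com/about']
```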
4 votes, 3 answers

How to scrape iframe content using cURL

Goal: I want to scrape the word "Paris" inside an iframe using cURL. Say you have a simple page containing an iframe: … Curl into this page…
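An iframe's content is a separate document: the outer page contains only the iframe's `src` URL, so fetching the outer page (with cURL or anything else) does not return the framed text. The usual approach is two requests: read the iframe's `src`, resolve it against the outer page's URL, then fetch that URL. The sketch below shows the first step in stdlib Python; the URLs are placeholders, and the fetch step is left commented out because it needs network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class IframeSources(HTMLParser):
    """Collect the src attribute of every <iframe> element."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)

def iframe_urls(outer_html, outer_url):
    """Return absolute URLs of the documents framed by the outer page."""
    parser = IframeSources()
    parser.feed(outer_html)
    return [urljoin(outer_url, src) for src in parser.sources]

# Second step (network access, so commented out): fetch each framed document.
# import urllib.request
# for url in iframe_urls(outer_html, "http://example.com/outer.html"):
#     inner_html = urllib.request.urlopen(url).read().decode()
```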