DO NOT USE THIS TAG. It is under an active cleanup (see https://meta.stackoverflow.com/q/305314). Use [web-scraping] if your question is about scraping information from web resources (there is also [screen-scraping]), or use [pdf-scraping] if your question is about scraping information from PDF files. Use [data-extraction] if you need to extract data from other resources.
Questions tagged [scrape]
1204 questions
5 votes, 2 answers
HTML Agility Pack - Accessing Siblings?
The HTML Agility Pack is great for getting descendants, whole tables, etc., but how can you use it in the situation below?
...Html Code above...
- Location:
- City, London
-
Jay
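The question's HTML was not preserved, so the sketch below assumes a hypothetical fragment in which the label and its value are adjacent `<li>` siblings, as the rendered list suggests. It uses Python's standard-library `xml.etree.ElementTree` rather than HTML Agility Pack and only works on well-formed markup, but the idea carries over to any DOM library: find the label node, then step to its next sibling.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical, well-formed markup: the question's real HTML was not preserved.
FRAGMENT = """
<ul>
  <li>Location:</li>
  <li>City, London</li>
  <li></li>
</ul>
"""


def value_after_label(markup: str, label: str) -> Optional[str]:
    """Return the text of the element that immediately follows the element whose text is `label`."""
    root = ET.fromstring(markup.strip())
    for parent in root.iter():
        children = list(parent)
        # ElementTree has no next-sibling pointer, so look the sibling up by index.
        for i, child in enumerate(children[:-1]):
            if (child.text or "").strip() == label:
                return (children[i + 1].text or "").strip()
    return None


print(value_after_label(FRAGMENT, "Location:"))  # City, London
```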
5 votes, 2 answers
Using SoupStrainer to parse selectively
I'm trying to parse a list of video game titles from a shopping site. However, the entire item list is stored inside one tag.
This section of the documentation supposedly explains how to parse only part of the document, but I can't work it out. My…

Scraper
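In BeautifulSoup itself, selective parsing is done by passing a `SoupStrainer` as the `parse_only` argument of the `BeautifulSoup` constructor. To show the underlying idea without a third-party dependency, here is a minimal standard-library sketch that keeps only the text found inside matching elements and skips everything else. The tag `li` and class `item` are hypothetical stand-ins for whatever tag holds the item list on the real site.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class StrainedText(HTMLParser):
    """Keep only the text inside <target_tag class="target_class"> elements; ignore the rest."""

    def __init__(self, target_tag, target_class):
        super().__init__()
        self.target_tag = target_tag
        self.target_class = target_class
        self._depth = 0      # > 0 while inside a matching element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        if self._depth:
            self._depth += 1
        elif tag == self.target_tag and self.target_class in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self._depth and tag not in VOID:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth:
            self.texts[-1] += data


def extract_titles(page_html, tag="li", css_class="item"):
    parser = StrainedText(tag, css_class)
    parser.feed(page_html)
    parser.close()
    return [t.strip() for t in parser.texts]


# Hypothetical page: the real site's markup is not shown in the question.
PAGE = """
<div class="header">Shop</div>
<ul>
  <li class="item"><a href="/p/1">Game One</a></li>
  <li class="item"><a href="/p/2">Game <b>Two</b></a></li>
  <li class="ad">Sponsored</li>
</ul>
"""

print(extract_titles(PAGE))  # ['Game One', 'Game Two']
```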
5 votes, 2 answers
How to scrape dynamic webpages by Python
[What I'm trying to do]
Scrape the webpage below for used car data.
http://www.goo-net.com/php/search/summary.php?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1
[Issue]
I want to scrape all of the pages. In the URL above, only the first 30 items are…

dixhom
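If the remaining items are reachable through a page-number query parameter, one approach is to generate every result-page URL up front and fetch them in turn. This is an assumption: the parameter name `page` below is hypothetical, so check the site's own "next page" link for the name it actually uses. If the extra items are instead loaded by JavaScript, the request the browser makes (visible in its developer tools) is the one to replicate.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

SEARCH_URL = ("http://www.goo-net.com/php/search/summary.php"
              "?price_range=&pref_c=08,09,10,11,12,13,14&easysearch_flg=1")


def page_urls(base_url, pages, param="page"):
    """One URL per result page, setting a page-number query parameter.

    `param` defaults to "page", a hypothetical name that must be checked against the site.
    """
    parts = urlsplit(base_url)
    # Keep the existing filters (blank values included) and drop any old page number.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    urls = []
    for n in range(1, pages + 1):
        new_query = urlencode(query + [(param, str(n))], safe=",")
        urls.append(urlunsplit(parts._replace(query=new_query)))
    return urls


for url in page_urls(SEARCH_URL, 3):
    print(url)
```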
5 votes, 1 answer
How to scrape multiple pages with Import.io
I am trying to scrape a list of events from the site http://www.cityoflondon.gov.uk/events/, but when scraping it with import.io I can only extract the first page.
How could I extract all pages at once?

Huander
5 votes, 1 answer
Use rvest to scrape all p after h? (or other R package)
I am new to the world of HTML scraping and am having difficulty pulling in the paragraphs under particular headings using rvest in R.
I want to scrape info from multiple sites that all have a relatively similar setup. They all have the same headings…

Adam
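One library-independent way to get "all p after h" is to walk the document in order, remember the most recent heading, and file each paragraph under it. The sketch below does this in Python with the standard-library `html.parser` (a swapped-in technique, not rvest). It assumes headings are `<h1>` to `<h6>` and that every `<p>` is explicitly closed; the sample document is hypothetical.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class SectionParser(HTMLParser):
    """Group the text of <p> elements under the most recent heading."""

    def __init__(self):
        super().__init__()
        self.sections = {}       # heading text -> list of paragraph texts
        self.current = None      # heading the parser is currently "under"
        self.in_heading = False
        self.heading_buf = ""
        self.in_p = False
        self.p_buf = ""

    def handle_starttag(self, tag, attrs):
        if tag in HEADINGS:
            self.in_heading, self.heading_buf = True, ""
        elif tag == "p":
            self.in_p, self.p_buf = True, ""

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.in_heading:
            self.in_heading = False
            self.current = self.heading_buf.strip()
            self.sections.setdefault(self.current, [])
        elif tag == "p" and self.in_p:
            self.in_p = False
            if self.current is not None:   # paragraphs before the first heading are dropped
                self.sections[self.current].append(self.p_buf.strip())

    def handle_data(self, data):
        if self.in_heading:
            self.heading_buf += data
        elif self.in_p:
            self.p_buf += data


def paragraphs_under(page_html, heading):
    parser = SectionParser()
    parser.feed(page_html)
    parser.close()
    return parser.sections.get(heading, [])


DOC = """
<h2>Background</h2>
<p>First para.</p>
<p>Second <em>para</em>.</p>
<h2>Contact</h2>
<p>Email us.</p>
"""

print(paragraphs_under(DOC, "Background"))  # ['First para.', 'Second para.']
```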
5 votes, 1 answer
Node.js web scraper for a password-protected website
I am trying to scrape a website using Node.js, and it works perfectly on sites that do not require any authentication. But whenever I try to scrape a site with a form that requires a username and password, I only get the HTML from the authentication page…

gthb7
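Getting only the login page back usually means the scraper never submitted the login form, or did not keep the session cookie the server set afterwards. Here is a minimal Python sketch of the usual pattern using only the standard library: a single opener with a cookie jar is used for the login POST and for every later request. The field names `username`/`password` and the URLs are hypothetical; copy the real `name` attributes (and any hidden token fields) from the site's login form.

```python
import http.cookiejar
import urllib.request
from urllib.parse import urlencode


def make_session():
    """An opener that stores cookies set by the server and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_request(login_url, username, password, user_field="username", pass_field="password"):
    """Build the POST a browser would send when the login form is submitted.

    The default field names are hypothetical: read the real ones from the form's inputs.
    """
    body = urlencode({user_field: username, pass_field: password}).encode("ascii")
    return urllib.request.Request(
        login_url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Usage (network access required, URLs hypothetical):
# opener, jar = make_session()
# opener.open(login_request("https://example.com/login", "me", "secret"))
# html = opener.open("https://example.com/members").read().decode()
```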
5 votes, 1 answer
BeautifulSoup: How to extract data after specific html tag
I have the following HTML, and I am trying to figure out exactly how to tell BeautifulSoup to extract the td after a certain HTML element. In this case, I want to get the data in the td after Color Digest.
Color Digest
…

add-semi-colons
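With BeautifulSoup, this is typically done by locating the label cell and calling `find_next_sibling("td")` on it. The standard-library sketch below shows the same idea without that dependency: it records every cell's text in document order and returns the one right after the label. The table markup and the value `4c4c4c` are made up for illustration, and nested tables are not handled.

```python
from html.parser import HTMLParser


class CellCollector(HTMLParser):
    """Record the text of every <td>/<th> cell in document order (nested tables not handled)."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


def cell_after(page_html, label):
    """Text of the cell immediately after the cell whose text equals `label`, or None."""
    parser = CellCollector()
    parser.feed(page_html)
    parser.close()
    cells = [c.strip() for c in parser.cells]
    for i, text in enumerate(cells[:-1]):
        if text == label:
            return cells[i + 1]
    return None


# Hypothetical table: the question's markup is truncated.
TABLE = """
<table>
  <tr><td>Size</td><td>Large</td></tr>
  <tr><td><b>Color Digest</b></td><td>4c4c4c</td></tr>
</table>
"""

print(cell_after(TABLE, "Color Digest"))  # 4c4c4c
```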
5 votes, 1 answer
Remotely Scrape Page and Get most Relevant title or Description for Images with XPath
What I'm looking to do is essentially what a Tweet button or a Facebook Share/Like button does: scrape a page and find the most relevant title for a piece of data. The best example I can think of is when you're on the front…

stwhite
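Share buttons such as Facebook's generally build their preview from the page's Open Graph meta tags (`og:title`, `og:description`, `og:image`) when they are present. A minimal sketch of that heuristic with Python's standard-library `html.parser`, falling back to `<title>` and `<meta name="description">` when no Open Graph tags exist; the sample page is hypothetical, and real share services apply more rules than this.

```python
from html.parser import HTMLParser


class ShareMetadata(HTMLParser):
    """Collect Open Graph <meta property="og:..."> tags, <meta name="description">, and <title>."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.description = ""
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            prop = a.get("property") or ""
            if prop.startswith("og:"):
                self.og.setdefault(prop, a.get("content") or "")  # first occurrence wins
            elif (a.get("name") or "").lower() == "description":
                self.description = a.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def share_preview(page_html):
    """Prefer Open Graph values, falling back to <title> and the plain meta description."""
    parser = ShareMetadata()
    parser.feed(page_html)
    parser.close()
    return {
        "title": parser.og.get("og:title") or parser.title.strip(),
        "description": parser.og.get("og:description") or parser.description,
        "image": parser.og.get("og:image", ""),
    }


PAGE_META = """<html><head>
<title>Fallback title</title>
<meta property="og:title" content="Shared title">
<meta property="og:image" content="https://example.com/a.png">
<meta name="description" content="Plain description">
</head><body></body></html>"""

print(share_preview(PAGE_META))
```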
5 votes, 2 answers
Python / web scrape / aspx -- is it humanly possible when there are no forms?
Total noob, obviously. I'm teaching myself Python for web scraping in the interest of open records, government transparency, reporting, etc.
There's an .aspx page I want to scrape: a week-by-week calendar for January through March 2012.
But it has no forms…

greencracker
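Many .aspx pages are ASP.NET Web Forms pages: even when nothing on screen looks like a form, the whole page is usually wrapped in one server-side `<form>`, and navigation links call the JavaScript function `__doPostBack(target, argument)`. That call submits the form, including hidden fields such as `__VIEWSTATE` and `__EVENTVALIDATION`, with `__EVENTTARGET` and `__EVENTARGUMENT` set. Assuming the calendar page works this way, the sketch below builds the same POST body from a downloaded page; the control name `ctl00$NextWeek` and the sample page are hypothetical.

```python
import html
import re
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputs(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> (e.g. __VIEWSTATE, __EVENTVALIDATION)."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and (a.get("type") or "").lower() == "hidden" and a.get("name"):
            self.fields[a["name"]] = a.get("value") or ""


# Matches javascript:__doPostBack('target','argument') links.
POSTBACK = re.compile(r"__doPostBack\('([^']*)',\s*'([^']*)'\)")


def postback_targets(page_html):
    """(target, argument) pairs for every __doPostBack call; quotes may be HTML-escaped."""
    return POSTBACK.findall(html.unescape(page_html))


def postback_body(page_html, event_target, event_argument=""):
    """URL-encoded body mimicking the browser's form submission for one postback."""
    parser = HiddenInputs()
    parser.feed(page_html)
    parser.close()
    fields = dict(parser.fields)
    fields["__EVENTTARGET"] = event_target
    fields["__EVENTARGUMENT"] = event_argument
    return urlencode(fields)


# Hypothetical page; POST the body back to the page URL, reusing the same cookies.
PAGE = """<form method="post" action="./Calendar.aspx" id="form1">
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
<a href="javascript:__doPostBack(&#39;ctl00$NextWeek&#39;,&#39;&#39;)">Next week</a>
</form>"""

for target, argument in postback_targets(PAGE):
    print(postback_body(PAGE, target, argument))
```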
4 votes, 1 answer
URI Extract escaping at colons: any way to avoid this?
I have the function below, which normally returns a URL such as path.com/p/12345.
Sometimes, when a tweet contains a colon before the link, such as
RT: Something path.com/p/123
the function will…

Zack Shapiro
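General-purpose URI extractors treat any `word:` as a possible scheme, which is presumably why `RT:` trips up the function (the function itself is not shown). Because only links of one known shape are needed, a narrower pattern sidesteps the problem. A minimal Python sketch with the `re` module, assuming links look like `path.com/p/<digits>` with an optional scheme and `www.`:

```python
import re

# Only links of the form path.com/p/<digits>, optionally with a scheme and "www.".
# The look-behind stops matches inside longer hostnames such as "xpath.com".
LINK = re.compile(r"(?<![\w.-])(?:https?://)?(?:www\.)?path\.com/p/\d+", re.IGNORECASE)


def extract_links(tweet):
    """All path.com post links in a tweet; "RT:" and other colons are simply ignored."""
    return LINK.findall(tweet)


print(extract_links("RT: Something path.com/p/123"))  # ['path.com/p/123']
```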
4 votes, 1 answer
C#: can I scrape a WebBrowser control for links?
I'm currently learning C# and it's fun so far, but I have hit a roadblock.
I have a program that can scrape a webpage inside the web browser control for information.
So far I can get the HTML:
HtmlWindow window = webBrowser1.Document.Window;
string str…

Gates
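Once the HTML is available as a string, collecting links comes down to reading the `href` of every `<a>` element (the WinForms `HtmlDocument` class also exposes a `Links` collection for this). The Python sketch below shows the parsing step with the standard library and resolves relative links against the page URL; the example URLs are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> element, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # skip anchors without an href, e.g. <a name="top">
                self.links.append(urljoin(self.base_url, href))


def collect_links(page_html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.links


print(collect_links('<a href="/about">About</a> <a name="top"></a>', "https://example.com/index.html"))
# ['https://example.com/about']
```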
4 votes, 3 answers
How to scrape iframe content using cURL
Goal: I want to scrape the word "Paris" inside an iframe using cURL.
Say you have a simple page containing an iframe:
Curl into this page
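cURL fetches a single URL and does not load frames: the outer page only contains the `<iframe src="...">` element, and the framed document (the one containing "Paris") has to be requested separately from its own URL. A minimal Python sketch of that two-step approach with the standard library; it assumes the framed documents are UTF-8 and ignores iframes injected by JavaScript.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class IframeSources(HTMLParser):
    """Collect the src attribute of every <iframe>."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        src = dict(attrs).get("src")
        if tag == "iframe" and src:
            self.sources.append(src)


def iframe_urls(page_html, page_url):
    """Absolute URLs of every <iframe src="..."> on the page."""
    parser = IframeSources()
    parser.feed(page_html)
    parser.close()
    return [urljoin(page_url, s) for s in parser.sources]


def fetch_iframe_contents(page_url):
    """Download the outer page, then each framed document separately (network access required)."""
    with urlopen(page_url) as resp:
        outer = resp.read().decode("utf-8", errors="replace")
    contents = []
    for url in iframe_urls(outer, page_url):
        with urlopen(url) as resp:
            contents.append(resp.read().decode("utf-8", errors="replace"))
    return contents
```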
4 votes, 3 answers
How to screen scrape an Ajax site in Java?
I want to screen scrape several Ajax-based websites, simulate clicks that refresh part of the page, and then read the updated HTML. Is there any Java library that can do this?

yazz.com
4 votes, 3 answers
How can I scrape data from a text table using Python?
I have the following text, and I would like to scrape the data items and save them in Excel. Is there a way to do this in Python?
text = """
ANNUAL COMPENSATION LONG-TERM COMPENSATION
…

user728166
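The question's table is truncated after its header, so the sketch below uses hypothetical rows. It assumes columns are separated by two or more spaces and that no cell contains a double space; multi-level headers such as "ANNUAL COMPENSATION" spanning several columns will not line up and need manual handling. The result is written as CSV, which Excel opens directly.

```python
import csv
import io
import re


def text_table_to_rows(text):
    """Split each non-blank line into cells wherever two or more spaces separate them."""
    return [re.split(r"\s{2,}", line.strip()) for line in text.splitlines() if line.strip()]


def rows_to_csv(rows):
    """CSV text that Excel can open; cells containing commas are quoted automatically."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


# Hypothetical data: the question's table is truncated after its header line.
SAMPLE = """
NAME            YEAR    SALARY     BONUS
Jane Doe        2010    500,000    100,000
John Smith      2010    450,000    75,000
"""

rows = text_table_to_rows(SAMPLE)
print(rows_to_csv(rows))
# To save: open("compensation.csv", "w", newline="").write(rows_to_csv(rows))
```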
4 votes, 2 answers
Single Scrapy Project vs. Multiple Projects
I have a dilemma about how to store all of my spiders. These spiders will be fed into Apache NiFi using a command-line invocation, with items read from stdin. I also plan to have a subset of these spiders return single item results using…

Lijo