Questions tagged [scrape]

DO NOT USE THIS TAG. It is undergoing active cleanup: https://meta.stackoverflow.com/q/305314 Use [web-scraping] if your question is about scraping information from web resources (there is also [screen-scraping]), or [pdf-scraping] if it is about scraping information from PDF files. Use [data-extraction] if you need to extract data from other resources.

1204 questions

2 answers

Web page scraping gems/tools available in Ruby

I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the project is to show which ETFs and stock mutual funds are most compatible with the value investing philosophy. Some examples of pages I'd like to scrape…

asked Feb 23 '13 at 05:24

jhsu802701

2 answers

PHP Curl following redirects

I'm trying to be a bit sneaky and, as part of a learning process, improve my page-scraping skills. One thing I've come across that I have yet to solve is that certain sites use an internal link which then redirects to an…

php curl scrape

asked Apr 23 '12 at 20:53

David

0 answers

Pinyin in Google Translate API

I want to scrape the pinyin from the Google Translate API instead of having to scrape it from some other website (which might change its format in ten thousand ways over time and across different requests). The JSON that it returns doesn't seem to…

scrape

asked May 04 '11 at 17:13

gideonite

2 answers

Python - save requests or BeautifulSoup object locally

I have some code that is quite long, so it takes a long time to run. I want to simply save either the requests object (in this case "name") or the BeautifulSoup object (in this case "soup") locally so that next time I can save time. Here is the…

python file beautifulsoup scrape

asked May 29 '14 at 22:04

bill999
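
One possible approach to the question above, sketched with only the Python standard library: store the raw HTML text on disk and re-parse it on later runs, rather than serializing the response or the parsed object. The names `cached_fetch` and `cache_dir` and the `fetch` callable are hypothetical and do not come from the question.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cached_fetch(url: str, fetch: Callable[[str], str],
                 cache_dir: str = "html_cache") -> str:
    """Return the page text for url, reading from disk when a cached copy exists."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it becomes a safe, fixed-length file name.
    path = folder / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")
    if path.exists():
        return path.read_text(encoding="utf-8")
    # Assumption: fetch returns the page body as text,
    # for example lambda u: requests.get(u).text
    text = fetch(url)
    path.write_text(text, encoding="utf-8")
    return text
```

The cached text can then be handed to BeautifulSoup on each run, so only the slow network request is skipped.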

3 answers

Python data scraping

I want to download a couple of songs from http://www.youtube-mp3.org/. I'm using urllib2 and BeautifulSoup. The problem is that when I open the site with urllib2 with my video ID plugged in, http://www.youtube-mp3.org/?c#v=lV7r8PiuecQ, I get the site but…

python youtube urllib2 scrape

asked Aug 30 '11 at 08:11

Oliver

5 answers

How can I input data into a webpage to scrape the resulting output using Python?

I am familiar with BeautifulSoup and urllib2 to scrape data from a webpage. However, what if a parameter needs to be entered into the page before the result that I want to scrape is returned? I'm trying to obtain the geographic distance between two…

python scrape

asked Aug 13 '11 at 00:49

user728166

1 answer

Python web scraping for javascript generated content

I am trying to use Python 3 to return the BibTeX citation generated by http://www.doi2bib.org/. The URLs are predictable, so the script can work out the URL without having to interact with the web page. I have tried using selenium, bs4, etc. but can't…

javascript python web-scraping scrape

asked Feb 03 '15 at 01:07

Nick

2 answers

How to download images from BeautifulSoup?

Image https://i.stack.imgur.com/S1BR2.png

    import requests
    from bs4 import BeautifulSoup

    r = requests.get("xxxxxxxxx")
    soup = BeautifulSoup(r.content)
    for link in links:
        if "http" in link.get('src'):
            print link.get('src')

I get the…

python python-2.7 beautifulsoup scrape

asked May 11 '16 at 09:22

Fist Heart

3 answers

Accessing Metacritic API and/or Scraping

Does anybody know where the documentation for the Metacritic API is, or whether the API still works? There used to be a Metacritic API at https://market.mashape.com/byroredux/metacritic-v2#get-user-details which disappeared today. Otherwise I'm trying to scrape the…

api scrape scraper

asked Jan 06 '16 at 22:12

boblikesoup

5 answers

Python: the right URL to download pictures from Google Image Search

I'm trying to obtain images from Google Image search for a specific query. But the page I download is without pictures and it redirects me to Google's original one. Here's my code: AGENT_ID = "Mozilla/5.0 (X11; Linux x86_64; rv:7.0.1)…

python image scrape

asked Feb 16 '12 at 20:30

slwr

1 answer

Scrapy 403 response because of Cloudflare (clutch.co)

I'm trying to scrape some info about different agencies from clutch.co. When I look up the URLs in my browser everything is fine, but with Scrapy I get a 403 response. From all I have read on the related issues, I suppose it's coming from…

python selenium-webdriver scrapy scrape

asked Feb 20 '23 at 10:27

Fateme Fouladkar

3 answers

How to 'scrape' content from a page's source?

I have this code which gets the HTML source of a page: $page = file_get_contents('http://example.com/page.html'); $page = htmlentities($page); I want to scrape some content from it. For example, say the page's source contains this: …

php scrape

asked Sep 06 '11 at 14:23

Joey Morani

2 answers

How to scrape tables inside a comment tag in html with R?

I am trying to scrape from http://www.basketball-reference.com/teams/CHI/2015.html using rvest. I used selectorgadget and found the tag to be #advanced for the table I want. However, I noticed it wasn't picking it up. Looking at the page source, I…

r web-scraping html-parsing scrape rvest

asked Nov 15 '16 at 17:43

David Sung
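
The question above concerns R and rvest; as a language-neutral illustration of the idea in its title, here is a minimal Python sketch that uses only the standard library. It assumes the table markup sits inside an HTML comment, so it first collects the comment text and then parses that text as HTML in its own right. The class and function names (`CommentCollector`, `TableRows`, `tables_in_comments`) are hypothetical.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect the text of every HTML comment in a document."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        self.comments.append(data)


class TableRows(HTMLParser):
    """Collect rows of cell text from <tr>/<td>/<th> markup."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell and self._row:
            self._row[-1] += data.strip()


def tables_in_comments(html):
    """Return one list of rows per commented-out table found in html."""
    collector = CommentCollector()
    collector.feed(html)
    collector.close()
    results = []
    for comment in collector.comments:
        if "<table" not in comment:
            continue  # ordinary comment, not hidden markup
        parser = TableRows()
        parser.feed(comment)
        parser.close()
        results.append(parser.rows)
    return results
```

Each comment that contains a table is treated as a single table here; a page with several tables inside one comment would need a parser that also tracks `<table>` boundaries.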

2 answers

How do I scrape information off ASP.NET websites when paging and JavaScript links are being used?

I have been given a staff list which is supposed to be up to date but it doesn't match an intranet People Finder which is written in ASP.NET. As the information is sensitive I am not able to access the database the People Finder is using so the only…

c# asp.net vb.net gridview scrape

asked Mar 15 '10 at 18:01

Ian Roke

2 answers

How to crawl a site given only the domain URL with Scrapy

I am trying to use Scrapy to crawl a website, but there is no sitemap or page index for the website. How can I crawl all pages of the website with Scrapy? I just need to download all the pages of the site without extracting any items. Do I only…

python web-crawler scrapy scrape

asked Jan 05 '13 at 23:29

David Thompson
