Questions tagged [scraper]

Synonym of [web-scraping]


349 questions
0 votes, 1 answer

Entry Widget is driving me nuts!: Tkinter Reddit Scraper thinks that a string entry is numbers?

I'm in the middle of a small project to create a tkinter GUI that outputs the top ten posts from a user-defined subreddit on reddit.com using their API. Because the subreddit needs to be the user's choice, it has to be entered using…
JeffD • 89 • 9
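
The question is truncated above, but its core, reading a subreddit name from an Entry widget (which always returns a string) and fetching the top posts, can be sketched roughly as follows. This is not the asker's code; the widget layout and the use of Reddit's public JSON endpoint instead of a wrapper library are assumptions.

    import json
    import urllib.request
    import tkinter as tk

    def fetch_top_posts():
        subreddit = entry.get().strip()          # Entry.get() always returns a str
        url = "https://www.reddit.com/r/{}/top.json?limit=10".format(subreddit)
        req = urllib.request.Request(url, headers={"User-Agent": "tk-scraper-sketch"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for child in data["data"]["children"]:
            output.insert(tk.END, child["data"]["title"] + "\n")

    root = tk.Tk()
    entry = tk.Entry(root)
    entry.pack()
    tk.Button(root, text="Fetch top 10", command=fetch_top_posts).pack()
    output = tk.Text(root, height=12, width=80)
    output.pack()
    root.mainloop()
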
0 votes, 1 answer

NameError in tkinter GUI based Reddit Scraper Application --Python

I'm in the process of building a GUI-based Reddit scraper application and I have run into a few problems. First, I can't seem to get my second tkinter window to load from the redditReturn class file. Also, I'm not sure if it is correct to have my…
JeffD • 89 • 9
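
A common cause of this kind of NameError is the second window referring to a root that only exists in the other module. A minimal sketch of one way around it, passing the parent widget in explicitly; the class and method names here are stand-ins, not the asker's redditReturn code.

    import tkinter as tk

    class RedditReturn:
        """Hypothetical second-window class, analogous to the redditReturn file."""
        def __init__(self, parent):
            self.window = tk.Toplevel(parent)   # child window of the main root
            tk.Label(self.window, text="Results go here").pack()

    class MainApp:
        def __init__(self, root):
            self.root = root
            tk.Button(root, text="Show results",
                      command=lambda: RedditReturn(self.root)).pack()

    if __name__ == "__main__":
        root = tk.Tk()
        MainApp(root)
        root.mainloop()
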
0 votes, 1 answer

Python Parse single line of XML

What I'm trying to do is make a scraper, and there is a login page. I'm filling in two out of the three values needed to get to the next page: the scraper needs a username, a password, and then the token. I'm autofilling the username and password and I've narrowed…
John Hudson • 429 • 1 • 3 • 11
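
For the token-extraction part of the question, a single line of XML can be parsed directly with the standard library; the element and attribute names below are invented for illustration.

    import xml.etree.ElementTree as ET

    # Hypothetical one-line fragment holding the login token.
    line = '<input type="hidden" name="csrf_token" value="abc123"/>'
    element = ET.fromstring(line)          # parse the single-line fragment
    token = element.get("value")           # read the attribute we care about
    print(token)                           # -> abc123
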
0 votes, 1 answer

How to print only url data from Webscraper

I'm building a web scraper and want it to retrieve the URL from a title. This is the code I'm currently using: for item in g_data: print item.contents[1].find_all("a", {"class": "a-link-normal"})[1] And this prints:
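
The excerpt prints the whole <a> tag; asking the tag for its href attribute returns only the URL. A small self-contained sketch (the markup is a stand-in for the asker's page):

    from bs4 import BeautifulSoup

    # Hypothetical markup standing in for the asker's page structure.
    html = '''<div><span></span><div>
      <a class="a-link-normal" href="https://example.com/image">img</a>
      <a class="a-link-normal" href="https://example.com/product">title</a>
    </div></div>'''

    soup = BeautifulSoup(html, "html.parser")
    for item in soup.find_all("div", recursive=False):
        link = item.contents[1].find_all("a", {"class": "a-link-normal"})[1]
        print(link.get("href"))     # prints only the URL, not the whole <a> tag
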
0 votes, 1 answer

Java .getInputStream() openConnection() HTTP response code ERRORS

I am trying to do the following (in Java): connect to some proxy server & http_url. But I am having errors like java.net.ConnectException: Connection timed out: connect... or errors related to HTTP response codes: 302, 400,…
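
The question is about Java's URL.openConnection(), but the same failure modes (connection timeouts, 302 redirects, 4xx responses) can be sketched in Python with the requests library; the proxy address and target URL below are placeholders.

    import requests

    proxies = {"http": "http://proxy.example.com:8080",
               "https": "http://proxy.example.com:8080"}

    try:
        resp = requests.get("http://example.com/page",
                            proxies=proxies,
                            timeout=10,             # avoid hanging on a dead proxy
                            allow_redirects=True)   # follow 302s automatically
        resp.raise_for_status()                     # raise on 4xx/5xx such as 400
        print(resp.status_code, len(resp.text))
    except requests.exceptions.ConnectTimeout:
        print("Connection timed out (like java.net.ConnectException)")
    except requests.exceptions.HTTPError as err:
        print("HTTP error:", err.response.status_code)
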
0 votes, 1 answer

python scraper cannot yield items

My spider.py is like this: class CSpider(scraper.Spider): name = 'craig' start_urls = ['http://geo.craigslist.org/iso/us/ca'] def parse(self, response): # get url_list for url in url_list: yield…
wrufesh • 1,379 • 3 • 18 • 36
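
A rough sketch of a Scrapy spider whose parse() actually yields something, either follow-up requests or items; the selectors are guesses, not the asker's real code.

    import scrapy

    class CraigSpider(scrapy.Spider):
        name = "craig"
        start_urls = ["http://geo.craigslist.org/iso/us/ca"]

        def parse(self, response):
            # Follow each region link and hand the response to parse_region.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse_region)

        def parse_region(self, response):
            # Yield plain dicts (or Item objects) so the pipeline receives them.
            yield {"url": response.url,
                   "title": response.css("title::text").get()}
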
0 votes, 1 answer

Using mechanize with a hidden log-in page

I want to write a scraper to pull PDFs from a database of police reports, but I've run into a snag. When I click the page's "Log In" button, it doesn't bring up a separate URL, it just loads the log-in page asynchronously. I'm not sure how it does…
Jonathan Cox • 341 • 1 • 7 • 14
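
If the login form posts asynchronously, the usual approach is to find the underlying request in the browser's network tab and replay it with a session. A hedged sketch, with a made-up endpoint and field names:

    import requests

    session = requests.Session()
    login = session.post("https://reports.example.gov/ajax/login",
                         data={"username": "me", "password": "secret"})
    login.raise_for_status()

    # The session now carries the auth cookies, so a PDF can be fetched directly.
    pdf = session.get("https://reports.example.gov/report/12345.pdf")
    with open("report.pdf", "wb") as fh:
        fh.write(pdf.content)
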
0 votes, 1 answer

How to save an image from a remote server when the image extension is missing, using PHP

I use DOMDocument to extract the HTML page which contains the image file; the HTML page looks like this:
After I extracted the src address, I put it in this function: copy('http://pic.aa.com/a/b/0525',…
user7031 • 435 • 4 • 16
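
The question is about PHP's copy(), but the underlying idea, deriving an extension from the Content-Type header when the URL has none, can be sketched in Python; the URL is the truncated one from the excerpt.

    import mimetypes
    import requests

    url = "http://pic.aa.com/a/b/0525"
    resp = requests.get(url, timeout=10)
    resp.raise_for_status()

    # e.g. "image/jpeg" -> ".jpg"; fall back to ".bin" if the server is unhelpful.
    content_type = resp.headers.get("Content-Type", "").split(";")[0]
    ext = mimetypes.guess_extension(content_type) or ".bin"
    with open("downloaded_image" + ext, "wb") as fh:
        fh.write(resp.content)
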
0 votes, 1 answer

How to get multiple values from an XPath query?

This is the HTML page (test.html):
Name: ABC
Country: USA
Date of birth: 15 Feb 1985
Feroz Ahmed • 931 • 10 • 16
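
One XPath query can return several nodes at once, so the usual fix is to iterate over the result list. A small lxml sketch, with the markup guessed from the visible Name/Country/Date of birth lines:

    from lxml import html

    page = html.fromstring("""
    <html><body>
      <p>Name: ABC</p>
      <p>Country: USA</p>
      <p>Date of birth: 15 Feb 1985</p>
    </body></html>""")

    # A single query returns every matching node; iterate instead of taking only the first.
    for text in page.xpath("//p/text()"):
        label, _, value = text.partition(": ")
        print(label, "->", value)
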
0 votes, 2 answers

XPath -> Selecting element with class attribute

I want to get all organic search results from Google. I need help defining the XPath to exclude the ads. The cite tag on the ads does not contain a class attribute, and the organic results have 2 different class values. My attempts at defining the…
Jesse • 1 • 2
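
A predicate like [@class] matches any element that has a class attribute at all, which is how the excerpt distinguishes organic results from ads. A hedged lxml sketch with invented class values (Google's real markup changes frequently):

    from lxml import html

    page = html.fromstring("""
    <html><body>
      <div><cite>ad-result.example.com</cite></div>
      <div><cite class="result-a">organic-one.example.com</cite></div>
      <div><cite class="result-b">organic-two.example.com</cite></div>
    </body></html>""")

    # Only <cite> elements that carry a class attribute, i.e. the organic results.
    for cite in page.xpath("//cite[@class]"):
        print(cite.text_content())
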
0 votes, 0 answers

Feed contents of a text file as an input for a textbox while the program is running

I have an input file saved in a specified folder and used a stream reader to read the content one line at a time. So my program reads the first line and adds the content in the textbox. I wanted to get the first line of the text file, and use it as…
Try_Learning • 31 • 1 • 10
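
The excerpt reads like C#/WinForms (StreamReader plus a textbox); the same idea in Python/tkinter, reading the file one line at a time and pushing the current line into the entry box while the program runs, looks roughly like this. The file name is a placeholder.

    import tkinter as tk

    root = tk.Tk()
    box = tk.Entry(root, width=60)
    box.pack()

    lines = open("input.txt", encoding="utf-8")   # read lazily, one line at a time

    def load_next_line():
        box.delete(0, tk.END)                     # clear the previous value
        box.insert(0, next(lines, "").rstrip("\n"))

    tk.Button(root, text="Next line", command=load_next_line).pack()
    load_next_line()                              # start with the first line
    root.mainloop()
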
0 votes, 1 answer

Scraping from VBA (very close to working!!)

I have been playing around with a scraper for consumer stocks and I can scrape data from the main page of items, but once I start using the second, third pages, Sub asosdesc2() Const READYSTATE_COMPLETE = 4 Dim j As Integer Dim ie As…
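
The question itself is VBA driving Internet Explorer, but the pagination problem it describes can be sketched in Python with requests and BeautifulSoup; the URL pattern and CSS selector are placeholders for the real product pages.

    import requests
    from bs4 import BeautifulSoup

    base = "https://shop.example.com/items?page={}"

    for page_number in range(1, 4):                  # main page plus second, third...
        resp = requests.get(base.format(page_number), timeout=10)
        resp.raise_for_status()
        soup = BeautifulSoup(resp.text, "html.parser")
        for row in soup.select(".product"):          # assumed CSS class
            print(page_number, row.get_text(strip=True))
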
0 votes, 1 answer

JSOUP Scraping JavaScript piece Java

I am using Jsoup to scrape some data. In my document, I have something like:
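
The excerpt cuts off before the document, but the usual situation is a <script> block whose JavaScript has to be picked apart with a regex. A Python/BeautifulSoup analogue of the Jsoup approach, with an invented variable inside the script:

    import re
    from bs4 import BeautifulSoup

    html = """<html><body>
    <script>var productData = {"id": 42, "price": "19.99"};</script>
    </body></html>"""

    soup = BeautifulSoup(html, "html.parser")
    script_text = soup.find("script").string          # the raw JavaScript source
    match = re.search(r"var productData = (\{.*?\});", script_text)
    if match:
        print(match.group(1))                         # the embedded JSON payload
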