Questions tagged [pyspider]

A powerful Python-based spider (web crawler) system

Used for

  • Write scripts in Python with a powerful API
  • Powerful WebUI with script editor, task monitor, project manager and result viewer
  • MySQL, MongoDB, or SQLite as the database backend
  • JavaScript pages supported!
  • Task priority, retries, periodic crawling, and recrawling by age or by marks on the index page (such as update time)
  • Distributed architecture
38 questions
0
votes
1 answer

Why is this code only downloading one page's data?

I have tried many times, but it does not work: import requests from lxml import html, etree from selenium import webdriver import time, json #how many page do you want to scan page_numnotint = input("how many page do you want to scan") page_num =…
周义翔
  • 1
  • 2
0
votes
1 answer

Pyspider installation for Python 3.5/Win 64: "Failed building wheel for lxml"

I'm trying to install pyspider and always get "Failed building wheel for lxml...". It looks like lxml is not installed properly, and I've tried to download lxml-3.6.1-cp35-cp35m-win_amd64.whl from…
Li Yin
  • 1
0
votes
1 answer

pyspider: No module named 'wsgidav'

I am using Python 3.5.2 on Windows 10. I installed pyspider and ran pyspider all, but got some errors, as follows: What should I do?
zwl1619
  • 4,002
  • 14
  • 54
  • 110
0
votes
1 answer

I want to store the output of a Python pyspider script as CSV or JSON

Here is the code I wrote: import json from pyspider.libs.base_handler import * f = open("demo.txt","w") class Handler(BaseHandler): crawl_config = { } @every(minutes=0,seconds = 0) def on_start(self): self.crawl('Any URL',…
Piyush
  • 511
  • 4
  • 13
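For the CSV/JSON question above, a minimal sketch using only the standard library, assuming the crawl results have already been collected as a list of dicts. The helper names save_as_json and save_as_csv are hypothetical and are not part of pyspider.

```python
import csv
import json


def save_as_json(rows, path):
    """Write a list of result dicts to a JSON file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(rows, fh, ensure_ascii=False, indent=2)


def save_as_csv(rows, path):
    """Write a list of result dicts to a CSV file (hypothetical helper)."""
    # Collect column names in first-seen order, since rows may differ in keys.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    # newline="" lets the csv module control line endings itself.
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # missing keys are written as empty cells
```

Opening the file inside each helper, rather than once at module level as in the excerpt, keeps the file handle closed between writes.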
0
votes
0 answers

How do scrapy and pyspider send requests to a web server?

I am learning the crawler frameworks scrapy and pyspider, and I am curious about how they send requests to a web server. Do they use the Python module requests, or the built-in module urllib? Any advice is helpful. Thank you.
gaoxinge
  • 459
  • 2
  • 8
  • 21
-1
votes
1 answer

How to hide continuous hits (refreshes) to a website

I have developed Python (Requests) and Java code to scrape data from a website. It works by continuously refreshing the website for new data. But the website recently identified my scraper as an automated service and my account has been locked…
sam mathew
  • 35
  • 1
  • 7
-2
votes
1 answer

Why does the XPath output keep changing?

The problem I am facing is weird and a big waste of time. In theory, this should give the link: next_page = response.xpath('//ul[@class="pagination justify-content-center"]/li[6]/a/@href').get() but the output I got is…
user21195292
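One possible cause of the symptom in the question above is the positional step li[6], which points at a different item whenever the number of pagination entries changes. The excerpt is truncated, so whether that applies here is an assumption. The sketch below uses the standard library's ElementTree on made-up markup; the rel attribute, the URLs, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_pagination(current, with_prev):
    """Build a made-up pagination <ul> for illustration only."""
    items = []
    if with_prev:
        items.append(f"<li><a rel='prev' href='/page/{current - 1}'>Prev</a></li>")
    for n in range(1, 6):
        items.append(f"<li><a href='/page/{n}'>{n}</a></li>")
    items.append(f"<li><a rel='next' href='/page/{current + 1}'>Next</a></li>")
    return ET.fromstring("<ul>" + "".join(items) + "</ul>")


def next_href_by_position(ul, index):
    """Select the link by its position in the list (brittle)."""
    link = ul.find(f"li[{index}]/a")
    return None if link is None else link.get("href")


def next_href_by_rel(ul):
    """Select the link by an attribute that identifies it (more stable)."""
    link = ul.find("li/a[@rel='next']")
    return None if link is None else link.get("href")


first_page = build_pagination(current=1, with_prev=False)
second_page = build_pagination(current=2, with_prev=True)

# The same position lands on different links once a "Prev" item appears.
print(next_href_by_position(first_page, 6), next_href_by_position(second_page, 6))
# Selecting by attribute follows the "Next" link on both pages.
print(next_href_by_rel(first_page), next_href_by_rel(second_page))
```

If the real page has no such attribute, matching on the link text or a class name serves the same purpose.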
-6
votes
2 answers

Imported but unused in python

import numpy as np import matplotlib.pyplot as plt import pandas as pd. The console is showing some warnings. Can anyone help me with this?
nag6631
  • 3
  • 1
  • 1
  • 2