I'm working with Scrapy and want to generate a unique user agent for each request. I have the following:

class ContactSpider(Spider):
    name = "contact"

    def getAgent(self):
        f = open('useragentstrings.txt')
        agents = f.readlines()
        return random.choice(agents).strip()

    headers = {          
        'user-agent': getAgent(),
        'content-type': "application/x-www-form-urlencoded",
        'cache-control': "no-cache"
    }

    def parse(self, response):
        open_in_browser(response)

getAgent generates an agent from a list of the form:

"Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"

However, when I run this I get:

  File "..spiders\contact_spider.py", line 35, in <module>
    class ContactSpider(Spider):
  File "..spiders\contact_spider.py", line 54, in ContactSpider
    'user-agent': getAgent(),
TypeError: getAgent() takes exactly 1 argument (0 given)
user1592380

1 Answer

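getAgent() is defined as an instance method, so it expects the ContactSpider instance as its first argument (self). Inside the class body, however, it is called as a plain function with no arguments, which raises the TypeError. More importantly, this function does not need to be a member of your spider class at all: move it to a separate "helpers"/"utils"/"libs" module and import it: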

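from scrapy import Spider
from scrapy.utils.response import open_in_browser

from helpers import getAgent  # getAgent now lives in helpers.py

class ContactSpider(Spider):
    name = "contact"

    # class attribute: getAgent() runs once, when the class is defined
    headers = {
        'user-agent': getAgent(),
        'content-type': "application/x-www-form-urlencoded",
        'cache-control': "no-cache"
    }

    def parse(self, response):
        # open the downloaded response in a local browser for inspection
        open_in_browser(response)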
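
For reference, here is a minimal sketch of what the helpers module could look like. The module name `helpers.py`, the `load_agents` function and the `path` parameter are assumptions made for illustration; only `getAgent` and the file name `useragentstrings.txt` come from the question.

```python
# helpers.py (hypothetical module name, matching the import above)
import random


def load_agents(path="useragentstrings.txt"):
    """Return the non-empty lines of the user agent file, stripped of whitespace."""
    # The with-block closes the file, which the original open() call never did.
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def getAgent(path="useragentstrings.txt"):
    """Pick one user agent string at random from the file."""
    return random.choice(load_agents(path))
```

If you want a different agent on every request, call getAgent() at the point where each request is built instead of relying on a value that was computed once.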

See also: Difference between Class and Instance methods.
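
To make the error easier to see in isolation, here is a small sketch that has nothing to do with Scrapy. The class names `Demo` and `Fixed` and the method `pick` are made up for illustration. Note that the exact wording of the TypeError differs between Python versions (the traceback in the question comes from Python 2).

```python
class Demo:
    def pick(self):
        return "agent"

    # While the class body is running, `pick` is just a plain function.
    # Calling it with no arguments leaves `self` unfilled and raises TypeError.
    try:
        value = pick()
    except TypeError as exc:
        error = str(exc)  # keep the message as a class attribute for inspection


class Fixed:
    # A static method takes no `self`, so it does not need an instance.
    @staticmethod
    def pick():
        return "agent"


# Called on an instance, the original method works, because Python passes
# the instance as `self` automatically.
print(Demo().pick())
print(Demo.error)

# The static method can be called on the class or on an instance.
print(Fixed.pick())
print(Fixed().pick())
```

Since `pick` never uses `self`, either the static method or a plain module-level function is a reasonable fix.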


As an alternative approach, there is the scrapy-fake-useragent Scrapy middleware, which rotates user agents seamlessly and randomly. The user agent strings are supplied by the fake-useragent module.
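
As a rough sketch, enabling that middleware usually comes down to a few lines in the project's settings.py. The module paths and the priority value below follow the package's README as I remember it, for Scrapy 1.0 and later; treat them as assumptions and check them against the version you install.

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    # switch off Scrapy's built-in user agent middleware
    'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
    # let the random user agent middleware set the header instead
    'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
}
```

With this in place the spider does not need to set a 'user-agent' header itself.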

alecxe
  • Thank you, that's very helpful. I read through some references on class and instance methods in Python, but I'm still confused: when I define def getAgent(self), am I not passing an instance of ContactSpider to it? – user1592380 Sep 10 '16 at 01:30
  • @user61629 no problem. When you define `getAgent` as `def getAgent(self)`, you expect `getAgent` to be called on an instance of your `ContactSpider` class, but you are calling it as just `getAgent()`. In any case, if you don't reference or need the class instance inside the method, that is an indication that you can either make the method static or move the function out of the class. Sorry, I am terrible at explaining things clearly :) – alecxe Sep 10 '16 at 01:33
  • It's definitely true that getAgent() stands alone and does not require the class. In fact, I had it as a standalone function before moving it into the class. Thanks, I will keep reading about this! – user1592380 Sep 10 '16 at 01:48
  • Would you mind looking at http://stackoverflow.com/questions/39430264/how-to-add-a-third-party-scrapy-middleware ? – user1592380 Sep 12 '16 at 17:29