
I am trying to get `start_time` from the scrapy stats.

The scrapy docs describe something like this:

https://docs.scrapy.org/en/latest/topics/stats.html

Okay, so, as the docs do, I catch the stats in `__init__`, but I get an error as if I am not passing the `stats` argument. I don't want to have to pass it as an argument myself. Here is my code.

pipelines.py

class MongoDBPipeline(object):

    def __init__(self, stats):
        self.timeStarted = stats.get_value('start_time')

    def process_item(self, item, spider):
        for data in item:
            if not data:
                raise DropItem("Missing {0}!".format(data))
        item['createdAt'] = self.timeStarted
        self.collection.insert(dict(item))
        logging.info("Video cargado.")
        return item

The error I get is this exactly:

TypeError: __init__() missing 1 required positional argument: 'stats'
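
For context, this is the generic error Python raises when a class whose `__init__` requires an argument is instantiated without one; a minimal sketch outside scrapy (the `Pipeline` name here is just for illustration):

```python
class Pipeline:
    def __init__(self, stats):
        self.stats = stats

# scrapy instantiates the pipeline for us, and by default it passes no
# arguments -- the same as calling the class bare:
try:
    Pipeline()
except TypeError as e:
    print(e)  # ... missing 1 required positional argument: 'stats'
```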

I don't know what to do. Thanks!

  • Please include a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). And can you please post the complete error you are receiving? – LeoE Jan 17 '20 at 14:56
  • Can you show a minimal example of the complete class definition and how you are using it? Are you making an instance first? – wwii Jan 17 '20 at 14:58
  • Hi, I added the error I got. The pipeline is working perfectly but I cannot get the stats. @LeoE These two defs are the only ones I have in this pipeline. What more should I add? – Jose E. Saura Jan 17 '20 at 15:02
  • Show the part of the code where you create the instance of the `class MongoDBPipeline` – abhilb Jan 17 '20 at 15:09
  • It is in pipelines.py, is that what you need to know? @abhilb – Jose E. Saura Jan 17 '20 at 15:17
  • Again: Please include a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). We cannot confirm the error or try anything out. Where does `stats` come from? What exactly is in the error message? Is there nothing else but this one single line? I doubt it. Will downvote until a minimal reproducible example is included. – LeoE Jan 17 '20 at 15:18
  • Yes, because now that you added an extra parameter to the `__init__` function, you need to pass an argument when you instantiate the class. – abhilb Jan 17 '20 at 15:18
  • The error you get isn't really helpful without seeing the code that *produced* the error as well. – chepner Jan 17 '20 at 15:19
  • Okay, in the scrapy documentation they just add `stats` in `__init__`, that's why I am asking. This is everything I know about the code, for real. @LeoE The stats doesn't come out of nowhere; the scrapy documentation uses it like that. – Jose E. Saura Jan 17 '20 at 15:33
  • Something is using your class somewhere in your code. That is where it is being used incorrectly. When posting a question about code that produces an Exception, always include the **complete** Traceback - copy and paste it, then format it as code (select it and type `ctrl-k`). The Traceback will show you (and us) where that error is occurring, making it easier to *trace* the problem. Maybe [catch the error](https://docs.python.org/3/tutorial/errors.html#handling-exceptions) and inspect/print relevant data in the except suite. – wwii Jan 17 '20 at 15:43
  • @JEdward Somewhere in some other file maybe you have something like `xyz = MongoDBPipeline()` **this** is the code we need. – LeoE Jan 17 '20 at 16:25
  • Hi, sorry @LeoE, I don't write that code myself because scrapy does it on its own. It is a library that works this way: every item you scrape goes through the pipelines. Thanks for your time – Jose E. Saura Jan 20 '20 at 11:39

1 Answer


You forgot

@classmethod
def from_crawler(cls, crawler):
    return cls(crawler.stats)

which runs `__init__` with the argument `crawler.stats`.

See the examples in your link, *Common Stats Collector uses* and *Write items to MongoDB*.
Both have the class method `from_crawler()`.

scrapy creates the pipeline using

MongoDBPipeline.from_crawler(crawler)

and the default `from_crawler()` runs `__init__(self)` without arguments - so your new `__init__(self, stats)` can't get `stats` and it raises that error. But if you add your own `from_crawler()` which runs `__init__(self, stats)` with `crawler.stats`, then `__init__(self, stats)` will get it.
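
Applied to the pipeline from your question, the fix looks like this (a sketch; the MongoDB connection setup from your real pipeline is assumed to exist elsewhere):

```python
class MongoDBPipeline(object):

    @classmethod
    def from_crawler(cls, crawler):
        # scrapy calls this with the crawler object, so we can hand
        # crawler.stats on to __init__ ourselves
        return cls(crawler.stats)

    def __init__(self, stats):
        # stats is now the real stats collector, not a missing argument
        self.timeStarted = stats.get_value('start_time')
```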


EDIT: a minimal example which shows it.

It works correctly, but if you remove `from_crawler()` then it gives your error.

You can copy all the code into one file and run it as `python script.py`, without creating a project with `scrapy startproject` / `scrapy genspider` and running `scrapy crawl`.

import scrapy

class MySpider(scrapy.Spider):

    name = 'myspider'

    start_urls = ['http://books.toscrape.com/'] #'http://quotes.toscrape.com']

    def parse(self, response):
        print('url:', response.url)


class MyPipeline(object):

    def __init__(self, stats):
        print('__init__ stats:', stats)
        self.stats = stats

    @classmethod
    def from_crawler(cls, crawler):
        print('from_crawler stats:', crawler.stats)
        return cls(crawler.stats)

# ---

from scrapy.crawler import CrawlerProcess

c = CrawlerProcess({
    'ITEM_PIPELINES': {'__main__.MyPipeline': 1},  # use the Pipeline defined in this file (needs `__main__`)
})
c.crawl(MySpider)
c.start()
furas
  • Thanks @furas, I will try this evening. Let me ask you something. Was my question THAT poorly written or explained? I mean, they were asking where I used the method, but I didn't call it explicitly because it is scrapy... I don't know, man, it feels bad. – Jose E. Saura Jan 20 '20 at 11:41
  • Not all people may know `scrapy`, but sometimes we can resolve a problem even if we didn't use the module before. The full error message (starting at the word `Traceback`) could give more information about where the error is. Example code would help us run it and test different ideas. To resolve the problem I had to first create example code which I could run, see the full error, and test ideas. – furas Jan 20 '20 at 12:11
  • This worked properly. I didn't use `from_crawler` because I thought I didn't need it, but of course that was because I don't fully understand the flow in `scrapy`. That was really didactic, thanks again @furas. – Jose E. Saura Jan 20 '20 at 14:27