
I'm trying to drop duplicate items by counting the documents that have the same url, using Motor (an async driver for MongoDB) for that purpose.

Here is my process_item function, written according to the docs:

from asyncio import futures  # provides isfuture()
from scrapy.exceptions import DropItem

async def process_item(self, item, spider):
    response = self.mangas.count_documents({"url": item.url})
    print(response, 'is future?', futures.isfuture(response))  # explicit type checking
    count = await response  # the exception is raised here
    if count:
        raise DropItem(f"Duplicate found of {item}")
    await self.mangas.insert_one(dict(item))
    return item

The traceback from process_item:

is future? True
2022-08-23 22:39:10 [scrapy.core.scraper] ERROR: Error processing *SOME PARSED DATA*
Traceback (most recent call last):
  File "C:\Users\Canald\AppData\Local\pypoetry\Cache\virtualenvs\api-abuLimWH-py3.10\lib\site-packages\twisted\internet\defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "C:\Users\Canald\Files\VS\HChan\hentai_scrap\pipelines.py", line 48, in process_item
    count = await response
RuntimeError: await wasn't used with future
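The message in the last line is raised by asyncio's own `Future.__await__` machinery: awaiting a pending future yields the future to whatever is stepping the coroutine, and if that driver resumes the coroutine while the future is still pending, asyncio raises `RuntimeError: await wasn't used with future`. An asyncio Task knows to wait for the future first. In the traceback above, the coroutine is being stepped by Twisted's `_inlineCallbacks` (`gen.send(result)`) instead, which suggests, though does not prove, that nothing asyncio-aware is driving it. The minimal sketch below reproduces the same message in plain Python by stepping a coroutine by hand; `wait_for` and `drive_without_asyncio` are hypothetical names used only for this demo.

```python
import asyncio


async def wait_for(fut):
    # Awaiting a pending asyncio.Future yields the future itself to whatever
    # is driving this coroutine and expects to be resumed only once it is done.
    return await fut


def drive_without_asyncio():
    """Step a coroutine by hand, the way a non-asyncio scheduler might."""
    loop = asyncio.new_event_loop()
    try:
        fut = loop.create_future()  # pending; nothing will ever resolve it
        coro = wait_for(fut)
        yielded = coro.send(None)  # first step: the coroutine yields the future
        assert yielded is fut
        try:
            # A driver that doesn't understand asyncio futures just resumes
            # the coroutine immediately, while the future is still pending.
            coro.send(None)
        except RuntimeError as exc:
            return str(exc)
        finally:
            coro.close()
    finally:
        loop.close()


print(drive_without_asyncio())  # await wasn't used with future
```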

I also tried to use maybe_deferred_to_future from scrapy.utils.defer, according to this, but it raises the same exception.
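For comparison, the same count-then-insert logic works when an asyncio Task is what drives it. This is a minimal sketch assuming plain dict items (hence `item["url"]` rather than `item.url`). `InMemoryCollection` and `DuplicateItem` are hypothetical stand-ins, not Motor's or Scrapy's classes; `count_documents` deliberately returns a pending `asyncio.Future`, the way the call in the question does, and resolves it on a later loop iteration.

```python
import asyncio


class DuplicateItem(Exception):
    """Hypothetical stand-in for scrapy.exceptions.DropItem."""


class InMemoryCollection:
    """Hypothetical stand-in for a Motor collection (not Motor's real class)."""

    def __init__(self):
        self._docs = []

    def count_documents(self, query):
        # Not an `async def`: returns a pending asyncio.Future, like the
        # object the question's print() reported.
        loop = asyncio.get_running_loop()
        fut = loop.create_future()
        count = sum(
            all(doc.get(k) == v for k, v in query.items()) for doc in self._docs
        )
        # Resolve on a later loop iteration so the awaiter really has to wait.
        loop.call_soon(fut.set_result, count)
        return fut

    async def insert_one(self, doc):
        self._docs.append(doc)


async def process_item(collection, item):
    # Same check-then-insert logic as the pipeline in the question.
    count = await collection.count_documents({"url": item["url"]})
    if count:
        raise DuplicateItem(f"Duplicate found of {item}")
    await collection.insert_one(dict(item))
    return item


async def main():
    mangas = InMemoryCollection()
    results = []
    for item in [{"url": "a"}, {"url": "b"}, {"url": "a"}]:
        try:
            await process_item(mangas, item)
            results.append("stored")
        except DuplicateItem:
            results.append("dropped")
    return results


print(asyncio.run(main()))  # ['stored', 'stored', 'dropped']
```

Count-then-insert is not atomic: if two items with the same URL are processed concurrently, both can see a count of 0. A unique index on `url` in MongoDB enforces the rule at the database level instead.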

Canald
  • @Alexander It's there for explicit type checking. `self.mangas.count_documents({"url": item.url})` returns a Future object, which is stored in the `response` variable. In any case, `count = await self.mangas.count_documents({"url": item.url})` raises the same exception. – Canald Aug 23 '22 at 18:23
  • Is it an `asyncio.Future` or a `concurrent.futures.Future`? They aren't the same thing. – dirn Aug 23 '22 at 19:05
  • @dirn `type()` says it is `` – Canald Aug 23 '22 at 19:26
