
I use Scrapy to crawl data and save it to MongoDB on the cloud host mLab, and that part works.

My collection is named `recently` and currently holds 5 documents.

I want to crawl the data again and refresh the `recently` collection, so I try to drop the collection and then insert the new items.

Here is my pipelines.py:

from pymongo import MongoClient
from scrapy.conf import settings

class MongoDBPipeline(object):

    def __init__(self):
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DB']]
        # the collection name ('recently') comes from the MONGODB_COLLECTION setting
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        # try to drop the 'recently' collection before inserting
        self.collection.drop()
        self.collection.insert(dict(item))
        return item

But when I run my spider, the `recently` collection has 10 documents (it should be 5, which is what I want).

I looked up how to drop a collection, and everything I found just says `db.[collection name].drop()`.

But it doesn't work in my case when I call `self.collection.drop()` before `self.collection.insert(dict(item))`.

Can anyone tell me what is wrong with my code?

Any help would be appreciated. Thanks in advance.

Arpit Solanki
Morton

1 Answer


You need to use `drop()`. Suppose `foo` is a collection:

db.foo.drop()

Or you can use `drop_collection`:

db.drop_collection(collection_name)
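Both calls exist in PyMongo: `Collection.drop()` and `Database.drop_collection(name_or_collection)`. The helper below is a hedged sketch, not part of the original answer: `replace_collection` is a hypothetical name, `db` is assumed to be a PyMongo `Database`, and it inserts with `insert_many` because `Collection.insert` (used in the question) was deprecated in PyMongo 3 and removed in PyMongo 4. The function does not import PyMongo itself, so any object offering the same methods can be passed in.

```python
def replace_collection(db, name, docs):
    """Drop collection `name` from a PyMongo Database, then insert `docs`.

    Returns the number of documents in the freshly created collection.
    """
    # Same effect as db[name].drop(): removes the collection and all its documents.
    db.drop_collection(name)
    collection = db[name]
    if docs:
        # insert_many rejects an empty list, so only call it when there is data.
        collection.insert_many(docs)
    return collection.count_documents({})


# Hypothetical usage against a real server (host, port and database name are placeholders):
#   from pymongo import MongoClient
#   db = MongoClient("localhost", 27017)["scrapy_db"]
#   replace_collection(db, "recently", [{"title": "a"}, {"title": "b"}])
```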
Arpit Solanki
  • Thanks for your response, but `self.collection` is `db.recently` in my case, so I can insert data successfully. – Morton Feb 22 '18 at 09:41
  • `insert` works but `drop` doesn't; every time I run my spider the count goes up by 5. – Morton Feb 22 '18 at 09:48
  • Did you try `self.collection.drop()` or `db.drop_collection(settings['MONGODB_COLLECTION'])`? @徐博俊 – Arpit Solanki Feb 22 '18 at 09:50
  • I just tried both `db.drop_collection(settings['MONGODB_COLLECTION'])` and `self.collection.drop()` right after `self.collection = db[settings['MONGODB_COLLECTION']]`. The count still goes up by 5 :( – Morton Feb 22 '18 at 09:57
  • In your code you drop the collection and then insert some items, so I think a count of 5 is the correct behaviour. Comment out the `self.collection.insert` line and run it again; I am sure the drop will work. @徐博俊 – Arpit Solanki Feb 22 '18 at 09:59
  • I commented out `self.collection.insert`, and strangely the count still goes up by 5. What is wrong with my code? – Morton Feb 22 '18 at 10:07
  • The collection was being dropped successfully, and you were then inserting 5 rows after the drop, so you see 5 rows. @徐博俊 – Arpit Solanki Feb 22 '18 at 10:08
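One more point about the pipeline in the question: Scrapy calls `process_item` once for every scraped item, so a `drop()` placed there runs before each insert, not once per crawl. A common pattern is to drop in `open_spider`, which Scrapy calls once when the spider opens. The sketch below assumes that pattern; `InMemoryCollection` and `simulate_crawl` are hypothetical helpers that stand in for a PyMongo collection and for Scrapy's call order, so the example runs without a MongoDB server or a real crawl.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a PyMongo Collection; models only the calls used here."""

    def __init__(self):
        self.docs = []

    def drop(self):
        self.docs.clear()

    def insert_one(self, doc):
        self.docs.append(dict(doc))

    def count_documents(self, query):
        # The stand-in ignores the query and counts everything, like count_documents({}).
        return len(self.docs)


class MongoDBPipeline:
    """Drops the collection once per crawl, then inserts every scraped item."""

    def __init__(self, collection):
        # In a real project this would be a PyMongo Collection, e.g.
        # MongoClient(server, port)[db_name][collection_name].
        self.collection = collection

    def open_spider(self, spider):
        # Scrapy calls this once when the spider opens: clear the previous crawl here.
        self.collection.drop()

    def process_item(self, item, spider):
        # Scrapy calls this once per item: insert only, never drop.
        self.collection.insert_one(dict(item))
        return item


def simulate_crawl(pipeline, items):
    """Imitate the order in which Scrapy calls the pipeline hooks (no real crawl)."""
    pipeline.open_spider(spider=None)
    for item in items:
        pipeline.process_item(item, spider=None)


collection = InMemoryCollection()
pipeline = MongoDBPipeline(collection)
items = [{"title": f"post {i}"} for i in range(5)]

simulate_crawl(pipeline, items)  # first crawl
simulate_crawl(pipeline, items)  # second crawl replaces the data instead of adding to it
print(collection.count_documents({}))  # 5
```

In an actual Scrapy project the class would be enabled in `ITEM_PIPELINES`, and because Scrapy builds pipelines itself, the MongoClient would typically be created from `crawler.settings` in a `from_crawler` classmethod and closed in `close_spider`; this wiring is only a sketch.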