I use Scrapy to crawl data and save it to MongoDB on the cloud host mLab, and that part works.
My collection is named recently and it currently holds 5 documents.
I want to crawl the data again and refresh the recently collection, so I tried dropping the collection and then inserting.
Here is my pipelines.py:
from pymongo import MongoClient
from scrapy.conf import settings


class MongoDBPipeline(object):

    def __init__(self):
        connection = MongoClient(
            settings['MONGODB_SERVER'],
            settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DB']]
        # here is my collection name recently setting
        self.collection = db[settings['MONGODB_COLLECTION']]

    def process_item(self, item, spider):
        # try to drop my collection recently
        self.collection.drop()
        self.collection.insert(dict(item))
        return item
But when I run my spider, the recently collection ends up with 10 documents (it should be 5, which is what I want).
I looked for how to drop a collection, and everything I found just says db.[collection name].drop().
That does not work in my case when I call self.collection.drop() before self.collection.insert(dict(item)).
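
To make the goal concrete, the intended behaviour (empty the collection once per crawl, then insert every item) can be sketched as below. This is only an illustration under assumptions: FakeCollection is a hypothetical in-memory stand-in for the real collection so the snippet runs without a server, DropOncePipeline and run_crawl are made-up names, and open_spider is the Scrapy pipeline hook that is called once when a spider starts. It is not confirmed that this is what goes wrong in the pipeline above.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a pymongo collection.

    Only drop() and insert_one() are mimicked, so the sketch runs
    without a MongoDB server.
    """

    def __init__(self):
        self.docs = []

    def drop(self):
        # Remove every stored document.
        self.docs = []

    def insert_one(self, doc):
        self.docs.append(doc)


class DropOncePipeline:
    """Sketch of a pipeline that empties the collection once per crawl."""

    def __init__(self, collection):
        # Assumption: the collection is passed in instead of being
        # built from the Scrapy settings as in the original pipeline.
        self.collection = collection

    def open_spider(self, spider):
        # Called once when the spider opens, so old data is cleared
        # a single time rather than before every item.
        self.collection.drop()

    def process_item(self, item, spider):
        self.collection.insert_one(dict(item))
        return item


def run_crawl(collection, items):
    """Simulate one spider run and return the resulting document count."""
    pipeline = DropOncePipeline(collection)
    pipeline.open_spider(spider=None)
    for item in items:
        pipeline.process_item(item, spider=None)
    return len(collection.docs)


collection = FakeCollection()
items = [{"n": i} for i in range(5)]
first_count = run_crawl(collection, items)
second_count = run_crawl(collection, items)
print(first_count, second_count)
```

With this stand-in, two consecutive runs of 5 items each leave 5 documents in the collection, which is the count I expect to see on mLab after re-crawling.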
Can anyone suggest what is wrong with my code?
Any help would be appreciated. Thanks in advance.