I need to get an array with the values from the field 'colname'. I can't return a Cursor, just the array of values.

Is there a way to query this array without having to loop the Cursor? I feel this is a waste of processing resources.

Right now I'm doing this:

from pymongo import MongoClient

client = MongoClient('mongodb://localhost:27017/')
headers = client['headers']

entomo = headers.entomo

entomo_data = entomo.find()
entomo_array = []
for data in entomo_data:
    entomo_array.append(data['colname'])

Then I return the entomo_array.

styvane
AFP_555

2 Answers

You can do this with the .aggregate() method by using $group to put all documents under a single None key and $push to collect the field values into one array:

cursor = entomo.aggregate([
    {'$group': {
        '_id': None, 
        'data': {'$push': '$colname'}
    }}
])

From there, you simply consume the cursor using next; the single result document holds the array under 'data'.

entomo_array = next(cursor)['data']
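
One caveat worth guarding against: on an empty collection the $group stage emits no document, so a bare next(cursor) raises StopIteration. A minimal sketch, assuming a hypothetical helper named grouped_values (it is not part of PyMongo) that accepts any iterator of dicts, such as the cursor returned by aggregate():

```python
def grouped_values(cursor, key='data'):
    """Return the array stored under `key` in the first result document.

    `cursor` can be any iterator of dicts, e.g. the cursor returned by
    entomo.aggregate([...]). The helper name and the `key` default are
    illustrative assumptions, not PyMongo API.
    """
    # next() with a default avoids StopIteration when nothing was grouped
    first = next(cursor, None)
    return [] if first is None else first[key]
```

With the pipeline above, this would be called as entomo_array = grouped_values(cursor).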

But if the values of 'colname' are unique within the collection, or you do not need duplicates preserved, you can simply use the distinct method, which returns each value only once.

entomo_array = entomo.distinct('colname')
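
To see why the uniqueness condition matters, here is a plain-Python illustration of what the two approaches keep. No database is involved and the sample documents are made up: $push keeps every occurrence, while distinct collapses repeats.

```python
# Made-up sample documents standing in for the collection's contents
docs = [{'colname': 'a'}, {'colname': 'b'}, {'colname': 'a'}]

# What $push collects: one entry per document, duplicates included
pushed = [doc['colname'] for doc in docs]

# What distinct returns: each value once (sorted here only to make the
# comparison stable; this sketch makes no claim about MongoDB's ordering)
unique = sorted(set(pushed))

print(pushed)  # ['a', 'b', 'a']
print(unique)  # ['a', 'b']
```

If the caller needs one entry per document, the aggregate version is the one that matches the original loop.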
styvane

If the 'colname' field has distinct values, or if you do not care about duplicate values, you can use the distinct function. For your example:

entomo_array = entomo.find().distinct('colname')
Moi Syme
  • I'm not sure, but I think this would consume more resources than just looping over the array, wouldn't it? I guess distinct's implementation has O(n) complexity, since it must use a hash map, just like looping. – AFP_555 Mar 31 '17 at 01:14
  • About the execution time: I think that when you run distinct('colname'), the command runs on the MongoDB server and the result comes back in a single I/O, whereas a loop over a cursor will generally, I believe, take more than one I/O. Finally, you can use the distinct command at the collection level if you don't have any filter. http://api.mongodb.com/python/current/api/pymongo/collection.html#pymongo.collection.Collection.distinct – Moi Syme Mar 31 '17 at 09:00
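
On the I/O point raised in the comments: if the loop is kept, a projection at least limits what each batch carries to the one field. A sketch, assuming a hypothetical helper named column_values; find(filter, projection) is PyMongo's Collection.find, and the guard for documents that lack the field is an added assumption rather than something the question requires.

```python
def column_values(collection, field):
    """Collect `field` from every document, fetching only that field.

    `collection` is expected to behave like a PyMongo Collection;
    the helper name is illustrative, not PyMongo API.
    """
    # Projection: include `field`, exclude `_id`
    projection = {field: 1, '_id': 0}
    cursor = collection.find({}, projection)
    # Skip documents that lack the field (an assumption of this sketch)
    return [doc[field] for doc in cursor if field in doc]
```

For the question's collection this would be called as entomo_array = column_values(entomo, 'colname').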