
I've been streaming data from Twitter into a MongoDB database. However, I found that I had formatted the search incorrectly, so I got data from all over the place instead of the one city I wanted. (I determine location by checking whether the city name comes up in 'location' or 'name' under 'user' in the JSON.)

I want to copy just the correct documents to a new collection, but I've found it nearly impossible to do in pymongo! I'm using pymongo instead of the shell because I'm using regular expressions to search for the city names (there are a lot of synonyms for it).

regex=re.compile(<\really long regular expression of city names>)

I've been able to use find() correctly with the regular expressions; it returns just what I'm looking for:

db.coll.find({'$or':[{'user.location':{'$in':[regex]}},{'user.name':{'$in':[regex]}}]})
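
For reference, here is a minimal sketch of how a filter like this can be built in Python. The pattern is a short stand-in for the real regular expression, and `build_city_filter` is a hypothetical helper name, not part of pymongo. It relies on the fact that pymongo accepts compiled `re` patterns as query values, so a field can be matched against the pattern directly; wrapping the pattern in `$in` (with the dollar sign) as above should behave the same way.

```python
import re

# Stand-in pattern (an assumption); the real one lists every synonym for the city.
CITY_REGEX = re.compile(r"\b(new york|nyc|big apple)\b", re.IGNORECASE)


def build_city_filter(regex):
    """Return a filter for tweets whose user.location or user.name matches regex."""
    return {
        "$or": [
            {"user.location": regex},
            {"user.name": regex},
        ]
    }


city_filter = build_city_filter(CITY_REGEX)
# With a live connection the filter would be passed to find():
#     cursor = db.coll.find(city_filter)
```

Building the filter once and reusing it keeps the `find()` call and any later copy step in sync.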

I just need to copy what it returns into a new collection, but it's proving difficult.

I tried this method, using forEach() to copy the documents with bson wrapping, which I found here, but it still won't work.

db.coll.find({'$or':[{'user.location':{'$in':[regex]}},{'user.name':{'$in':[regex]}}]})\
.forEach(bson.Code('''
function(doc) {
   db.subset.insert(doc);
}'''))

Specifically, the error I get when I try this is

AttributeError: 'Cursor' object has no attribute 'forEach'

I have no idea what is wrong or how I can go about fixing this. Anyone able to tell me what I can do to fix this, or a better way to copy documents to a new collection?

Community

1 Answer


A pymongo cursor can already be iterated over directly, so you don't need forEach (that is a mongo shell method, which is why the Python Cursor object doesn't have it). Try:

for tweet in db.coll.find({'$or':[{'user.location':{'$in':[regex]}},{'user.name':{'$in':[regex]}}]}):
    db.subset.insert(tweet)
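
The same loop can be sketched as a reusable function that inserts in batches. This is an illustration under stated assumptions, not the only way to do it: `copy_matching` and `batch_size` are hypothetical names, and it uses `insert_many`, since `Collection.insert` was deprecated in pymongo 3.0 and removed in 4.0 in favour of `insert_one` and `insert_many`.

```python
def copy_matching(source, target, query, batch_size=500):
    """Copy every document in source that matches query into target.

    source and target are expected to behave like pymongo collections:
    source.find(query) returns an iterable of documents and
    target.insert_many(docs) stores a list of them.
    Returns the number of documents copied.
    """
    copied = 0
    batch = []
    for doc in source.find(query):  # a pymongo cursor is a plain Python iterable
        batch.append(doc)
        if len(batch) >= batch_size:
            target.insert_many(batch)
            copied += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        target.insert_many(batch)
        copied += len(batch)
    return copied
```

With a live connection the call would look like `copy_matching(db.coll, db.subset, query)`, where `query` is the same filter passed to `find()` above. The copied documents keep their original `_id`, so running the copy twice into the same collection will most likely fail with duplicate key errors.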
ThrowsException
  • I tried that, it seems like it would work but it doesn't seem to be able to iterate through the find() query like that, I get this error: TypeError: string indices must be integers, not _sre.SRE_Pattern – Kate Bradley Aug 03 '15 at 17:58
  • Never mind, it was a grammatical error in the query! Thanks :) – Kate Bradley Aug 03 '15 at 18:26
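
One way to get exactly this kind of TypeError is to drop the colon between a key and its value, writing `{'in' [regex]}` instead of `{'$in': [regex]}`. Python then reads `'in' [regex]` as indexing the string `'in'` with the pattern. The thread does not say which mistake was actually corrected, so treat the sketch below as a likely cause only; the pattern and the function names are illustrative.

```python
import re

regex = re.compile(r"example city", re.IGNORECASE)  # stand-in pattern (assumption)


def missing_colon():
    # Without the colon, 'in' [regex] means "index the string 'in' with regex",
    # which raises TypeError before any query reaches MongoDB.
    return {'user.name': {'in' [regex]}}


def with_colon():
    # With the colon (and the $ prefix) this is an ordinary nested dict.
    return {'user.name': {'$in': [regex]}}


try:
    missing_colon()
    error = None
except TypeError as exc:
    error = exc  # strings can only be indexed with integers or slices

query = with_colon()
```

Because the error is raised while Python builds the dictionary, it appears no matter which method the result is passed to.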