
I'm trying out Blaze with a MongoDB backend, using GHTorrent (the GitHub torrent) as the data source.

I've set up an SSH tunnel and can access the database through it:

ssh -L 27017:dutihr.st.ewi.tudelft.nl:27017 ghtorrent@dutihr.st.ewi.tudelft.nl

from datetime import datetime
from blaze import Data, compute

users = Data("mongodb://ghtorrentro:ghtorrentro@localhost/github::users")

# I can count the number of records in this collection
# following outputs: 5901048
users.count()

# looking at users.dshape, I see a key called 'created_at: datetime,'

Now I'm trying to figure out how to query based on date.

# I tried the following
docs = users[users['created_at'] > datetime(2015, 10, 6)]

# it gives me an empty list
compute(docs)

# printing the blaze query gives:
print(compute(docs, post_compute=False).query)
({'$match': {u'created_at': {'$gt': datetime.datetime(2015, 10, 6, 0, 0)}}},)

However, the equivalent pymongo query shows that there are a little over 5000 matching records. One difference is that the pymongo query only works when I pass the datetime in ISO format, i.e. as a string:

pymongo_count = db.users.find({"created_at": {"$gt": datetime(2015, 10, 6).isoformat(), "$lt": datetime(2015, 10, 7).isoformat()}}).count()
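The fact that the string-based pymongo query matches suggests, though it is not confirmed here, that `created_at` is stored as ISO 8601 strings rather than BSON dates. MongoDB comparison operators generally only match values of the same BSON type as the query value, so a `datetime` bound would not match string fields. The following is a minimal sketch under that assumption, using plain Python and no database: it shows that ISO 8601 strings of the same format sort chronologically, which is why the string range query behaves like a date range. The helper `iso_day_bounds` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta


def iso_day_bounds(day):
    """Return (start, end) ISO 8601 strings covering one calendar day.

    Hypothetical helper, not part of Blaze or pymongo.
    """
    start = datetime(day.year, day.month, day.day)
    end = start + timedelta(days=1)
    return start.isoformat(), end.isoformat()


# Hypothetical sample values, in the string form assumed above.
sample = [
    "2015-10-05T23:59:59",
    "2015-10-06T08:30:00",
    "2015-10-07T00:00:01",
]

lo, hi = iso_day_bounds(datetime(2015, 10, 6))

# Lexicographic comparison of same-format ISO strings is chronological,
# mirroring {"$gt": lo, "$lt": hi} applied to string-typed fields.
matches = [s for s in sample if lo < s < hi]
print(matches)
```

If the assumption holds, the bounds have to reach MongoDB as strings; in the Blaze queries shown here they end up as `datetime` or `Timestamp` objects instead.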

The following blaze query returns the correct record:

compute(users[users["login"] == "sandhujasmine"])

I also tried passing isoformat() for the datetime, which resulted in this query: ({'$match': {u'created_at': {'$gt': Timestamp('2015-10-06 00:00:00')}}},) but it returned the same empty result.

What am I doing wrong?

jasmine
  • I'm curious, what happens if you pass a string instead of a datetime? – MRocklin Oct 08 '15 at 18:24
  • I tried: users[users['created_at'] > datetime(2015, 10, 6).isoformat()] and got empty result, same as when I tried datetime object. The resulting query was ({'$match': {u'created_at': {'$gt': Timestamp('2015-10-06 00:00:00')}}},) – jasmine Oct 14 '15 at 07:27
