I'm trying out Blaze with a MongoDB backend, using the GHTorrent dataset as the data source.
I've set up an SSH tunnel and can access the database through it:
ssh -L 27017:dutihr.st.ewi.tudelft.nl:27017 ghtorrent@dutihr.st.ewi.tudelft.nl
from datetime import datetime
from blaze import Data, compute
users = Data("mongodb://ghtorrentro:ghtorrentro@localhost/github::users")
# I can count the number of records in this collection
# following outputs: 5901048
users.count()
# looking at users.dshape, I see a key called 'created_at: datetime,'
Now I'm trying to figure out how to query based on date.
# I tried the following
docs = users[users['created_at'] > datetime(2015, 10, 6)]
# this gives me an empty list
compute(docs)
# printing the generated MongoDB query gives:
print(compute(docs, post_compute=False).query)
({'$match': {u'created_at': {'$gt': datetime.datetime(2015, 10, 6, 0, 0)}}},)
However, the equivalent pymongo query shows that there are a little over 5000 matching records. One difference is that the pymongo query needs the datetime in ISO format, i.e. as a string.
# db is the pymongo handle to the same github database
pymongo_count = db.users.find({
    "created_at": {
        "$gt": datetime(2015, 10, 6).isoformat(),
        "$lt": datetime(2015, 10, 7).isoformat(),
    }
}).count()
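To illustrate why a string-based range like the one above can work at all: ISO 8601 timestamps in the same format sort lexicographically in chronological order. Here is a minimal sketch in plain Python, without MongoDB. The sample values are made up, and it assumes the field holds ISO 8601 strings in the stored documents, which the output above does not confirm.

```python
from datetime import datetime

# Hypothetical sample values, assuming created_at is stored as ISO 8601 strings
stored = [
    "2015-10-05T23:59:59",
    "2015-10-06T08:30:00",
    "2015-10-06T17:45:12",
    "2015-10-07T00:00:01",
]

# Same bounds as in the pymongo query, converted to strings
lower = datetime(2015, 10, 6).isoformat()
upper = datetime(2015, 10, 7).isoformat()

# Plain string comparison, analogous to the $gt/$lt range on string values
matches = [s for s in stored if lower < s < upper]
print(matches)
```

The comparison here is string against string. A datetime object compared against these strings would be a different kind of comparison, which may be relevant to the difference between the two queries.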
The following blaze query returns the correct record:
compute(users[users["login"] == "sandhujasmine"])
I also tried passing the datetime as an isoformat() string, which results in this query:
({'$match': {u'created_at': {'$gt': Timestamp('2015-10-06 00:00:00')}}},)
but it still returns an empty result.
What am I doing wrong?