I'm using Blaze (0.6.3) with Anaconda 2.1.0 (on Python 2.7.8). I'm trying to filter a Table's rows by date.

The mock TSV file is the following:

name    amount  date
foo 100 2001-05-11 08:54:48.063856
bar 1000    0001-01-01 00:00:00.0
baz 10000   1970-01-02 00:00:00.0

The Python code is:

from blaze import *
from datetime import datetime
data = Table(CSV('mock.tsv'))

data[data.name > 'bar']
data[data.amount > 1000]
data[data.date > datetime(1970,1,1)]

The first two filters are ok, but the third one throws a SyntaxError.

It all seems to boil down to the following:

lambda (name, amount, date): date > (1970-01-01 00:00:00)

which is syntactically invalid. Somehow, somewhere, datetime(1970,1,1) was translated to datetime(1970-01-01 00:00:00), then the datetime was forgotten. Blaze itself recognizes the date column with ?datetime type, which is what I want, but then it fails in the comparison.

Am I using it the wrong way?

pazqo

2 Answers

This was an older bug that has since been fixed. Here it is working with the development version. I believe the latest stable release on Anaconda (0.6.5) should work as well.

In [1]: !cat tmp/myfile.csv
name, amount, date
foo, 100, 2001-05-11 08:54:48.063856
bar, 1000, 0001-01-01 00:00:00.0
baz, 10000, 1970-01-02 00:00:00.0

In [2]: from blaze import *

In [3]: data = Table('tmp/myfile.csv')

In [4]: from datetime import datetime

In [5]: data[data.date > datetime(1970,1,1)]
Out[5]: 
  name  amount                       date
0  foo     100 2001-05-11 08:54:48.063856
1  baz   10000        1970-01-02 00:00:00

The following should solve your problem:

conda update blaze

Also, Blaze is happy to coerce your strings to the appropriate type, in case you would rather not create the datetime yourself:

In [6]: data[data.date > '1970-01-01']
Out[6]: 
  name  amount                       date
0  foo     100 2001-05-11 08:54:48.063856
1  baz   10000        1970-01-02 00:00:00
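To make the idea of coercion concrete, here is a minimal standard-library sketch of comparing a date column against either a datetime or a string. It is not how Blaze implements it; the names FORMATS, coerce and later_than are hypothetical, and the list of formats is an assumption that covers only the mock data above.

```python
from datetime import datetime

# Formats tried in order; an assumption that only covers the mock data.
FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d")


def coerce(value):
    """Return a datetime, parsing the value first if it is a string."""
    if isinstance(value, datetime):
        return value
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            pass
    raise ValueError("unrecognised date: %r" % (value,))


rows = [
    ("foo", 100, "2001-05-11 08:54:48.063856"),
    ("bar", 1000, "0001-01-01 00:00:00.0"),
    ("baz", 10000, "1970-01-02 00:00:00.0"),
]


def later_than(rows, threshold):
    """Keep the rows whose date (third field) is after the threshold."""
    limit = coerce(threshold)
    return [row for row in rows if coerce(row[2]) > limit]


# The threshold may be given as a string or as a datetime.
selected = later_than(rows, "1970-01-01")
names = [row[0] for row in selected]
print(names)
```

Coercing both sides to the same type before comparing means the caller can pass either form of threshold and get the same rows back.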
MRocklin
  • Thank you @MRocklin! It is still not working in 0.6.5 (I get the very same error), but I'll look forward to the next version. Thank you again for the great work! – pazqo Nov 11 '14 at 09:44

You can use pandas.to_datetime to convert both sides to datetimes before comparing them. Something like this works:

import pandas as pd

data = pd.read_clipboard()

data
  name  amount                        date
0  foo     100  2001-05-11 08:54:48.063856
1  bar    1000                  1968-01-01
2  baz   10000       1970-01-02 00:00:00.0

The problem lies in the invalid year value of 0001-01-01 00:00:00.0, which pandas translates to 2001-01-01:

pd.to_datetime(data['date'][1])
Timestamp('2001-01-01 00:00:00')

After replacing the invalid date value with a valid one,

# for example 1968-01-01; .loc avoids chained assignment
data.loc[1, 'date'] = '1968-01-01'

the filter returns your desired results:

data[pd.to_datetime(data.date) > pd.to_datetime('1970-01-01')]
  name  amount                        date
0  foo     100  2001-05-11 08:54:48.063856
2  baz   10000       1970-01-02 00:00:00.0
Anzel
  • Thank you for your answer! Actually, I am working with a large dataset and I wanted to avoid pandas dataframes. Blaze is lazy, which is a great feature. I could also add an auxiliary column with the year and filter on that column, but I would still like to be able to manipulate dates as they come, with no workaround. By the way, in pandas I can declare my own null types (NaT), so I can filter out those dates on the fly, but I still want that operation to be lazy and "blaze-like". Also, 0001-01-01 is parsed correctly by Blaze's Table (year is 1); I guess there should be an idiomatic way. – pazqo Nov 10 '14 at 15:25
  • @pazqo, I haven't worked with **Blaze** before, so I cannot comment further. But from what you said, it should work as long as `date.date` has the same type as the value you compare it with, i.e. a `datetime` object. A wild guess would be that your current `date.date` has a different but similar type, such as a date object, a string, or a special type in **Blaze**. If you convert one of them to the same type as the other, I'm sure it will work out fine – Anzel Nov 10 '14 at 15:35
  • I tried that. It looks like they are all the same type anyway: `pd.to_datetime('19700101000000') == datetime.datetime(1970,1,1)` is True and so is `ds.coretypes.datetime.datetime(1970,1,1) == datetime.datetime(1970,1,1)`, where ds is DataShape (used by Blaze). Blaze also works pretty well with pandas' dataframes, but it still does not optimize the column operations as pandas does. By the way, thank you very much for your kind answers :) – pazqo Nov 10 '14 at 15:58
  • @pazqo, not a problem. Gutted couldn't help further as I can't assume how **Blaze** works lol. Good luck anyhow ;) – Anzel Nov 10 '14 at 16:02