
I'm going through this tutorial about Blaze, but using the iris dataset in a local PostgreSQL database.

I don't seem to get the same output as shown when using db.iris.Species.distinct() (see In [16] of the IPython notebook).

My connection string is postgresql://postgres:postgres@localhost:5432/blaze_test

and my simple Python code is:

import blaze as bz
db = bz.Data('postgresql://postgres:postgres@localhost:5432/blaze_test')
mySpecies = db.iris_data.species.distinct()
print mySpecies

All I get in the console (using the Spyder IDE) is distinct(_55.iris_data.species)

How can I actually print the distinct species in the table?

NB: I know I am using a lowercase "s" for "species" in the code; otherwise I just get an error saying: 'Field' object has no attribute 'Species'

user965586

2 Answers


The printing mechanism is tripping you up a bit here.

The __str__ implementation (which is what Python's print calls) returns a string representation of the expression itself, without computing anything.

The __repr__ implementation (called when you execute a line in the interpreter) triggers computation and thus allows you to see the results of a computation.
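To make the str/repr split concrete, here is a minimal pure-Python sketch of a lazy expression object. The class LazyDistinct is hypothetical and only mimics the behaviour described above; it is not Blaze's actual implementation. print() goes through __str__ and shows only the expression, while repr() (what the interactive prompt uses) runs the computation.

```python
class LazyDistinct(object):
    """Hypothetical stand-in for a lazy Blaze-style expression (not Blaze's real class)."""

    def __init__(self, name, rows, column):
        self.name = name      # symbolic name shown by str(), e.g. "_1.species"
        self.rows = rows      # backing data: a list of dicts standing in for a table
        self.column = column  # column whose distinct values we want

    def compute(self):
        # Drop duplicates while keeping first-seen order
        seen = []
        for row in self.rows:
            value = row[self.column]
            if value not in seen:
                seen.append(value)
        return seen

    def __str__(self):
        # print() / str(): just describe the expression, no computation
        return "distinct(%s)" % self.name

    def __repr__(self):
        # repr(), used by the interactive prompt: actually compute the result
        return "\n".join(self.compute())


rows = [
    {"species": "Iris-setosa"},
    {"species": "Iris-setosa"},
    {"species": "Iris-versicolor"},
    {"species": "Iris-virginica"},
]
expr = LazyDistinct("_1.species", rows, "species")
print(str(expr))   # distinct(_1.species)
print(repr(expr))  # one computed species per line
```

This is why print mySpecies in the question shows distinct(_55.iris_data.species): only the expression string is produced.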

In [2]: iris = Data(odo(os.path.abspath('./blaze/examples/data/iris.csv'), 'postgresql://localhost::iris'))

In [3]: iris
Out[3]:
    sepal_length  sepal_width  petal_length  petal_width      species
0            5.1          3.5           1.4          0.2  Iris-setosa
1            4.9          3.0           1.4          0.2  Iris-setosa
2            4.7          3.2           1.3          0.2  Iris-setosa
3            4.6          3.1           1.5          0.2  Iris-setosa
4            5.0          3.6           1.4          0.2  Iris-setosa
5            5.4          3.9           1.7          0.4  Iris-setosa
6            4.6          3.4           1.4          0.3  Iris-setosa
7            5.0          3.4           1.5          0.2  Iris-setosa
8            4.4          2.9           1.4          0.2  Iris-setosa
9            4.9          3.1           1.5          0.1  Iris-setosa
...

In [4]: iris.species.distinct()
Out[4]:
           species
0  Iris-versicolor
1   Iris-virginica
2      Iris-setosa

In [8]: print(str(iris.species.distinct()))
distinct(_1.species)

In [9]: print(repr(iris.species.distinct()))
           species
0  Iris-versicolor
1   Iris-virginica
2      Iris-setosa

If you want to shove the result into a concrete data structure like a pandas.Series, do this:

In [5]: odo(iris.species.distinct(), pd.Series)
Out[5]:
0    Iris-versicolor
1     Iris-virginica
2        Iris-setosa
Name: species, dtype: object
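For intuition about what ends up in that concrete structure, the sketch below runs a SELECT DISTINCT directly with the standard-library sqlite3 module and collects the rows into a plain list. It assumes an in-memory SQLite table named iris_data as a stand-in for the PostgreSQL table; the query is only roughly what a one-column distinct() asks the database for, and the ORDER BY is added here purely to make the output deterministic.

```python
import sqlite3

# In-memory SQLite table standing in for the PostgreSQL iris_data table (assumption)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iris_data (sepal_length REAL, species TEXT)")
conn.executemany(
    "INSERT INTO iris_data VALUES (?, ?)",
    [(5.1, "Iris-setosa"), (4.9, "Iris-setosa"),
     (7.0, "Iris-versicolor"), (6.3, "Iris-virginica")],
)

# Roughly the query behind a distinct() on a single column; ORDER BY only for stable output
cursor = conn.execute("SELECT DISTINCT species FROM iris_data ORDER BY species")

# Materialize the rows into a plain Python list, similar in spirit to odo(expr, list)
species = [row[0] for row in cursor.fetchall()]
print(species)
conn.close()
```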
Phillip Cloud

OK, I think I understand now. The rest of the YouTube video made it a bit clearer.

I should do something like output = odo(mySpecies, pd.DataFrame) or output = odo(mySpecies, list), then print output to see the computed result.

Other solutions/points welcome.

user965586