I write a lot of nose-based tests involving DataFrames. Those tests should be readable by end users, and DataFrame constructors are much harder to read than a plain-text table representation.

What about using a text representation such as a reStructuredText (reST) table to construct and assert DataFrames?

=========== =========== ========= ========= ========================
id1         id2         net       nnet      desc
(int64)     (int64)     (float64) (float64) (object)
----------- ----------- --------- --------- ------------------------
1001        1002             10.0       0.0 Closed part of queue
1002                          0.0       3.0 Opened part of queue
=========== =========== ========= ========= ========================

The (dtype) line is useful for enforcing the column types, so that an assert does not fail on a dtype mismatch (it could be optional).
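
As a rough illustration of the idea, here is a minimal sketch of such a parser. It is not an existing pandas feature: `frame_from_rest` is a hypothetical name, and the sketch assumes the column boundaries can be taken from the runs of `=` in the top border. Because an int64 column cannot hold a missing value, the example declares id2 as (float64) so the empty cell can become NaN.

```python
import re

import numpy as np
import pandas as pd

TABLE = """\
=========== =========== ========= ========= ========================
id1         id2         net       nnet      desc
(int64)     (float64)   (float64) (float64) (object)
----------- ----------- --------- --------- ------------------------
1001        1002             10.0       0.0 Closed part of queue
1002                          0.0       3.0 Opened part of queue
=========== =========== ========= ========= ========================
"""


def frame_from_rest(text):
    """Hypothetical helper: build a DataFrame from a reST-like table."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Column boundaries are the runs of '=' in the top border.
    spans = [m.span() for m in re.finditer(r"=+", lines[0])]

    def cells(line):
        return [line[start:end].strip() for start, end in spans]

    names = cells(lines[1])
    dtypes = [cell.strip("()") for cell in cells(lines[2])]
    # lines[3] is the '---' rule and the last line is the closing border.
    rows = [cells(ln) for ln in lines[4:-1]]
    df = pd.DataFrame(rows, columns=names)
    for name, dtype in zip(names, dtypes):
        if dtype != "object":
            # Empty cells become NaN before the cast to the declared dtype.
            df[name] = df[name].replace("", np.nan).astype(dtype)
    return df


df = frame_from_rest(TABLE)
print(df)
print(df.dtypes)
```

The same helper could build the expected frame in a test, which keeps the table itself as the readable part of the test case.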

I would like community feedback before coding this reST DataFrame construct/assert feature. I am also thinking about using IPython notebooks as test cases.

What is your preferred DataFrame representation when readability counts?

PhE
  • I forgot to mention that a reST representation could also help with documentation (Sphinx). – PhE Sep 14 '12 at 08:20

1 Answer

Constructing a DataFrame directly from a reST table is not possible, but it would be interesting. You can use read_csv with a regex separator to read in a whitespace-aligned table. See also read_clipboard and read_fwf (fixed width).

In [22]: table = """\
   ....: id1         id2         net       nnet       desc
   ....: 1001        1002             10.0       0.0  Closed part of queue
   ....: 1002        NaN               0.0       3.0  Opened part of queue
   ....: """

In [23]: df = pandas.read_csv(StringIO(table), sep=r'[\s]{2,}')

In [24]: df
Out[24]: 
    id1   id2  net  nnet                  desc
0  1001  1002   10     0  Closed part of queue
1  1002   NaN    0     3  Opened part of queue
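
For readers on Python 3 with a recent pandas, the session above can be sketched as a standalone script. The import from `io`, the raw-string regex, and the explicit `engine="python"` are assumptions added here for current versions and are not part of the original session.

```python
from io import StringIO

import pandas as pd

table = """\
id1         id2         net       nnet       desc
1001        1002             10.0       0.0  Closed part of queue
1002        NaN               0.0       3.0  Opened part of queue
"""

# Columns are separated by two or more whitespace characters, so the single
# spaces inside the desc values are preserved. A regex separator is handled
# by the Python parser engine.
df = pd.read_csv(StringIO(table), sep=r"\s{2,}", engine="python")

print(df)
print(df.dtypes)
```

Printing `df.dtypes` makes the inferred column types visible, which is useful when the frame is later compared against an expected one.
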
Wouter Overmeire
  • Thanks for the regex separator! It is also good news that the 'NaN' value is converted to np.nan. But I still have to find a solution for the dtypes, since columns id1/id2 come out as int64/float64 and make the assert fail. – PhE Sep 14 '12 at 12:06
  • It is not possible to have NaN inside int64 columns. – Wouter Overmeire Sep 14 '12 at 12:56
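
The dtype issue raised in these comments can be reproduced in a few lines. This is a sketch under assumptions: the nullable `"Int64"` extension dtype and `pandas.testing.assert_frame_equal` come from pandas releases newer than this thread, and the sample values are illustrative.

```python
import numpy as np
import pandas as pd
from pandas.testing import assert_frame_equal

# A missing value forces an otherwise integer column to float64.
id2 = pd.Series([1002, np.nan], name="id2")
print(id2.dtype)

# Casting back to plain int64 fails because NaN has no integer representation.
try:
    id2.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True

# Assumption: a pandas version with the nullable "Int64" extension dtype,
# which keeps integer values alongside a missing-value marker.
id2_nullable = id2.astype("Int64")
print(id2_nullable)

# When only the values matter, the dtype check can be relaxed in the assert.
left = pd.DataFrame({"id1": [1001, 1002]})       # int64
right = pd.DataFrame({"id1": [1001.0, 1002.0]})  # float64
assert_frame_equal(left, right, check_dtype=False)
```

Relaxing the dtype check is the least intrusive option for tests, while a nullable integer column keeps the ids as integers at the cost of a non-NumPy dtype.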