
When working with large datasets, I often want to test my code with very few samples. This allows me to spot bugs before investing a long time in calculations.

One of the time-consuming steps is often reading the data, so it's nice that Pandas lets me specify `nrows` to read the file up to a certain line, then stop. At this stage I care about catching code bugs, not about accuracy.
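For reference, a minimal sketch of the pandas behaviour described above. The in-memory CSV text and the variable names are made up for illustration; `nrows` is the `pandas.read_csv` parameter mentioned in the question.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a large file on disk.
csv_text = "a,b,c\n1,2,3\n4,5,6\n7,8,9\n10,11,12\n"

# nrows limits how many data rows are parsed (the header line is not counted).
df = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(df.shape)
```

With a real file you would pass the path instead of the `io.StringIO` object.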

I can't seem to find similar functionality when using numpy directly, either with `genfromtxt` or `loadtxt`. Am I overlooking something? I'll go ahead and look into it myself if it's not available, but I thought I'd check with you guys first. Thanks!

Miquel
  • You could still use `nrows` with pandas and then access the underlying numpy array via the [`.values`](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.values.html#pandas.DataFrame.values) attribute of the dataframe – EdChum Oct 26 '14 at 19:20
  • Does [this answer](http://stackoverflow.com/a/13663832/3923281) solve the issue for you? – Alex Riley Oct 26 '14 at 19:20
  • @EdChum Thanks. That'd work but Pandas will still try to figure out column types and the like when loading. True, if using only a few rows it won't make a difference. Hm. That'd work. Thanks! – Miquel Oct 26 '14 at 19:21
  • @ajcr Yes it does, `itertools.islice` handles this nicely, if not pretty-ly. Also, this question is a duplicate. I didn't find the one you quoted, thanks! – Miquel Oct 26 '14 at 19:23
  • @ajcr Will you be posting this as an answer? If you don't, I will do it myself so as to declare the question closed. And thanks! – Miquel Oct 26 '14 at 19:43
  • @Miquel: No problem! I hadn't written anything further, so happy for you to post the answer that you feel fits your question best. – Alex Riley Oct 26 '14 at 19:52
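
The `itertools.islice` approach mentioned in the comments could look roughly like the sketch below. The sample text and the helper name `load_first_rows` are hypothetical and only there for illustration; the idea is that `np.loadtxt` accepts an iterable of lines, so it can be handed only the first few. The last line assumes a reasonably recent NumPy, where `loadtxt` also accepts a `max_rows` argument (as far as I know, since 1.16).

```python
import io
from itertools import islice

import numpy as np

# Hypothetical sample data standing in for a large whitespace-delimited file.
text = "1 2 3\n4 5 6\n7 8 9\n10 11 12\n"


def load_first_rows(fileobj, n_rows, **kwargs):
    """Hypothetical helper: parse only the first n_rows lines of an open file."""
    # islice stops pulling lines from the file after n_rows, so the rest is never read.
    return np.loadtxt(islice(fileobj, n_rows), **kwargs)


with io.StringIO(text) as f:
    head = load_first_rows(f, 2)

print(head.shape)

# Assumption: NumPy 1.16 or later, where loadtxt takes max_rows directly.
head_max_rows = np.loadtxt(io.StringIO(text), max_rows=2)
```

With a real file, `open(path)` takes the place of `io.StringIO(text)`. The same `islice` wrapping should work with `np.genfromtxt`, which is what the linked answer uses.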

0 Answers