Is it clear what I am doing wrong?

I'm experimenting with the start and stop options of pandas HDFStore.select, and they aren't making any difference.

The commands I'm using are:

import pandas as pd
# `path` is a filename template defined elsewhere in my script (not shown here)
hdf = pd.HDFStore(path % 'results')
len(hdf.select('results', start=15, stop=20))

I was hoping to get a length of 4 or 5 (however it's counted), but it gives me the whole darn dataframe.
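Until the fixed-store behavior is sorted out, one workaround is to read the whole frame and slice it in memory. This is only a sketch: the 100-row frame and the temporary file path are hypothetical stand-ins for the asker's data and `path % 'results'`, and it reads the entire frame into memory, so it only helps when the data fits.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'results' frame: 100 rows, 2 columns.
df = pd.DataFrame({"a": np.arange(100), "b": np.arange(100) * 2.0})

# Hypothetical file location (the question builds it with `path % 'results'`).
h5_path = os.path.join(tempfile.mkdtemp(), "results.h5")

# 'fixed' is the default format for to_hdf; it is set explicitly here for clarity.
df.to_hdf(h5_path, key="results", format="fixed")

# Read the whole fixed-format frame, then take rows 15..19 in memory.
# Like Python slicing, iloc excludes the stop position, so this yields 5 rows.
subset = pd.read_hdf(h5_path, "results").iloc[15:20]
print(len(subset))
```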

user3659451
  • It's currently a bug on a fixed store: https://github.com/pydata/pandas/issues/8287; pull requests welcome. You can use this in a table-format store, which is quite a bit more flexible in any event. – Jeff Nov 17 '14 at 01:22
  • Hey, thanks Jeff. I still have one issue I'd appreciate a tip on: I'm getting a weird error when using format='table', and I can't post more than one question on SO every 90 minutes, so here's a screenshot: https://www.dropbox.com/s/lc06nmitdu29vek/Screenshot%202014-11-16%2017.33.02.png?dl=0 – user3659451 Nov 17 '14 at 01:33
  • I would say that your file is corrupt; maybe it got interrupted while being written. That's the caveat with these files: they are extremely fast, but you can only write with 1 process/thread AT A TIME, and if you interrupt an operation they can get corrupted, because they are writing metadata to the file and such. I very rarely see these kinds of things. Erase the file and try again. – Jeff Nov 17 '14 at 02:14
  • @Jeff here's what I'm doing. I load a bunch of CSVs into dataframes and merge them into a single dataframe, with some processing along the way. In the end I have a dataframe; yes, there are NaNs, but it's nice. The problem is that this processing takes some time, and I want to save my results to an h5 file that I will query in the future. This error is reproducible, though I will restart my VM. My problem is now posted: http://stackoverflow.com/questions/26964964/hdf5-error-when-format-table-pandas-pytables – user3659451 Nov 17 '14 at 03:25
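The workflow described in these comments (load CSVs, merge, save once, query later) can be sketched as below, following the advice above: start from a fresh file, write from a single process, and use a table-format store. This is a hypothetical sketch, not the asker's code: the in-memory CSV strings, column names and temporary path are invented for illustration, and it does not diagnose the error in the linked question.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical CSV inputs standing in for the asker's files on disk.
csv_a = io.StringIO("id,x\n1,10\n2,20\n3,30\n")
csv_b = io.StringIO("id,y\n1,0.5\n3,1.5\n")

# Load and merge; the outer join leaves NaN where an id is missing on one side.
frame_a = pd.read_csv(csv_a)
frame_b = pd.read_csv(csv_b)
merged = frame_a.merge(frame_b, on="id", how="outer")

# Hypothetical output location.
h5_path = os.path.join(tempfile.mkdtemp(), "results.h5")

# Erase any previous (possibly corrupt) file before rewriting it.
if os.path.exists(h5_path):
    os.remove(h5_path)

# Write once, from a single process; the context manager closes the store
# (flushing its metadata) even if an exception is raised during the write.
with pd.HDFStore(h5_path, mode="w") as store:
    store.put("results", merged, format="table")

# Later: reopen read-only and pull the first two rows.
with pd.HDFStore(h5_path, mode="r") as store:
    first_two = store.select("results", start=0, stop=2)
print(first_two)
```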

1 Answer

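When writing to the h5 file, pass format='table' to DataFrame.to_hdf, e.g. df.to_hdf(<path>, <key>, format='table'); a table-format store lets you select subsets of its rows. That said, the current behavior is a bug: with a fixed-format store you should get an error rather than the whole frame.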
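A minimal sketch of that fix, assuming a hypothetical 100-row frame and a temporary file in place of the asker's data and path: write the frame with format='table', then select a row range. Note the value is 'table'; pandas rejects 'tables' as an invalid format.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical 100-row frame standing in for the 'results' data.
df = pd.DataFrame({"a": np.arange(100), "b": np.arange(100) * 2.0})
h5_path = os.path.join(tempfile.mkdtemp(), "results.h5")

# format='table' writes a PyTables Table, which supports row-range reads.
df.to_hdf(h5_path, key="results", format="table")

with pd.HDFStore(h5_path, mode="r") as hdf:
    # start/stop are row positions and stop is exclusive: rows 15..19.
    subset = hdf.select("results", start=15, stop=20)

print(len(subset))  # 5
```

A table-format store is slower to write than a fixed one, but it also supports appending and where-queries, which is the extra flexibility mentioned in the comments.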

According to Jeff (https://stackoverflow.com/users/644898/jeff), this is a known bug, tracked at github.com/pydata/pandas/issues/8287. Pull requests welcome.

user3659451