Renaming a table in pandas hdfstore

Question

I am using pandas and HDFStore to join several huge CSV files. I merge each of the other tables into a base table, `base`. Right now I write the output of each merge to a new table in the HDFStore, which I call `temp`. Then I delete the old `base` table. Finally, I copy `temp` to `base` and repeat the process for the next table I need to join.

This would be much more efficient if I could simply rename temp to base. Is this possible?

Luke, I'm curious why you wouldn't just append additional csv's directly to the base table, rather than have the intermediate (slow) step of creating a new table? — fantabolous, Aug 12 '14 at 12:08

Dan Allan · Accepted Answer · 2014-04-01T23:17:37.160


Yes, it is possible. You have to delve into the methods from PyTables, on which HDFStore depends.

In [20]: store
Out[20]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/a            frame        (shape->[3,1])

In [21]: store.get_node('a')._f_rename('b')

In [22]: store
Out[22]: 
<class 'pandas.io.pytables.HDFStore'>
File path: test.h5
/b            frame        (shape->[3,1])

The same method works on frame_table appendable nodes.
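For the workflow in the question, the delete-then-rename step could look roughly like the sketch below. It assumes pandas and PyTables are installed. The helper names `rename_node` and `demo` and the toy frames are made up for illustration. Note that `_f_rename` is a PyTables node method, not part of the pandas API, so treat this as a workaround that may change between versions.

```python
import pandas as pd


def rename_node(store, old_key, new_key):
    """Rename a node in an open HDFStore via the underlying PyTables node.

    new_key must be a plain node name (no slashes), e.g. "base".
    """
    node = store.get_node(old_key)  # returns None if the key does not exist
    if node is None:
        raise KeyError(old_key)
    node._f_rename(new_key)  # PyTables method on the group node


def demo(path):
    """Toy version of the question's workflow: drop `base`, rename `temp`."""
    with pd.HDFStore(path, mode="w") as store:
        # Stand-ins for the real base table and the merge output.
        store.put("base", pd.DataFrame({"x": [1, 2, 3]}))
        store.put(
            "temp",
            pd.DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]}),
            format="table",
        )

        store.remove("base")                 # delete the old base table
        rename_node(store, "temp", "base")   # rename instead of copying

        return store.keys(), list(store["base"].columns)
```

Calling `demo("test.h5")` writes a small file and returns the remaining keys and the columns of the renamed table. The rename only changes the node's name, so it does not shrink the file after the `remove` call.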

edited Apr 01 '14 at 23:17

answered Apr 01 '14 at 23:11

Dan Allan


Thanks, oddly there doesn't appear to be any speed improvement. – Luke Apr 01 '14 at 23:23
Hmm. I'm not deeply familiar with the internals. If @Jeff drops by he might be able to shed some light on this. – Dan Allan Apr 01 '14 at 23:24
Using your procedure, the file will continue to grow; you should ptrepack if you are deleting a lot. It's not clear where you think a speed-up would come from. – Jeff Apr 01 '14 at 23:38
I think the speedup would be renaming the node `temp` to `base` instead of copying the node `temp` into `base`, naively analogous to `mv` vs `cp`. – Dan Allan Apr 01 '14 at 23:44
@Jeff, Dan Allan is right about where I thought the speed-up might be. Would ptrepack be faster than deleting and changing the name or is that essentially what ptrepack is doing? – Luke Apr 02 '14 at 00:01
No, renaming is fine, and @Dan Allan's answer is right. Deleting doesn't reclaim space, nor does it make the store more efficient. ptrepack repacks the file to compute an optimal chunksize and reclaims space. See here: http://pandas.pydata.org/pandas-docs/stable/io.html#compression – Jeff Apr 02 '14 at 00:03
I found this method helpful for other things too (such as getting a list of children in a store group). List of properties/methods: http://pytables.github.io/usersguide/libref/hierarchy_classes.html – fantabolous Aug 13 '14 at 02:41
