Questions tagged [hdfstore]

HDFStore is a Python interface, part of the pandas data analysis library, for reading and writing HDF5 files.

Pandas is a popular data analysis library for Python. It provides rich data structures for data analysis, such as the DataFrame, reminiscent of those in the R statistical computing environment.

HDF5 is a popular storage format for tabular and array-like data, and the pandas HDFStore interface provides an easy-to-use wrapper around the PyTables library for HDF5 file I/O.

Questions tagged hdfstore are typically about using this pandas interface and the HDF5 files it reads and writes.

137 questions
20 votes, 2 answers

Get list of HDF5 contents (Pandas HDFStore)

I have no problem selecting content from a table within an HDF5 Store: with pandas.HDFStore(data_store) as hdf: df_reader = hdf.select('my_table_id', chunksize=10000) How can I get a list of all the tables to select from using pandas?
bcollins
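A minimal sketch, assuming PyTables (the `tables` package that HDFStore depends on) is installed: `HDFStore.keys()` returns the path of every pandas object in the store, and each path can be passed to `select`. The file name and the second key `other` are hypothetical, made up for the demo.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file path in a throwaway directory
path = os.path.join(tempfile.mkdtemp(), "example.h5")

# Write two small tables so the store has something to list
with pd.HDFStore(path, mode="w") as store:
    store.put("my_table_id", pd.DataFrame({"a": [1, 2, 3]}), format="table")
    store.put("other", pd.DataFrame({"b": [0.5]}), format="table")

# keys() lists the stored objects; names come back with a leading "/"
with pd.HDFStore(path, mode="r") as store:
    keys = store.keys()

print(keys)
```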
15 votes, 1 answer

How does one append large amounts of data to a Pandas HDFStore and get a natural unique index?

I'm importing large amounts of HTTP logs (80GB+) into a Pandas HDFStore for statistical processing. Even within a single import file I need to batch the content as I load it. My tactic thus far has been to read the parsed lines into a DataFrame then…
Ben Scherrey
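One way to keep the index unique across batches is to offset each batch by the number of rows already stored, which a table-format storer exposes as `nrows`. This is a sketch under assumptions: PyTables is installed, `append_with_offset` is a hypothetical helper, the log batches are fake, and concurrent writers are ignored.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "logs.h5")


def append_with_offset(store, key, chunk):
    """Hypothetical helper: renumber rows so they continue after those already stored."""
    start = store.get_storer(key).nrows if key in store else 0
    chunk = chunk.copy()
    chunk.index = pd.RangeIndex(start, start + len(chunk))
    store.append(key, chunk, format="table")


with pd.HDFStore(path, mode="w") as store:
    # Three fake batches standing in for parsed log lines; each starts at index 0
    for i in range(3):
        batch = pd.DataFrame({"status": np.full(4, 200 + i)})
        append_with_offset(store, "logs", batch)
    result = store.select("logs")

print(result.index.is_unique, len(result))
```

Because the offset is read from the store itself, numbering also continues correctly across separate import runs.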
10 votes, 1 answer

Get column names (headers) from hdf file

I was wondering how to get the column names (seemingly stored in the hdf header) of an hdf file; for example, a file might have columns named [a,b,c,d] while another file has columns [a,b,c] and yet another has columns [b,e,r,z]; and I would like to…
Cenoc
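A sketch assuming the file was written in table format and PyTables is installed: reading a single row with `start`/`stop` is enough to learn the column names without loading the data. The file and key names are hypothetical.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "cols.h5")
pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]}).to_hdf(
    path, key="data", format="table"
)

# Read a single row (start/stop are row positions) just to learn the header
first_row = pd.read_hdf(path, "data", start=0, stop=1)
columns = list(first_row.columns)
print(columns)
```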
9 votes, 2 answers

Append new columns to HDFStore with pandas

I'm using Pandas, and creating an HDFStore object. I calculate 500 columns of data, and write it to a table format HDFStore object. Then I close the file, delete the data from memory, do the next 500 columns (labelled by an increasing integer), open up…
StevenMurray
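pandas will not append new columns to an existing table, so one workaround (not the only one) is to write each batch of columns as its own table and join them on read with `HDFStore.select_as_multiple`, which requires the tables to have the same number of rows. This sketch assumes PyTables is installed; the key names, the `col_N` names (strings instead of the question's integers), and the sizes are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "wide.h5")
rng = np.random.default_rng(0)
n_rows, cols_per_batch, n_batches = 10, 5, 3

with pd.HDFStore(path, mode="w") as store:
    # One table per batch of columns; all batches share the same row index
    for b in range(n_batches):
        names = [f"col_{b * cols_per_batch + j}" for j in range(cols_per_batch)]
        batch = pd.DataFrame(rng.random((n_rows, cols_per_batch)), columns=names)
        store.append(f"batch_{b}", batch)

    # Join the batches side by side; the tables must have identical row counts
    keys = [f"batch_{b}" for b in range(n_batches)]
    wide = store.select_as_multiple(keys, selector=keys[0])

print(wide.shape)
```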
9 votes, 1 answer

Get inferred dataframe types iteratively using chunksize

How can I use pd.read_csv() to iteratively chunk through a file and retain the dtype and other meta-information as if I read in the entire dataset at once? I need to read in a dataset that is too large to fit into memory. I would like to…
Zelazny7
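With `chunksize`, each chunk infers its own dtypes, so an integer-looking first chunk and a float-looking later chunk can disagree. One common workaround, which sidesteps inference rather than reproducing whole-file inference, is to pass an explicit `dtype` mapping. The tiny CSV below is made up for illustration.

```python
import io

import pandas as pd

data = "id,value\n1,2\n2,3.5\n3,\n"

# Without guidance, each chunk infers its own dtypes
inferred = [chunk["value"].dtype for chunk in pd.read_csv(io.StringIO(data), chunksize=1)]

# Pinning dtypes up front keeps every chunk consistent
pinned = [
    chunk["value"].dtype
    for chunk in pd.read_csv(
        io.StringIO(data), chunksize=1, dtype={"id": "int64", "value": "float64"}
    )
]
print(inferred, pinned)
```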
7 votes, 1 answer

pandas - How to save only selected columns of a DataFrame to HDF5

I'm reading a sample CSV file and storing it in an .h5 database. The .csv is structured as follows: User_ID;Longitude;Latitude;Year;Month;String 267261661;-3.86580025;40.32170825;2013;12;hello world 171255468;-3.83879575;40.05035005;2013;12;hello…
Fabio Lamanna
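A sketch assuming PyTables is installed: `usecols` in `read_csv` keeps only the wanted columns while parsing, and the result is written with `to_hdf`. The header and data row are taken from the question; the choice of three columns, the file name, and the key `users` are hypothetical.

```python
import io
import os
import tempfile

import pandas as pd

# Header and first data row copied from the question; sep is ";"
csv = io.StringIO(
    "User_ID;Longitude;Latitude;Year;Month;String\n"
    "267261661;-3.86580025;40.32170825;2013;12;hello world\n"
)

# usecols drops unwanted columns while parsing, so they never reach memory
df = pd.read_csv(csv, sep=";", usecols=["User_ID", "Longitude", "Latitude"])

path = os.path.join(tempfile.mkdtemp(), "users.h5")
df.to_hdf(path, key="users", format="table")

stored = pd.read_hdf(path, "users")
print(list(stored.columns))
```

If the full frame is already in memory, selecting the subset first, as in `df[["User_ID", "Longitude", "Latitude"]].to_hdf(...)`, has the same effect.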
7 votes, 2 answers

Pandas HDFStore of MultiIndex DataFrames: how to efficiently get all indexes

In Pandas, is there a way to efficiently pull out all the MultiIndex indexes present in an HDFStore in table format? I can select() efficiently using where=, but I want all indexes, and none of the columns. I can also select() using iterator=True…
Tony
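In table format, pandas stores the levels of a MultiIndex as data columns, so `select_column` can read each level without touching the value columns, and the index can be rebuilt from those arrays. This is a sketch assuming PyTables is installed and the levels are named; the frame, level names `A`/`B`, and file name are illustrative.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "mi.h5")
midx = pd.MultiIndex.from_product([range(2), list("XYZ")], names=["A", "B"])
df = pd.DataFrame({"C": range(6)}, index=midx)

with pd.HDFStore(path, mode="w") as store:
    store.append("df", df)
    # Index levels are stored as data columns, so each one can be read on its
    # own without reading column C
    levels = [store.select_column("df", name) for name in ["A", "B"]]

index = pd.MultiIndex.from_arrays(levels, names=["A", "B"])
print(index.equals(df.index))
```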
6 votes, 3 answers

How to deal with pandas column that has a list of dicts in every cell

I have a DataFrame that includes a column where every cell is made up of a list of dicts, and each list of dicts is of varying length (including 0). An example: df = pd.DataFrame({'ID' : [13423,294847,322844,429847], 'RANKS': [[{u'name': u'A',…
James
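Table-format HDFStore cannot store a column of Python lists, so one approach is to flatten it first: `explode` gives one row per dict, and `pd.json_normalize` turns the dicts into ordinary columns. The `rank` key and all `RANKS` values below are hypothetical; only the list-of-dicts shape, the `name` key, and the IDs come from the question.

```python
import pandas as pd

# Hypothetical data shaped like the question: each cell holds a list of dicts (possibly empty)
df = pd.DataFrame(
    {
        "ID": [13423, 294847, 322844],
        "RANKS": [
            [{"name": "A", "rank": 1}, {"name": "B", "rank": 2}],
            [],
            [{"name": "C", "rank": 5}],
        ],
    }
)

# One row per dict; empty lists become NaN and are dropped
long = df.explode("RANKS").dropna(subset=["RANKS"]).reset_index(drop=True)

# Expand each dict into ordinary columns with plain dtypes
flat = pd.concat([long[["ID"]], pd.json_normalize(long["RANKS"].tolist())], axis=1)
print(flat)
```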
6 votes, 2 answers

Peek the number of rows in an hdf5 file in pandas

I was wondering if there was a way of easily, quickly, and without loading the entire file, getting the number of rows in an hdf5 file, created using pandas, with pandas? Thank you in advance!
Cenoc
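For a table-format object, the storer returned by `HDFStore.get_storer` reports the row count from the table's metadata, so no rows need to be loaded. A sketch assuming PyTables is installed; the file and key names are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "rows.h5")
pd.DataFrame({"x": np.arange(1000)}).to_hdf(path, key="data", format="table")

with pd.HDFStore(path, mode="r") as store:
    # The row count comes from table metadata; no rows are read
    n_rows = store.get_storer("data").nrows

print(n_rows)
```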
6 votes, 1 answer

Pandas HDFStore unload dataframe from memory

OK, I am experimenting with pandas to load a roughly 30GB CSV file with 40 million+ rows and 150+ columns into an HDFStore. The majority of the columns are strings, followed by numerical and dates. I have never really used numpy, pandas or pytables…
smartexpert
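A common pattern for files larger than memory is to stream the CSV with `chunksize` and `append` each chunk, so only one chunk is held at a time; once the loop moves on, the previous chunk is no longer referenced and can be freed. This sketch assumes PyTables is installed; the in-memory CSV stands in for the real file, and `min_itemsize={"name": 50}` is an arbitrary, hypothetical width that reserves room for longer strings in later chunks.

```python
import io
import os
import tempfile

import pandas as pd

# Stand-in for the large CSV; the real file would be passed by path
csv = io.StringIO("name,amount\nalpha,1\nbeta,2\ngamma,3\ndelta,4\n")
path = os.path.join(tempfile.mkdtemp(), "big.h5")

with pd.HDFStore(path, mode="w") as store:
    # Each iteration parses, appends, and then drops one chunk
    for chunk in pd.read_csv(csv, chunksize=2):
        store.append("data", chunk, min_itemsize={"name": 50})
    total = store.get_storer("data").nrows

print(total)
```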
6 votes, 1 answer

HDFStore: table.select and RAM usage

I am trying to select random rows from an HDFStore table of about 1 GB. RAM usage explodes when I ask for about 50 random rows. I am using pandas 0.11-dev, python 2.7, linux64. In this first case the RAM usage fits the size of chunk with…
user17375
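In recent pandas versions, passing an integer array as `where` to `HDFStore.select` is treated as a list of row coordinates, so only those rows are read. This sketch assumes PyTables is installed and a current pandas; behavior in the 0.11-dev release mentioned in the question may differ. Sizes and names are hypothetical.

```python
import os
import tempfile

import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "sample.h5")
pd.DataFrame({"x": np.arange(10_000)}).to_hdf(path, key="data", format="table")

rng = np.random.default_rng(42)
with pd.HDFStore(path, mode="r") as store:
    n_rows = store.get_storer("data").nrows
    # Pick 50 distinct row positions; sorting keeps the reads in file order
    coords = np.sort(rng.choice(n_rows, size=50, replace=False))
    # An integer array passed as `where` is treated as row coordinates
    sample = store.select("data", where=coords)

print(len(sample))
```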
5 votes, 1 answer

Can I update an HDFStore?

Consider the following hdfstore and dataframes df and df2 import pandas as pd store = pd.HDFStore('test.h5') midx = pd.MultiIndex.from_product([range(2), list('XYZ')], names=list('AB')) df = pd.DataFrame(dict(C=range(6)), midx) df C A B …
piRSquared
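A simple, if not the most efficient, way to "update" a stored table is read-modify-write: load it, replace the overlapping rows, and `put` it back, which overwrites the old node. The `df` construction is from the question; `df2` is hypothetical because the question's version is cut off, and PyTables is assumed to be installed.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "test.h5")
midx = pd.MultiIndex.from_product([range(2), list("XYZ")], names=list("AB"))
df = pd.DataFrame(dict(C=range(6)), midx)

# Hypothetical replacement rows for two existing keys
df2 = pd.DataFrame(
    dict(C=[100, 200]),
    pd.MultiIndex.from_tuples([(0, "Y"), (1, "Z")], names=list("AB")),
)

with pd.HDFStore(path, mode="w") as store:
    store.put("df", df, format="table")

    # Read, replace the overlapping rows, and rewrite the whole table
    current = store.select("df")
    updated = pd.concat([current.drop(df2.index), df2]).sort_index()
    store.put("df", updated, format="table")

    result = store.select("df")

print(result)
```

This rewrites the entire table; for large stores, removing the affected rows with `HDFStore.remove(key, where=...)` and appending the replacements avoids a full rewrite at the cost of a more involved query.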
5 votes, 1 answer

Import huge data-set from SQL server to HDF5

I am trying to import ~12 million records with 8 columns into Python. Because of its huge size, my laptop's memory would not be sufficient for this. Now I'm trying to import the SQL data into an HDF5 file format. It would be very helpful if someone can…
user3510503
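With `chunksize`, `pd.read_sql` returns an iterator of DataFrames, so the query result can be streamed into an HDFStore without holding it all in memory. In this sketch an in-memory SQLite database stands in for SQL Server (connecting to SQL Server, for example through a SQLAlchemy engine, is not shown); the table, columns, and `min_itemsize` width are hypothetical, and PyTables is assumed to be installed.

```python
import os
import sqlite3
import tempfile

import pandas as pd

# In-memory SQLite stands in for SQL Server so the sketch is self-contained
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (id INTEGER, label TEXT)")
con.executemany("INSERT INTO records VALUES (?, ?)", [(i, f"row{i}") for i in range(25)])

path = os.path.join(tempfile.mkdtemp(), "records.h5")
with pd.HDFStore(path, mode="w") as store:
    # chunksize makes read_sql yield DataFrames of at most 10 rows
    for chunk in pd.read_sql("SELECT id, label FROM records", con, chunksize=10):
        store.append("records", chunk, min_itemsize={"label": 20})
    n_rows = store.get_storer("records").nrows

con.close()
print(n_rows)
```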
5 votes, 2 answers

HDF5 min_itemsize error: ValueError: Trying to store a string with len [##] in [y] column but this column has a limit of [##]!

I am getting the following error after using pandas.HDFStore().append() ValueError: Trying to store a string with len [150] in [values_block_0] column but this column has a limit of [127]! Consider using min_itemsize to preset the sizes on these…
ShanZhengYang
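The width of a string column is fixed by the first `append`, which is what produces this error when a later string is longer. Passing `min_itemsize` on the first append reserves room up front (a column named there also becomes a data column). A sketch assuming PyTables is installed; the column name `text` and the width 200 are hypothetical.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "strings.h5")

with pd.HDFStore(path, mode="w") as store:
    # The first append fixes the column width; reserve 200 characters up front
    store.append("df", pd.DataFrame({"text": ["short"]}), min_itemsize={"text": 200})
    # A later 150-character string now fits instead of raising ValueError
    store.append("df", pd.DataFrame({"text": ["x" * 150]}))
    lengths = store.select("df")["text"].str.len().tolist()

print(lengths)
```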
5 votes, 1 answer

Pandas HDFStore.create_table_index not increasing select query speed, looking for a better way to search

I have created an HDFStore. The HDFStore contains a group df which is a table with 2 columns. The first column is a string and the second column is DateTime (which will be in sorted order). The Store has been created using the following method: from…
harmands
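`create_table_index` only helps queries on columns that are data columns, so the datetime column has to be declared in `data_columns`. The pandas documentation suggests turning off per-append indexing with `index=False` and building one fully sorted index afterwards. This sketch assumes PyTables is installed; the frame, column names, and time window are hypothetical, and it shows the calls rather than any measured speedup.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "events.h5")
df = pd.DataFrame(
    {
        "name": [f"item{i}" for i in range(1000)],
        "ts": pd.date_range("2020-01-01", periods=1000, freq="min"),
    }
)

with pd.HDFStore(path, mode="w") as store:
    # ts must be a data column to be usable in a where clause; defer indexing
    store.append("df", df, data_columns=["ts"], index=False)
    # Build a completely sorted ("full") PyTables index once, after loading
    store.create_table_index("df", columns=["ts"], optlevel=9, kind="full")
    # Datetime strings in the query are converted to timestamps by pandas
    window = store.select("df", where="ts >= '2020-01-01 10:00' & ts < '2020-01-01 11:00'")

print(len(window))
```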