Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to efficiently and easily cope with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.
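For a sense of what that looks like in practice, here is a minimal sketch in the spirit of the official tutorial: define a row description, create a table, append rows and run an in-kernel query. All file, group and column names are illustrative.

```python
import tables as tb

# A minimal sketch of typical PyTables usage; names are illustrative.
class Particle(tb.IsDescription):
    name = tb.StringCol(16)
    energy = tb.Float64Col()

with tb.open_file("demo.h5", mode="w", title="Demo file") as h5:
    group = h5.create_group("/", "detector")
    table = h5.create_table(group, "readout", Particle)
    row = table.row
    for i in range(10):
        row["name"] = f"particle-{i}"
        row["energy"] = float(i) ** 2
        row.append()
    table.flush()
    # in-kernel query: the condition is evaluated inside the HDF5 file
    energies = [r["energy"] for r in table.where("energy > 25")]
```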

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
4 votes • 1 answer

Efficient HDF5 / PyTables Layout for saving and operating on large tensors

I am trying to figure out the best data layout for my use case (a research project). This is not my speciality so while I can articulate what I want, and what I think may work, I am trying to steer away from failure paths. For now, assume that the…

IMA • 261 • 2 • 10
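A minimal sketch of one common layout for this kind of problem: an extendable, chunked, compressed EArray, so individual tensors can be read back without loading the whole file. Shapes, node names and compression settings are illustrative assumptions.

```python
import numpy as np
import tables as tb

# Store a stack of tensors as a chunked, compressed EArray (illustrative shapes).
with tb.open_file("tensors.h5", mode="w") as h5:
    filters = tb.Filters(complevel=5, complib="blosc")
    earr = h5.create_earray(
        h5.root, "samples",
        atom=tb.Float32Atom(),
        shape=(0, 256, 256),       # extendable along the first axis
        chunkshape=(1, 256, 256),  # one tensor per chunk
        filters=filters,
    )
    earr.append(np.random.rand(10, 256, 256).astype(np.float32))

# Read back a single slice without touching the rest of the data
with tb.open_file("tensors.h5", mode="r") as h5:
    one = h5.root.samples[3]
```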
4 votes • 4 answers

How can I combine multiple .h5 files?

Everything that is available online is too complicated. My database is large, so I exported it in parts. I now have three .h5 files and I would like to combine them into one .h5 file for further work. How can I do it?

ktt_11 • 41 • 1 • 1 • 5
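A minimal sketch of one way to do this with PyTables, assuming the three files share the same table layout; file names are illustrative.

```python
import tables as tb

parts = ["part1.h5", "part2.h5", "part3.h5"]

with tb.open_file("combined.h5", mode="w") as out:
    for i, path in enumerate(parts):
        with tb.open_file(path, mode="r") as src:
            if i == 0:
                # copy the structure (and data) of the first file wholesale
                src.root._f_copy_children(out.root, recursive=True)
            else:
                # append the rows of every table onto the matching output table
                for table in src.walk_nodes("/", classname="Table"):
                    out.get_node(table._v_pathname).append(table.read())
```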
4 votes • 3 answers

DLL load failed for pytables

I get the following error when running code that imports PyTables: Traceback (most recent call last): File "C:\Users\pierr\python354\lib\site-packages\pandas\io\pytables.py", line 469, in __init__ import tables # noqa File…

compmonks • 647 • 10 • 24
4 votes • 3 answers

Filter HDF dataset from H5 file using attribute

I have an h5 file containing multiple groups and datasets. Each dataset has associated attributes. I want to find/filter the datasets in this h5 file based upon the respective attribute associated with it. Example: dataset1 = cloudy (attribute)…

sumit c • 93 • 1 • 8
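A minimal sketch of walking a file and filtering leaves by an attribute value; the attribute name and value are illustrative assumptions.

```python
import tables as tb

def find_by_attr(path, attr_name, attr_value):
    """Return the pathnames of all leaves whose attribute matches the value."""
    matches = []
    with tb.open_file(path, mode="r") as h5:
        for leaf in h5.walk_nodes("/", classname="Leaf"):
            if attr_name in leaf._v_attrs and leaf._v_attrs[attr_name] == attr_value:
                matches.append(leaf._v_pathname)
    return matches

# e.g. every dataset whose (hypothetical) "weather" attribute equals "cloudy"
print(find_by_attr("data.h5", "weather", "cloudy"))
```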
4 votes • 2 answers

Efficient way to store array to persistent memory in python

Let's say we have a long one dimensional array like this with millions of elements: [0,1,1,1,1,2,1,1,1,1,1,1,1,...,1,2,2,2,2,2,2,2,4,4,4,4,4,4,4,4,4,3,4,1,1,1,1] If there was just one repeating element we could use a sparse array but since it can be…

meow • 2,062 • 2 • 17 • 27
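One practical option is a compressed CArray: long runs of small repeating integers compress extremely well with blosc. A minimal sketch, with file and node names as illustrative assumptions.

```python
import numpy as np
import tables as tb

# Stand-in for the real data: millions of small repeating integers.
data = np.random.choice([1, 1, 1, 2, 4], size=10_000_000).astype(np.int8)

with tb.open_file("values.h5", mode="w") as h5:
    filters = tb.Filters(complevel=9, complib="blosc")
    carr = h5.create_carray(h5.root, "values",
                            atom=tb.Int8Atom(),
                            shape=data.shape,
                            filters=filters)
    carr[:] = data   # written compressed, readable by slicing later
```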
4 votes • 0 answers

How to efficiently calculate 160146 by 160146 matrix inverse in python?

My research is into structural dynamics and I am dealing with large symmetric sparse matrix calculations. Recently, I had to calculate the inverse of the stiffness matrix (160146 by 160146) with 4813762 non-zero elements. I did calculate a smaller…

Paul Thomas • 477 • 1 • 7 • 15
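Not a PyTables answer as such, but the usual approach at this size is to factorize and solve K x = b rather than form the dense inverse, which would not fit in memory. A minimal sketch with SciPy, using a small random matrix as a stand-in for the real stiffness matrix.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Small symmetric sparse stand-in for the 160146 x 160146 stiffness matrix.
n = 1000
A = sp.random(n, n, density=0.001, format="csc")
K = ((A + A.T) + n * sp.identity(n)).tocsc()   # symmetric, well conditioned
b = np.ones(n)

lu = spla.splu(K)   # sparse LU factorization, reusable for many right-hand sides
x = lu.solve(b)
```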
4 votes • 2 answers

Storing as Pandas DataFrames and Updating as Pytables

Can you store data as pandas HDFStore and open them / perform I/O using PyTables? The reason this question comes up is because I am currently storing data via pd.HDFStore('Filename', mode='a') and store.append(data). However, as I understand pandas…

CodeGeek123 • 4,341 • 8 • 50 • 79
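A minimal sketch, assuming the frame is written in "table" format so the same data is reachable from both pandas and plain PyTables; file, key and column names are illustrative.

```python
import pandas as pd
import tables as tb

df = pd.DataFrame({"price": [1.0, 2.0], "qty": [10, 20]})

# Write with pandas in table format (a regular PyTables Table underneath)
with pd.HDFStore("store.h5", mode="a") as store:
    store.append("trades", df, format="table", data_columns=True)

# Read the same rows through plain PyTables
with tb.open_file("store.h5", mode="r") as h5:
    table = h5.root.trades.table   # pandas keeps the rows under <key>/table
    print(table.colnames)
    print(table[:2])
```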
4 votes • 2 answers

How does one store a Pandas DataFrame as an HDF5 PyTables table (or CArray, EArray, etc.)?

I have the following pandas dataframe: import pandas as pd; df = pd.read_csv("filename.csv"). Now, I can use HDFStore to write the df object to file (like adding key-value pairs to a Python dictionary): store = HDFStore('store.h5') store['df'] =…

JianguoHisiang • 609 • 2 • 7 • 17
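A minimal sketch: passing format="table" makes pandas write a PyTables Table (appendable and queryable) rather than the default "fixed" layout. File name and key are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": range(5), "b": list("vwxyz")})

with pd.HDFStore("store.h5", mode="w") as store:
    store.put("df", df, format="table", data_columns=True)
    # table format allows queries evaluated inside the file
    subset = store.select("df", where="a > 2")
```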
4 votes • 0 answers

Retrieve column names from HDF5 store using Pytables

I have a database with the following column structure: A, B, C, where B has sub-columns b1, b2, b3, i.e. column B holds an array of entries. I have found that b1, b2, b3 have different column names as well,…

sonus21 • 5,178 • 2 • 23 • 48
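A minimal sketch of inspecting a table's columns with PyTables: colnames lists the top-level names, while colpathnames expands nested columns such as B/b1. The file path and node name are illustrative assumptions.

```python
import tables as tb

with tb.open_file("store.h5", mode="r") as h5:
    table = h5.get_node("/mytable")     # hypothetical table node
    print(table.colnames)               # e.g. ['A', 'B', 'C']
    print(table.colpathnames)           # e.g. ['A', 'B/b1', 'B/b2', 'B/b3', 'C']
```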
4 votes • 2 answers

Create HDF5 Group /Table if it does not exist

I am building an HDF5 file using the PyTables Python package. The file would be updated every day with the latest tick data. I want to create two groups - Quotes and Trades - and tables for different futures expiries. I want to check if the group Quotes…

Kapil Sharma • 1,412 • 1 • 15 • 19
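A minimal sketch of checking for a node before creating it; the group, table and column names are illustrative assumptions.

```python
import tables as tb

class Quote(tb.IsDescription):
    timestamp = tb.Int64Col()
    bid = tb.Float64Col()
    ask = tb.Float64Col()

with tb.open_file("ticks.h5", mode="a") as h5:
    # File.__contains__ tests whether a node with that path exists
    if "/Quotes" not in h5:
        h5.create_group("/", "Quotes")
    if "/Quotes/ESZ4" not in h5:                 # hypothetical expiry table
        h5.create_table("/Quotes", "ESZ4", Quote)
```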
4 votes • 1 answer

How to efficiently rebuild pandas hdfstore table when append fails

I am working on using the HDFStore in pandas to store data frames from an ongoing iterative process. At each iteration, I append to a table in the HDFStore. Here is a toy example: import pandas as pd from pandas import HDFStore import numpy as np from…

jmerkow • 1,811 • 3 • 20 • 35
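A minimal sketch of such a loop with a fallback that rewrites the table when an append fails (for example because a dtype changed between iterations); key and file names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

store = pd.HDFStore("results.h5", mode="a")

for i in range(5):
    chunk = pd.DataFrame({"step": [i], "value": [np.random.rand()]})
    try:
        store.append("results", chunk, format="table")
    except (ValueError, TypeError):
        # rebuild: read everything out, concatenate, and write a fresh table
        old = store["results"] if "/results" in store else pd.DataFrame()
        store.put("results", pd.concat([old, chunk], ignore_index=True),
                  format="table")

store.close()
```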
4 votes • 2 answers

Preventing PyTables (in Pandas) from printing "Closing remaining open files..."

Is there a way to prevent PyTables from printing out Closing remaining open files:path/to/store.h5...done? I want to get rid of it just because it is clogging up the terminal. I'm using pandas.HDFStore if that matters.

Ty Pavicich • 1,050 • 3 • 9 • 24
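A minimal sketch of the usual remedy: the message is printed for files still open at interpreter exit, so closing the store explicitly (or using it as a context manager) avoids it. File and key names are illustrative.

```python
import pandas as pd

with pd.HDFStore("store.h5", mode="r") as store:
    df = store["mydata"]
# the store is closed here, so nothing is left open for PyTables to report at exit
```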
4 votes • 0 answers

Performance issues with writing data to HDFStore

I am continuously writing simulation output data to an HDFStore that has become quite big (~15 GB). Additionally, I get the following performance warning: /home/extern/fsalah/.local/lib/python2.7/site-packages/tables/group.py:501:…

Florian S • 41 • 2
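A minimal sketch of one common remedy when the warning is caused by creating a new node per simulation run: append every run to a single table with a run identifier as a data column, so each group keeps few children. All names are illustrative assumptions.

```python
import pandas as pd

def save_run(path, run_id, frame):
    # one table for all runs, distinguished by a "run" data column
    frame = frame.assign(run=run_id)
    with pd.HDFStore(path, mode="a") as store:
        store.append("runs", frame, format="table", data_columns=["run"])

def load_run(path, run_id):
    with pd.HDFStore(path, mode="r") as store:
        return store.select("runs", where=f"run == {run_id}")
```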
4 votes • 1 answer

Converting a dataset into an HDF5 dataset

I have a dataset that I would like to convert to an HDF5 format. It is a dataset from NOAA. The directory structure is something like: NOAA ├── code ├── ghcnd_all ├── ghcnd_all.tar.gz ├── ghcnd-stations.txt ├── ghcnd-version.txt ├── readme.txt └──…

wgwz • 2,642 • 2 • 23 • 35
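A minimal sketch of the general pattern for this kind of conversion: read the source files in chunks and append each chunk to a single HDF5 table. The file name and chunk size are illustrative assumptions, not the actual GHCN layout.

```python
import pandas as pd

with pd.HDFStore("noaa.h5", mode="w") as store:
    for chunk in pd.read_csv("stations.csv", chunksize=100_000):
        # for string columns, consider min_itemsize to reserve enough width
        store.append("stations", chunk, format="table")
```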
4 votes • 2 answers

TypeError: read_hdf() takes exactly 2 arguments (1 given)

How to open an HDF5 file with pandas.read_hdf when the keys are not known? from pandas.io.pytables import read_hdf read_hdf(path_or_buf, key) pandas.__version__ == '0.14.1' Here the key parameter is not known. Thanks

Asif Rehan • 983 • 2 • 8 • 25
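A minimal sketch: list the keys first, then pass one to read_hdf (recent pandas versions also allow omitting the key when the file contains a single dataset). The file name is illustrative.

```python
import pandas as pd

with pd.HDFStore("data.h5", mode="r") as store:
    keys = store.keys()          # e.g. ['/df']

df = pd.read_hdf("data.h5", key=keys[0])
```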