Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
4 votes, 1 answer

Using pandas and PyTables (3.1.1) at the same time, re-opening an already open file

I use pandas and PyTables (3.1.1) at the same time. The problem is that I have already opened an HDF5 file with PyTables, and when I try to create a new HDFStore with pandas, hdf5store = HDFStore(...), I get the following error: File…
SmCaterpillar
  • 6,683
  • 7
  • 42
  • 70
4 votes, 1 answer

pandas HDFStore select rows by datetime index

I'm sure this is probably very simple but I can't figure out how to slice a pandas HDFStore table by its datetime index to get a specific range of rows. I have a table that looks like this: mdstore = pd.HDFStore(store.h5) histTable =…
fantabolous
  • 21,470
  • 7
  • 54
  • 51
4 votes, 1 answer

Pandas pytable: how to specify min_itemsize of the elements of a MultiIndex

I am storing a pandas dataframe as a pytable which contains a MultiIndex. The first level of the MultiIndex is a string corresponding to a userID. Now, most of the userIDs are 13 characters long, but some of them are 15 characters long. When I…
danieleb
  • 115
  • 2
  • 8
4 votes, 0 answers

Using Pandas to create, read, and update hdf5 file structure

We would like the HDF5 files themselves to define their columns, indexes, and column types, instead of maintaining a separate file that defines the structure of the HDF5 data. How can I create an empty HDF5 file from Pandas with a…
PlaidFan
  • 797
  • 11
  • 20
4 votes, 1 answer

Matrix multiplication using hdf5

I'm trying to multiply two big matrices under a memory limit using HDF5 (PyTables), but numpy.dot gives me the error: ValueError: array is too big. Do I need to do the matrix multiplication myself, maybe blockwise, or is there some other…
mrgloom
  • 20,061
  • 36
  • 171
  • 301
4 votes, 1 answer

merging several hdf5 files into one pytable

I have several hdf5 files, each of them with the same structure. I'd like to create one pytable out of them by somehow merging the hdf5 files. What I mean is that if an array in file1 has size x and array in file2 has size y, the resulting array in…
Asen Christov
  • 848
  • 6
  • 21
4 votes, 1 answer

What is the advantage of PyTables?

I have recently started learning about PyTables and found it very interesting. My question is: What are the basic advantages of PyTables over databases when it comes to huge datasets? What is the basic purpose of this package (I can do the same sort…
khan
  • 7,005
  • 15
  • 48
  • 70
4 votes, 1 answer

Pandas Pytables warnings and slow performance

I have been testing out pandas and PyTables for some large financial data sets, and have run into a real stumbling block: when storing in a PyTables file, pandas appears to be storing multidimensional data in massively long rows, not columns. try…
John_C
  • 788
  • 5
  • 17
4 votes, 1 answer

PyTables thread-safe?

I am trying to use the Python thread module together with PyTables. Can someone tell me whether PyTables is thread-safe? I get some errors, and they seem to be related to the threading. Thanks, Mark
Mark
  • 1,333
  • 1
  • 14
  • 21
4 votes, 1 answer

Capping a sub-expression in numexpr

How do I efficiently express the following using numexpr? z = min(x-y, 1.0) / (x+y) Here, x and y are some large NumPy arrays of the same shape. In other words, I am trying to cap x-y to 1.0 before dividing it by x+y. I would like to do this using…
NPE
  • 486,780
  • 108
  • 951
  • 1,012
3 votes, 2 answers

Appending large amount of data to a tables (HDF5) database where database.numcols != newdata.numcols?

I am trying to append a large dataset (>30 GB) to an existing PyTables table. The table has N columns, and the dataset has N-1 columns; one column is calculated after I know the other N-1 columns. I'm using numpy.fromfile() to read chunks of the…
Phil
  • 2,080
  • 2
  • 12
  • 13
3 votes, 0 answers

sharing a PyTable across multiprocesses

I create a PyTables object W_hat that the processes should share, saving their results there instead of returning them. from multiprocessing import Lock from multiprocessing import Pool import tables as tb def parallel_l21(labels, X, lam, g, W_hat): …
rando
  • 365
  • 3
  • 12
3 votes, 1 answer

Trying to install pytables for python3

I use pip: python -m pip install tables. But then I get this error: Collecting tables Using cached tables-3.6.1.tar.gz (4.6 MB) ERROR: Command errored out with exit status 1: command: /Users/collin.dubois/.pyenv/versions/3.9.1/bin/python3 -c…
Collin
  • 41
  • 1
  • 2
3 votes, 0 answers

Error in HDF5 generator when using multiprocessing and more than one worker

I wrote a generator for Keras that uses PyTables to get images from an HDF5 file (see code below). It works fine when called like so: self._model.fit_generator(self.training_generator, epochs=epochs, …
packoman
  • 1,230
  • 1
  • 16
  • 36
3 votes, 1 answer

Install failure for pytables in terminal

When I try 'pip install pytable', it yields the error 'no matching distribution found for basicproperty>=0.6.9a'. What could be the problem?
Gibbs H.
  • 41
  • 1
  • 5