Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated with Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads
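The description above can be made concrete with a minimal PyTables session (a sketch assuming the `tables` package is installed; the file name, table name, and columns are illustrative): define a table schema, append a few rows, and run an in-kernel query.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):
    name = tables.StringCol(16)   # 16-byte string column
    energy = tables.Float64Col()  # double-precision float column


path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with tables.open_file(path, mode='w') as h5:
    tbl = h5.create_table('/', 'particles', Particle, title='demo table')
    row = tbl.row
    for i in range(3):
        row['name'] = ('p%d' % i).encode()
        row['energy'] = float(i)
        row.append()
    tbl.flush()
    # in-kernel query: the condition is evaluated inside the C extension
    energies = [r['energy'] for r in tbl.where('energy > 0')]
```

The `where()` call never materializes the whole table in Python; only matching rows cross the C/Python boundary, which is what makes PyTables fast on very large tables.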

617 questions
5 votes · 1 answer

PyTables install with python 3.11 fails on macOS M1

$ python -m pip install tables stops with "Error: compiling Cython file". Environment: inside a virtual environment created with pyenv, with only a few packages installed at the moment: Cython 3.0.0, numpy …
theuema · 83
5 votes · 0 answers

Saving a pandas DataFrame with a nullable integer data type to an HDF file (format='table')

How can one save pandas DataFrame with nullable integer data type to an HDF file in the 'table' format? # input data import pandas as pd, numpy as np df = pd.DataFrame(index=list(range(2)), data={'x':[np.uint8(1)]*2},…
S.V · 2,149
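As of current pandas releases, HDFStore's 'table' format does not round-trip the nullable `Int64` extension dtype, which is what the question above runs into. A common workaround (a sketch, not an official API) is to cast to `float64` before writing, letting `pd.NA` become `NaN`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': pd.array([1, None, 3], dtype='Int64')})

# cast the nullable column to a plain float64 column; pd.NA -> NaN
out = df.astype({'x': 'float64'})
# out.to_hdf('data.h5', key='df', format='table')  # now serializable (needs PyTables)
```

The cost is losing the integer/missing distinction on disk; casting back with `convert_dtypes()` after reading recovers a nullable dtype if needed.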
5 votes · 3 answers

How to copy a dataset object to a different hdf5 file using pytables or h5py?

I have selected specific hdf5 datasets and want to copy them to a new hdf5 file. I could find some tutorials on copying between two files, but what if you have just created a new file and you want to copy datasets to the file? I thought the way…
maynull · 1,936
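With h5py, copying a selected dataset into a freshly created file is a one-liner via `Group.copy`, which also brings attributes along. A sketch (file and dataset names are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

d = tempfile.mkdtemp()
src_path = os.path.join(d, 'src.h5')
dst_path = os.path.join(d, 'dst.h5')

with h5py.File(src_path, 'w') as src:
    src.create_dataset('grp/data', data=np.arange(10))

# copy the chosen dataset into the new file
with h5py.File(src_path, 'r') as src, h5py.File(dst_path, 'w') as dst:
    src.copy('grp/data', dst, name='data')  # attributes are copied too

with h5py.File(dst_path, 'r') as dst:
    copied = dst['data'][:]
```

On the PyTables side, `File.copy_node()` plays the same role when both files are opened with `tables.open_file()`.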
5 votes · 1 answer

Is there a way to store PyTable columns in a specific order?

It seems that PyTables columns are alphabetically ordered when using either a dictionary or a class for the schema definition passed to createTable(). My need is to establish a specific order and then use numpy.genfromtxt() to read and store my data…
tnt · 3,411
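There is: each `Col` constructor accepts a `pos` argument that pins the on-disk column order, overriding the alphabetical default. A sketch (column names are illustrative):

```python
import os
import tempfile

import tables


class Row(tables.IsDescription):
    # pos fixes the column order regardless of alphabetical sorting
    zeta = tables.Int32Col(pos=0)
    alpha = tables.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), 'order.h5')
with tables.open_file(path, 'w') as h5:
    tbl = h5.create_table('/', 't', Row)
    names = tbl.colnames
```

With the order fixed this way, rows read via `numpy.genfromtxt()` can be appended column-for-column without reshuffling.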
5 votes · 2 answers

HDF5 min_itemsize error: ValueError: Trying to store a string with len [##] in [y] column but this column has a limit of [##]!

I am getting the following error after using pandas.HDFStore().append(): ValueError: Trying to store a string with len [150] in [values_block_0] column but this column has a limit of [127]! Consider using min_itemsize to preset the sizes on these…
ShanZhengYang · 16,511
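The error means a string column outgrew the width reserved when the table was first written; string widths in the 'table' format are fixed at creation time. The usual fix is to preset the width with `min_itemsize` on the first append (a sketch; the column name is illustrative, and passing a column name in `min_itemsize` makes pandas create it as a data column):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'store.h5')
df = pd.DataFrame({'values': ['short']})

with pd.HDFStore(path) as store:
    # reserve 150 characters for this column up front
    store.append('df', df, min_itemsize={'values': 150})
    # later appends with longer strings now fit
    store.append('df', pd.DataFrame({'values': ['x' * 150]}))
    n = len(store['df'])
```

Without the `min_itemsize` hint, the first append would have locked the column at 5 characters and the second append would raise the ValueError from the question.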
5 votes · 2 answers

Pandas _metadata of DataFrame persistence error

I have finally figured out how to use _metadata on a DataFrame; everything works except that I am unable to persist it, e.g. to hdf5 or json. I know it works because I copy the frame and the _metadata attributes copy over. "non _metadata" attributes…
Skorpeo · 2,362
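pandas does not serialize custom DataFrame attributes through to_hdf. A workaround shown in the pandas cookbook is to stash the metadata on the HDFStore storer's attrs, which PyTables persists in the file (a sketch; the attribute name is illustrative):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'meta.h5')
df = pd.DataFrame({'a': [1, 2]})

with pd.HDFStore(path) as store:
    store.put('df', df, format='table')
    # attach arbitrary metadata to the stored node; it is written to disk
    store.get_storer('df').attrs.my_metadata = {'source': 'demo'}

with pd.HDFStore(path) as store:
    meta = store.get_storer('df').attrs.my_metadata
```

Re-attaching the loaded dict to the DataFrame's _metadata fields after reading is then a small manual step.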
5 votes · 2 answers

Creating very large NUMPY arrays in small chunks (PyTables vs. numpy.memmap)

There are a bunch of questions on SO that appear to be the same, but they don't really answer my question fully. I think this is a pretty common use-case for computational scientists, so I'm creating a new question. QUESTION: I read in several small…
KartMan · 369
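The memmap side of that comparison can be sketched in a few lines (sizes and file name are illustrative): create the full-size on-disk array once, then fill it chunk by chunk without ever holding the whole array in RAM.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'big.dat')
n_rows, n_cols, chunk = 1000, 8, 100

# create the full-size file-backed array once, then fill it in chunks
mm = np.memmap(path, dtype='float64', mode='w+', shape=(n_rows, n_cols))
for start in range(0, n_rows, chunk):
    mm[start:start + chunk] = np.arange(chunk * n_cols).reshape(chunk, n_cols)
mm.flush()
del mm  # close the map

# reopen read-only; data lives on disk, not in memory
ro = np.memmap(path, dtype='float64', mode='r', shape=(n_rows, n_cols))
```

The PyTables alternative is an `EArray` extended with successive `append()` calls, which adds compression but requires going through the HDF5 layer.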
5 votes · 1 answer

Pandas - retrieving HDF5 columns and memory usage

I have a simple question, I cannot help but feel like I am missing something obvious. I have read data from a source table (SQL Server) and have created an HDF5 file to store the data via the following: output.to_hdf('h5name', 'df', format='table',…
Craig S · 53
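When the frame is written with format='table' and data_columns, individual columns can be read back without loading the whole frame into memory (a sketch; file and column names are illustrative):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'h5name')
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# data_columns=True stores each column queryably instead of as one block
df.to_hdf(path, key='df', format='table', data_columns=True)

# only column 'a' is materialized in memory
only_a = pd.read_hdf(path, 'df', columns=['a'])
```

Without `data_columns`, pandas packs same-typed columns into shared blocks, so a column-level read still has to decompress the whole block.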
5 votes · 5 answers

How to create a NumPy array from a large list of lists in Python

I have a list of lists with 1,200 rows and 500,000 columns. How do I convert it into a numpy array? I've read the solutions on Bypass "Array is too big" python error, but they are not helping. I tried to put them into a numpy array: import…
alvas · 115,346
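For inputs that fit in memory, `np.array` converts directly; for inputs near the memory limit, preallocating the target array and filling it row by row avoids the large temporary that the direct conversion builds. A small sketch of both (toy sizes stand in for 1,200 × 500,000):

```python
import numpy as np

rows = [[float(i * 5 + j) for j in range(5)] for i in range(4)]

# direct conversion: simplest, but builds an intermediate copy
arr = np.array(rows, dtype=np.float64)

# preallocate-and-fill: only one full-size array ever exists
out = np.empty((len(rows), len(rows[0])), dtype=np.float64)
for i, r in enumerate(rows):
    out[i] = r
```

At 1,200 × 500,000 float64 the result is ~4.8 GB, so if even one copy does not fit, a disk-backed array (np.memmap or a PyTables EArray) is the next step.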
5 votes · 1 answer

Release hdf5 disk memory after table or node removal with pytables or pandas

I'm using HDFStore with pandas / pytables. After removing a table or object, the hdf5 file size remains unaffected. It seems this space is reused later when additional objects are added to the store, but it can be an issue if a large amount of space is wasted. I…
jruizaranguren · 12,679
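HDF5 indeed never returns freed space in place; the standard remedy is to rewrite the file, either with the `ptrepack` utility shipped with PyTables or programmatically with `tables.copy_file`. A sketch of the programmatic route (file names are illustrative):

```python
import os
import tempfile

import numpy as np
import tables

d = tempfile.mkdtemp()
old = os.path.join(d, 'old.h5')
new = os.path.join(d, 'new.h5')

with tables.open_file(old, 'w') as h5:
    h5.create_array('/', 'big', np.zeros(100000))
    h5.create_array('/', 'small', np.arange(10))

with tables.open_file(old, 'a') as h5:
    h5.remove_node('/', 'big')   # space is NOT reclaimed here

# rewriting the file compacts it: only live nodes are copied
tables.copy_file(old, new, overwrite=True)
shrunk = os.path.getsize(new) < os.path.getsize(old)
```

The CLI equivalent is `ptrepack old.h5 new.h5`, after which the old file can be replaced by the new one.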
5 votes · 1 answer

Optimizing a complex table.where() query in PyTables?

I have a very large database - I'm working with a subset that is 350m rows, but ultimately it will be around 3b rows. My entire goal here is to optimize a particular type of query on this database, at the expense of pretty much everything but…
benjamin · 327
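The main lever PyTables offers for this kind of query is indexing the queried columns, so that `where()` / `read_where()` can binary-search instead of scanning all rows. A sketch on a toy table (schema and condition are illustrative):

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), 'idx.h5')


class Rec(tables.IsDescription):
    value = tables.Int64Col()


with tables.open_file(path, 'w') as h5:
    tbl = h5.create_table('/', 't', Rec)
    row = tbl.row
    for i in range(1000):
        row['value'] = i
        row.append()
    tbl.flush()

    # index the column used in the condition; for the best query speed
    # a completely sorted index is available via create_csindex()
    tbl.cols.value.create_index()
    hits = tbl.read_where('(value >= 10) & (value < 13)')
    values = [int(r['value']) for r in hits]
```

On billions of rows, combining indexed columns with a sensible `chunkshape` and sorting the table on the hottest query column usually matters more than the exact form of the condition string.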
5 votes · 2 answers

Update a pandas DataFrame stored in a PyTable with another pandas DataFrame

I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check whether some data is missing in the PyTable for specific DatetimeIndexes (value is NaN or a new…
Elvin · 843
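One pandas-level way to do that merge, filling only the missing entries by DatetimeIndex, is `combine_first`, after which the node is rewritten (a sketch; the node name is illustrative and the store call is commented out because it needs PyTables):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=4, freq='D')
stored = pd.DataFrame({'v': [1.0, np.nan, 3.0, np.nan]}, index=idx)
new = pd.DataFrame({'v': [9.0, 2.0, 9.0, 4.0]}, index=idx)

# existing values win; NaNs and missing index labels are filled from `new`
updated = stored.combine_first(new)
# store.put('df', updated, format='table')  # overwrite the PyTables node
```

HDFStore has no in-place row update for table nodes, so read-merge-rewrite (or `remove` with a `where` clause followed by `append`) is the usual pattern.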
5 votes · 2 answers

Is the only way to add a column in PyTables to create a new table and copy?

I am searching for a persistent data storage solution that can handle heterogeneous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has…
Zelazny7 · 39,946
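In practice, yes: the standard recipe is to build a new description containing the extra column, copy the rows across, then swap the tables. A sketch of that recipe (names and the default value are illustrative):

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), 'cols.h5')

with tables.open_file(path, 'a') as h5:
    old = h5.create_table('/', 'data', {'a': tables.Int32Col()})
    r = old.row
    for i in range(3):
        r['a'] = i
        r.append()
    old.flush()

    # new description = old columns + the extra one
    desc = old.description._v_colobjects.copy()
    desc['b'] = tables.Float64Col(dflt=0.0)
    new = h5.create_table('/', 'data_tmp', desc)

    # copy existing rows; the new column takes its default value
    for src in old.iterrows():
        dst = new.row
        for name in old.colnames:
            dst[name] = src[name]
        dst.append()
    new.flush()

    h5.remove_node('/', 'data')
    h5.rename_node('/data_tmp', 'data')

    final = h5.get_node('/data')
    cols = final.colnames
    n = final.nrows
```

The row-by-row copy is the price of HDF5's fixed table layout; for large tables, copying in slices (`old[i:i+chunk]`) amortizes the Python overhead.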
4 votes · 1 answer

Storing images and metadata with PyTables

I'm using PyTables to store some images as Array and CArray data types. For each of these images, I also want to store some basic metadata (e.g., EXIF data). I can imagine a number of approaches to storing both of these data formats, ranging from…
Nick · 655
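One lightweight option from that range of approaches is to attach the metadata directly to the image node as HDF5 attributes, keeping image and metadata together (a sketch; the attribute name and dict contents are illustrative, and non-scalar values are pickled into the file):

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), 'imgs.h5')
img = np.zeros((4, 4), dtype=np.uint8)

with tables.open_file(path, 'w') as h5:
    node = h5.create_carray('/', 'img0', obj=img)
    # attach per-image metadata to the node itself
    node._v_attrs.exif = {'iso': 200, 'f_number': 2.8}

with tables.open_file(path, 'r') as h5:
    meta = h5.root.img0._v_attrs.exif
```

For metadata that needs to be queried across many images, a separate Table of metadata rows keyed by node name scales better than per-node attributes.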
4 votes · 1 answer

Convert large hdf5 dataset written via pandas/pytables to vaex

I have a very large dataset I write to hdf5 in chunks via append like so: with pd.HDFStore(self.train_store_path) as train_store: for filepath in tqdm(filepaths): with open(filepath, 'rb') as file: frame =…
sobek · 1,386