Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated with Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads
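The description above can be made concrete with a minimal PyTables session (a sketch assuming the `tables` package is installed; the file name, table name, and columns are illustrative): define a table schema, append a few rows, and run an in-kernel query.

```python
import os
import tempfile

import tables


class Particle(tables.IsDescription):
    name = tables.StringCol(16)   # 16-byte string column
    energy = tables.Float64Col()  # double-precision float column


path = os.path.join(tempfile.mkdtemp(), 'demo.h5')
with tables.open_file(path, mode='w') as h5:
    tbl = h5.create_table('/', 'particles', Particle, title='demo table')
    row = tbl.row
    for i in range(3):
        row['name'] = ('p%d' % i).encode()
        row['energy'] = float(i)
        row.append()
    tbl.flush()
    # in-kernel query: the condition is evaluated inside the C extension
    energies = [r['energy'] for r in tbl.where('energy > 0')]
```

The `where()` call never materializes the whole table in Python; only matching rows cross the C/Python boundary, which is what makes PyTables fast on very large tables.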

617 questions
5 votes · 1 answer

PyTables install with python 3.11 fails on macOS M1

$ python -m pip install tables stops with "Error: compiling Cython file". Environment: inside a virtual environment created with pyenv, with only a few packages installed at the moment: Cython 3.0.0, numpy …
theuema · 83
5 votes · 0 answers

Saving a pandas DataFrame with a nullable integer data type to an HDF file (format='table')

How can one save pandas DataFrame with nullable integer data type to an HDF file in the 'table' format? # input data import pandas as pd, numpy as np df = pd.DataFrame(index=list(range(2)), data={'x':[np.uint8(1)]*2},…
S.V · 2,149
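As of current pandas releases, HDFStore's 'table' format does not round-trip the nullable `Int64` extension dtype, which is what the question above runs into. A common workaround (a sketch, not an official API) is to cast to `float64` before writing, letting `pd.NA` become `NaN`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': pd.array([1, None, 3], dtype='Int64')})

# cast the nullable column to a plain float64 column; pd.NA -> NaN
out = df.astype({'x': 'float64'})
# out.to_hdf('data.h5', key='df', format='table')  # now serializable (needs PyTables)
```

The cost is losing the integer/missing distinction on disk; casting back with `convert_dtypes()` after reading recovers a nullable dtype if needed.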
5 votes · 3 answers

How to copy a dataset object to a different hdf5 file using pytables or h5py?

I have selected specific hdf5 datasets and want to copy them to a new hdf5 file. I could find some tutorials on copying between two files, but what if you have just created a new file and you want to copy datasets to the file? I thought the way…
maynull · 1,936
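With h5py, copying a selected dataset into a freshly created file is a one-liner via `Group.copy`, which also brings attributes along. A sketch (file and dataset names are illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

d = tempfile.mkdtemp()
src_path = os.path.join(d, 'src.h5')
dst_path = os.path.join(d, 'dst.h5')

with h5py.File(src_path, 'w') as src:
    src.create_dataset('grp/data', data=np.arange(10))

# copy the chosen dataset into the new file
with h5py.File(src_path, 'r') as src, h5py.File(dst_path, 'w') as dst:
    src.copy('grp/data', dst, name='data')  # attributes are copied too

with h5py.File(dst_path, 'r') as dst:
    copied = dst['data'][:]
```

On the PyTables side, `File.copy_node()` plays the same role when both files are opened with `tables.open_file()`.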
5 votes · 1 answer

Is there a way to store PyTable columns in a specific order?

It seems that PyTables columns are alphabetically ordered when using either a dictionary or a class for the schema definition passed to createTable(). My need is to establish a specific order and then use numpy.genfromtxt() to read and store my data…
tnt · 3,411
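There is: each `Col` constructor accepts a `pos` argument that pins the on-disk column order, overriding the alphabetical default. A sketch (column names are illustrative):

```python
import os
import tempfile

import tables


class Row(tables.IsDescription):
    # pos fixes the column order regardless of alphabetical sorting
    zeta = tables.Int32Col(pos=0)
    alpha = tables.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), 'order.h5')
with tables.open_file(path, 'w') as h5:
    tbl = h5.create_table('/', 't', Row)
    names = tbl.colnames
```

With the order fixed this way, rows read via `numpy.genfromtxt()` can be appended column-for-column without reshuffling.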
5 votes · 2 answers

HDF5 min_itemsize error: ValueError: Trying to store a string with len [##] in [y] column but this column has a limit of [##]!

I am getting the following error after using pandas.HDFStore().append(): ValueError: Trying to store a string with len [150] in [values_block_0] column but this column has a limit of [127]! Consider using min_itemsize to preset the sizes on these…
ShanZhengYang · 16,511
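The error means a string column outgrew the width reserved when the table was first written; string widths in the 'table' format are fixed at creation time. The usual fix is to preset the width with `min_itemsize` on the first append (a sketch; the column name is illustrative, and passing a column name in `min_itemsize` makes pandas create it as a data column):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'store.h5')
df = pd.DataFrame({'values': ['short']})

with pd.HDFStore(path) as store:
    # reserve 150 characters for this column up front
    store.append('df', df, min_itemsize={'values': 150})
    # later appends with longer strings now fit
    store.append('df', pd.DataFrame({'values': ['x' * 150]}))
    n = len(store['df'])
```

Without the `min_itemsize` hint, the first append would have locked the column at 5 characters and the second append would raise the ValueError from the question.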
5 votes · 2 answers

Pandas _metadata of DataFrame persistence error

I have finally figured out how to use _metadata on a DataFrame; everything works except that I am unable to persist it, e.g. to hdf5 or json. I know it works because I copy the frame and the _metadata attributes copy over. "non _metadata" attributes…
Skorpeo · 2,362
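pandas does not serialize custom DataFrame attributes through to_hdf. A workaround shown in the pandas cookbook is to stash the metadata on the HDFStore storer's attrs, which PyTables persists in the file (a sketch; the attribute name is illustrative):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'meta.h5')
df = pd.DataFrame({'a': [1, 2]})

with pd.HDFStore(path) as store:
    store.put('df', df, format='table')
    # attach arbitrary metadata to the stored node; it is written to disk
    store.get_storer('df').attrs.my_metadata = {'source': 'demo'}

with pd.HDFStore(path) as store:
    meta = store.get_storer('df').attrs.my_metadata
```

Re-attaching the loaded dict to the DataFrame's _metadata fields after reading is then a small manual step.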
5 votes · 2 answers

Creating very large NUMPY arrays in small chunks (PyTables vs. numpy.memmap)

There are a bunch of questions on SO that appear to be the same, but they don't really answer my question fully. I think this is a pretty common use-case for computational scientists, so I'm creating a new question. QUESTION: I read in several small…
KartMan · 369
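The memmap side of that comparison can be sketched in a few lines (sizes and file name are illustrative): create the full-size on-disk array once, then fill it chunk by chunk without ever holding the whole array in RAM.

```python
import os
import tempfile

import numpy as np

path = os.path.join(tempfile.mkdtemp(), 'big.dat')
n_rows, n_cols, chunk = 1000, 8, 100

# create the full-size file-backed array once, then fill it in chunks
mm = np.memmap(path, dtype='float64', mode='w+', shape=(n_rows, n_cols))
for start in range(0, n_rows, chunk):
    mm[start:start + chunk] = np.arange(chunk * n_cols).reshape(chunk, n_cols)
mm.flush()
del mm  # close the map

# reopen read-only; data lives on disk, not in memory
ro = np.memmap(path, dtype='float64', mode='r', shape=(n_rows, n_cols))
```

The PyTables alternative is an `EArray` extended with successive `append()` calls, which adds compression but requires going through the HDF5 layer.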
5 votes · 1 answer

Pandas - retrieving HDF5 columns and memory usage

I have a simple question, I cannot help but feel like I am missing something obvious. I have read data from a source table (SQL Server) and have created an HDF5 file to store the data via the following: output.to_hdf('h5name', 'df', format='table',…
Craig S · 53
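When the frame is written with format='table' and data_columns, individual columns can be read back without loading the whole frame into memory (a sketch; file and column names are illustrative):

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), 'h5name')
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# data_columns=True stores each column queryably instead of as one block
df.to_hdf(path, key='df', format='table', data_columns=True)

# only column 'a' is materialized in memory
only_a = pd.read_hdf(path, 'df', columns=['a'])
```

Without `data_columns`, pandas packs same-typed columns into shared blocks, so a column-level read still has to decompress the whole block.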
5 votes · 5 answers

How to create a NumPy array from a large list of lists in Python

I have a list of lists with 1,200 rows and 500,000 columns. How do I convert it into a numpy array? I've read the solutions on Bypass "Array is too big" python error, but they are not helping. I tried to put them into a numpy array: import…
alvas · 115,346
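For inputs that fit in memory, `np.array` converts directly; for inputs near the memory limit, preallocating the target array and filling it row by row avoids the large temporary that the direct conversion builds. A small sketch of both (toy sizes stand in for 1,200 × 500,000):

```python
import numpy as np

rows = [[float(i * 5 + j) for j in range(5)] for i in range(4)]

# direct conversion: simplest, but builds an intermediate copy
arr = np.array(rows, dtype=np.float64)

# preallocate-and-fill: only one full-size array ever exists
out = np.empty((len(rows), len(rows[0])), dtype=np.float64)
for i, r in enumerate(rows):
    out[i] = r
```

At 1,200 × 500,000 float64 the result is ~4.8 GB, so if even one copy does not fit, a disk-backed array (np.memmap or a PyTables EArray) is the next step.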
5 votes · 1 answer

Release hdf5 disk memory after table or node removal with pytables or pandas

I'm using HDFStore with pandas / pytables. After removing a table or object, the hdf5 file size remains unaffected. It seems this space is reused later when additional objects are added to the store, but it can be an issue if a large amount of space is wasted. I…
jruizaranguren · 12,679
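HDF5 indeed never returns freed space in place; the standard remedy is to rewrite the file, either with the `ptrepack` utility shipped with PyTables or programmatically with `tables.copy_file`. A sketch of the programmatic route (file names are illustrative):

```python
import os
import tempfile

import numpy as np
import tables

d = tempfile.mkdtemp()
old = os.path.join(d, 'old.h5')
new = os.path.join(d, 'new.h5')

with tables.open_file(old, 'w') as h5:
    h5.create_array('/', 'big', np.zeros(100000))
    h5.create_array('/', 'small', np.arange(10))

with tables.open_file(old, 'a') as h5:
    h5.remove_node('/', 'big')   # space is NOT reclaimed here

# rewriting the file compacts it: only live nodes are copied
tables.copy_file(old, new, overwrite=True)
shrunk = os.path.getsize(new) < os.path.getsize(old)
```

The CLI equivalent is `ptrepack old.h5 new.h5`, after which the old file can be replaced by the new one.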
5 votes · 1 answer

Optimizing a complex table.where() query in PyTables?

I have a very large database - I'm working with a subset that is 350m rows, but ultimately it will be around 3b rows. My entire goal here is to optimize a particular type of query on this database, at the expense of pretty much everything but…
benjamin · 327
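The main lever PyTables offers for this kind of query is indexing the queried columns, so that `where()` / `read_where()` can binary-search instead of scanning all rows. A sketch on a toy table (schema and condition are illustrative):

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), 'idx.h5')


class Rec(tables.IsDescription):
    value = tables.Int64Col()


with tables.open_file(path, 'w') as h5:
    tbl = h5.create_table('/', 't', Rec)
    row = tbl.row
    for i in range(1000):
        row['value'] = i
        row.append()
    tbl.flush()

    # index the column used in the condition; for the best query speed
    # a completely sorted index is available via create_csindex()
    tbl.cols.value.create_index()
    hits = tbl.read_where('(value >= 10) & (value < 13)')
    values = [int(r['value']) for r in hits]
```

On billions of rows, combining indexed columns with a sensible `chunkshape` and sorting the table on the hottest query column usually matters more than the exact form of the condition string.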
5 votes · 2 answers

Update a pandas DataFrame stored in a PyTable with another pandas DataFrame

I am trying to create a function that updates a pandas DataFrame that I have stored in a PyTable with new data from another pandas DataFrame. I want to check whether some data is missing in the PyTable for specific DatetimeIndexes (value is NaN or a new…
Elvin · 843
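One pandas-level way to do that merge, filling only the missing entries by DatetimeIndex, is `combine_first`, after which the node is rewritten (a sketch; the node name is illustrative and the store call is commented out because it needs PyTables):

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2024-01-01', periods=4, freq='D')
stored = pd.DataFrame({'v': [1.0, np.nan, 3.0, np.nan]}, index=idx)
new = pd.DataFrame({'v': [9.0, 2.0, 9.0, 4.0]}, index=idx)

# existing values win; NaNs and missing index labels are filled from `new`
updated = stored.combine_first(new)
# store.put('df', updated, format='table')  # overwrite the PyTables node
```

HDFStore has no in-place row update for table nodes, so read-merge-rewrite (or `remove` with a `where` clause followed by `append`) is the usual pattern.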
5 votes · 2 answers

Is the only way to add a column in PyTables to create a new table and copy?

I am searching for a persistent data storage solution that can handle heterogeneous data stored on disk. PyTables seems like an obvious choice, but the only information I can find on how to append new columns is a tutorial example. The tutorial has…
Zelazny7 · 39,946
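In practice, yes: the standard recipe is to build a new description containing the extra column, copy the rows across, then swap the tables. A sketch of that recipe (names and the default value are illustrative):

```python
import os
import tempfile

import tables

path = os.path.join(tempfile.mkdtemp(), 'cols.h5')

with tables.open_file(path, 'a') as h5:
    old = h5.create_table('/', 'data', {'a': tables.Int32Col()})
    r = old.row
    for i in range(3):
        r['a'] = i
        r.append()
    old.flush()

    # new description = old columns + the extra one
    desc = old.description._v_colobjects.copy()
    desc['b'] = tables.Float64Col(dflt=0.0)
    new = h5.create_table('/', 'data_tmp', desc)

    # copy existing rows; the new column takes its default value
    for src in old.iterrows():
        dst = new.row
        for name in old.colnames:
            dst[name] = src[name]
        dst.append()
    new.flush()

    h5.remove_node('/', 'data')
    h5.rename_node('/data_tmp', 'data')

    final = h5.get_node('/data')
    cols = final.colnames
    n = final.nrows
```

The row-by-row copy is the price of HDF5's fixed table layout; for large tables, copying in slices (`old[i:i+chunk]`) amortizes the Python overhead.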
4 votes · 1 answer

Storing images and metadata with PyTables

I'm using PyTables to store some images as Array and CArray data types. For each of these images, I also want to store some basic metadata (e.g., EXIF data). I can imagine a number of approaches to storing both of these data formats, ranging from…
Nick · 655
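One lightweight option from that range of approaches is to attach the metadata directly to the image node as HDF5 attributes, keeping image and metadata together (a sketch; the attribute name and dict contents are illustrative, and non-scalar values are pickled into the file):

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), 'imgs.h5')
img = np.zeros((4, 4), dtype=np.uint8)

with tables.open_file(path, 'w') as h5:
    node = h5.create_carray('/', 'img0', obj=img)
    # attach per-image metadata to the node itself
    node._v_attrs.exif = {'iso': 200, 'f_number': 2.8}

with tables.open_file(path, 'r') as h5:
    meta = h5.root.img0._v_attrs.exif
```

For metadata that needs to be queried across many images, a separate Table of metadata rows keyed by node name scales better than per-node attributes.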
4 votes · 1 answer

Convert large hdf5 dataset written via pandas/pytables to vaex

I have a very large dataset I write to hdf5 in chunks via append like so: with pd.HDFStore(self.train_store_path) as train_store: for filepath in tqdm(filepaths): with open(filepath, 'rb') as file: frame =…
sobek · 1,386