Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
7
votes
2 answers

Building a huge numpy array using pytables

How can I create a huge numpy array using pytables? I tried this, but it gives me the "ValueError: array is too big." error: import numpy as np import tables as tb ndim = 60000 h5file = tb.openFile('test.h5', mode='w', title="Test Array") root =…
Academia
  • 3,984
  • 6
  • 32
  • 49
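A minimal sketch of the usual answer to this kind of question: rather than allocating the whole array in memory, create a chunked on-disk CArray and fill it block by block. The dimensions here are scaled down so the sketch runs quickly, and `tb.open_file` is the PyTables 3.x spelling of the older `tb.openFile` shown in the question.

```python
import numpy as np
import tables as tb

# Instead of a 60000 x 60000 in-memory array (which raises
# "ValueError: array is too big"), create an on-disk CArray
# and write it in row blocks.
ndim = 6000  # reduced from the question's 60000 so this runs quickly
with tb.open_file('test.h5', mode='w', title='Test Array') as h5file:
    atom = tb.Float64Atom()
    arr = h5file.create_carray(h5file.root, 'data', atom, shape=(ndim, ndim))
    for start in range(0, ndim, 1000):  # only one 1000-row block in RAM
        arr[start:start + 1000, :] = np.random.rand(1000, ndim)
```

Only one block is resident at a time, so the on-disk array can be far larger than available memory.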
7
votes
5 answers

Unable to reinstall PyTables for Python 2.7

I am installing Python 2.7 alongside an existing installation. When installing PyTables again for 2.7, I get this error - Found numpy 1.5.1 package installed. .. ERROR:: Could not find a local HDF5 installation. You may need to explicitly state where your local…
tnt
  • 3,411
  • 5
  • 24
  • 23
7
votes
1 answer

Why is querying a table so much slower after sorting it?

I have a Python program that uses PyTables and queries a table in this simple manner: def get_element(table, somevar): rows = table.where("colname == somevar") row = next(rows, None) if row: return elem_from_row(row) To reduce…
Djizeus
  • 4,161
  • 1
  • 24
  • 42
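For `table.where()` queries like the one in this excerpt, lookup speed usually depends on whether the queried column is indexed; a completely sorted index (CSI) lets PyTables binary-search instead of scanning. Below is a minimal sketch; the table schema and column name `colname` are taken from the question's snippet, everything else is invented for illustration.

```python
import numpy as np
import tables as tb

class Record(tb.IsDescription):
    colname = tb.Int64Col()

with tb.open_file('query.h5', 'w') as f:
    table = f.create_table('/', 'tbl', Record)
    table.append([(i,) for i in np.random.permutation(100000)])
    table.cols.colname.create_csindex()  # completely sorted index
    somevar = 42
    # where() picks up `somevar` from the local scope and can now
    # binary-search the indexed column instead of scanning every row.
    rows = table.where('colname == somevar')
    row = next(rows, None)
    assert row is not None
```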
7
votes
1 answer

Pytables/Pandas: Combining (reading?) multiple HDF5 stores split by rows

In a "write once, read many" workflow, I frequently parse large text files (20GB-60GB) dumped from Teradata using the FastExport utility and load them into Pytables using Pandas. I am using multiprocessing to chunk the text files and distributing them to…
Hussain Sultan
  • 185
  • 1
  • 10
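One common pattern for this situation is to let each worker write its own store and then append every part into a single table-format store in one pass. The sketch below assumes two small part-files; the file names and column are invented stand-ins for the question's Teradata dumps.

```python
import numpy as np
import pandas as pd

# Stand-ins for the per-worker stores produced by multiprocessing.
for i in range(2):
    pd.DataFrame({'x': np.arange(5) + 5 * i}).to_hdf(
        f'part{i}.h5', key='data', format='table')

# Combine: append each part into one table-format store.
with pd.HDFStore('combined.h5', 'w') as out:
    for i in range(2):
        chunk = pd.read_hdf(f'part{i}.h5', 'data')
        out.append('data', chunk, format='table', index=False)

print(len(pd.read_hdf('combined.h5', 'data')))  # 10
```

With real multi-gigabyte parts, each `pd.read_hdf` call could itself be chunked so memory stays bounded.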
7
votes
2 answers

finding a duplicate in a hdf5 pytable with 500e6 rows

Problem: I have a large (> 500e6 rows) dataset that I've put into a pytables database. Let's say the first column is ID and the second column is a counter for each ID. Each ID-counter combination has to be unique. I have one non-unique row amongst 500e6 rows I'm…
scrooge
  • 141
  • 1
  • 1
  • 7
7
votes
2 answers

Numpy efficient big matrix multiplication

To store a big matrix on disk I use numpy.memmap. Here is a sample code to test big matrix multiplication: import numpy as np import time rows= 10000 # it can be large for example 1kk cols= 1000 #create some data in memory data =…
mrgloom
  • 20,061
  • 36
  • 171
  • 301
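A sketch of the blockwise approach usually suggested for this: keep the operands and result as memmaps and multiply one row block at a time, so only a small slice is ever resident. The shapes here are small stand-ins for matrices far larger than RAM.

```python
import numpy as np

rows, cols = 1000, 200
a = np.memmap('a.dat', dtype='float64', mode='w+', shape=(rows, cols))
b = np.memmap('b.dat', dtype='float64', mode='w+', shape=(cols, rows))
a[:] = np.random.rand(rows, cols)
b[:] = np.random.rand(cols, rows)

out = np.memmap('c.dat', dtype='float64', mode='w+', shape=(rows, rows))
block = 250
for start in range(0, rows, block):
    # Only one block of `a` rows (plus `b`) is pulled into memory here.
    out[start:start + block] = a[start:start + block] @ b
out.flush()
```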
7
votes
4 answers

Time based data analysis with Python

I've got a project where physical sensors send data to the server. Data is sent irregularly - after something activates a sensor, but not less often than every 20 minutes. On the server, data is stored in a PostgreSQL database. The data structure looks…
eXt
  • 489
  • 5
  • 11
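For irregular, timestamped readings like these, a common Python-side analysis step is to regularize them onto fixed bins with pandas' `resample`. The timestamps, column name, and 20-minute interval below are illustrative, chosen to echo the question's reporting cadence.

```python
import pandas as pd

# Four irregular sensor readings, indexed by timestamp.
ts = pd.to_datetime(['2024-01-01 00:01', '2024-01-01 00:07',
                     '2024-01-01 00:25', '2024-01-01 00:58'])
readings = pd.DataFrame({'value': [1.0, 3.0, 2.0, 4.0]}, index=ts)

# Regularize onto 20-minute bins, averaging readings in each bin.
binned = readings.resample('20min').mean()
print(len(binned))  # 3 bins: 00:00, 00:20, 00:40
```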
7
votes
2 answers

`pip install tables` fail with ERROR:: Could not find a local HDF5 installation

Here's the detailed error message I am getting when I attempt to install PyTables on Mac OSX. calvin$ pip install tables Downloading/unpacking tables Downloading tables-2.4.0.tar.gz (8.9MB): 8.9MB downloaded Running setup.py egg_info for package…
Calvin Cheng
  • 35,640
  • 39
  • 116
  • 167
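This error generally means the PyTables build can't locate the HDF5 C library and headers. On macOS, one common fix (assuming Homebrew is installed; paths vary by machine) is to install HDF5 and point the build at it via the `HDF5_DIR` environment variable that PyTables' setup script recognizes:

```shell
# Assumes Homebrew; the prefix differs between Intel and Apple Silicon.
brew install hdf5
HDF5_DIR="$(brew --prefix hdf5)" pip install tables
```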
6
votes
1 answer

tables package for Python 3.9.1?

This is my first question on here. Thank you very much in advance for your support. I'm using Python 3.9.1 on a 64-bit Windows 10 machine and I've been trying to install the tables package by pip install tables but I always got the following…
pkx8326
  • 61
  • 1
  • 2
6
votes
2 answers

What does the PyTables warning "a closed node found in the registry" mean?

When using pandas.to_hdf function to save data to a HDF5 file, I'm getting the following warning: C:\{my path to conda environment}\lib\site-packages\tables\file.py:426: UserWarning: a closed node found in the registry: ``/{my object key}/meta/{my…
PGlivi
  • 996
  • 9
  • 12
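This warning often shows up when HDF5 file handles are left open between writes, leaving stale nodes behind in PyTables' registry. A sketch of the usual mitigation, managing the store's lifetime explicitly (file and key names are invented):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

# The context manager guarantees the underlying PyTables file is
# closed on exit, so no stale nodes linger in the registry.
with pd.HDFStore('clean.h5', mode='w') as store:
    store.put('data', df, format='table')

print(pd.read_hdf('clean.h5', 'data').shape)  # (3, 1)
```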
6
votes
3 answers

Storing and reloading large multidimensional data sets in Python

I'm going to be running a large number of simulations producing a large amount of data that needs to be stored and accessed again later. Output data from my simulation program is written to text files (one per simulation). I plan on writing a Python…
dbb
  • 61
  • 1
  • 2
6
votes
1 answer

PyTables batch get and update

I have daily stock data as an HDF5 file created using PyTables. I would like to get a group of rows, process it as an array and then write it back to disk (update rows) using PyTables. I couldn't figure out a way to do this cleanly. Could you please…
Ecognium
  • 2,046
  • 1
  • 19
  • 35
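A sketch of the read-modify-write cycle being asked about, using `Table.read` to pull a row block as a structured array and `Table.modify_rows` to write it back in place. The one-column schema is invented; the question's actual stock-data layout isn't shown.

```python
import tables as tb

class Bar(tb.IsDescription):
    close = tb.Float64Col()

with tb.open_file('stocks.h5', 'w') as f:
    t = f.create_table('/', 'bars', Bar)
    t.append([(float(i),) for i in range(10)])

    block = t.read(start=0, stop=5)   # rows 0-4 as a NumPy structured array
    block['close'] *= 2.0             # process the batch in memory
    t.modify_rows(start=0, stop=5, rows=block)  # write the batch back
```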
6
votes
1 answer

In a Pandas categorical, what is format="table"?

The HDF5 format apparently does not support categoricals with format="fixed". The following example s = pd.Series(['a','b','a','b'],dtype='category') s.to_hdf('s.h5','s') returns the error: NotImplementedError: Cannot store a category dtype in a…
Autumn
  • 3,214
  • 1
  • 20
  • 35
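To make the distinction concrete: `format="table"` stores the data as a queryable PyTables Table, which can serialize category dtypes, whereas the default `"fixed"` layout cannot. A minimal sketch reusing the question's series:

```python
import pandas as pd

s = pd.Series(['a', 'b', 'a', 'b'], dtype='category')
# format='table' uses PyTables' Table layout, which supports categoricals.
s.to_hdf('s.h5', key='s', format='table')

loaded = pd.read_hdf('s.h5', 's')
print(loaded.dtype)  # category
```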
6
votes
1 answer

Compatibility between PyTables and h5py for HDF file format

I started working with the HDF file format in Python a few weeks ago, and the first thing you realize when doing this is that there are two main libraries that are both great though slightly different: PyTables (which works well with the ViTables tool for…
iipr
  • 1,190
  • 12
  • 17
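On the compatibility question, both libraries read and write standard HDF5 files, so a plain dataset written by one is visible to the other. A minimal round-trip sketch (file and dataset names invented):

```python
import h5py
import numpy as np
import tables as tb

# Write a plain dataset with h5py...
with h5py.File('interop.h5', 'w') as f:
    f.create_dataset('values', data=np.arange(5))

# ...and read the same node back with PyTables.
with tb.open_file('interop.h5') as f:
    print(f.get_node('/values')[:])  # [0 1 2 3 4]
```

PyTables-specific features (its table row structures, metadata attributes) are where the two diverge; plain arrays like this one interoperate cleanly.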
6
votes
1 answer

How to limit the size of pandas queries on HDF5 so it doesn't go over RAM limit?

Let's say I have a pandas DataFrame import pandas as pd df = pd.DataFrame() df Column1 Column2 0 0.189086 -0.093137 1 0.621479 1.551653 2 1.631438 -1.635403 3 0.473935 1.941249 4 1.904851 -0.195161 5 0.236945 -0.288274 6 -0.473348 …
ShanZhengYang
  • 16,511
  • 49
  • 132
  • 234
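The standard answer here is to iterate over the store with `chunksize`, so only one slice is in RAM at a time. A small sketch (the 100-row frame is a tiny stand-in for a table larger than memory; `chunksize` would be tuned to the RAM budget):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': np.random.randn(100),
                   'Column2': np.random.randn(100)})
df.to_hdf('big.h5', key='df', format='table')  # table format is queryable

total = 0
for chunk in pd.read_hdf('big.h5', 'df', chunksize=25):  # 4 chunks of 25
    total += len(chunk)  # process each chunk, then let it be freed

print(total)  # 100
```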