Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
15
votes
3 answers

Pandas HDF5 as a Database

I've been using Python pandas for the last year and I'm really impressed by its performance and functionality; however, pandas is not a database yet. I've been thinking lately about ways to integrate the analysis power of pandas into a flat HDF5 file…
prl900
  • 4,029
  • 4
  • 33
  • 40
15
votes
5 answers

Saving dictionaries to file (numpy and Python 2/3 friendly)

I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any type of dictionary structure, that may contain other dictionaries, numpy arrays, serializable Python objects, and…
Gustav Larsson
  • 8,199
  • 3
  • 31
  • 51
15
votes
5 answers

Getting Pypy to recognize third party modules

Just a quick question: how do I get PyPy to recognize third-party modules that I have in Python? For instance, I get the following error. from tables import * ImportError: No Module named tables Which is basically saying that it cannot find my…
jab
  • 5,673
  • 9
  • 53
  • 84
14
votes
1 answer

Difference between HDF5 file and PyTables file

Is there a difference between HDF5 files and files created by PyTables? PyTables has two functions, .isHDF5File() and .isPyTablesFile(), suggesting that there is a difference between the two formats. I've done some looking around on Google and have…
dtlussier
  • 3,018
  • 2
  • 26
  • 22
14
votes
1 answer

Floating Point Exception with Numpy and PyTables

I have a rather large HDF5 file generated by PyTables that I am attempting to read on a cluster. I am running into a problem with NumPy as I read in an individual chunk. Let's go with the example: The total shape of the array within the HDF5 file…
Tarun Chitra
  • 241
  • 1
  • 4
14
votes
2 answers

PyTables vs. SQLite3 insertion speed

I bought Kibot's stock data and it is enormous. I have about 125,000,000 rows to load (1000 stocks * 125k rows/stock [1-minute bar data since 2010-01-01], each stock in a CSV file whose fields are Date,Time,Open,High,Low,Close,Volume). I'm totally…
jdmarino
  • 545
  • 5
  • 13
14
votes
1 answer

How to concat multiple pandas dataframes into one dask dataframe larger than memory?

I am parsing tab-delimited data to create tabular data, which I would like to store in an HDF5. My problem is I have to aggregate the data into one format, and then dump into HDF5. This is ~1 TB-sized data, so I naturally cannot fit this into RAM.…
ShanZhengYang
  • 16,511
  • 49
  • 132
  • 234
14
votes
3 answers

Unable to save DataFrame to HDF5 ("object header message is too large")

I have a DataFrame in Pandas: In [7]: my_df Out[7]: Int64Index: 34 entries, 0 to 0 Columns: 2661 entries, airplane to zoo dtypes: float64(2659), object(2) When I try to save this to disk: store =…
Amelio Vazquez-Reina
  • 91,494
  • 132
  • 359
  • 564
14
votes
1 answer

Merging two tables with millions of rows in Python

I am using Python for some data analysis. I have two tables, the first (let's call it 'A') has 10 million rows and 10 columns and the second ('B') has 73 million rows and 2 columns. They have 1 column with common ids and I want to intersect the two…
user2027051
  • 153
  • 1
  • 1
  • 4
13
votes
1 answer

pytables writes much faster than h5py. Why?

I noticed that writing .h5 files takes much longer if I use the h5py library instead of the pytables library. What is the reason? This is also true when the shape of the array is known beforehand. Further, I use the same chunk size and no compression…
adku1173
  • 181
  • 1
  • 5
13
votes
1 answer

Query HDF5 in Pandas

I have the following data (18,619,211 rows) stored as a pandas DataFrame object in an HDF5 file: date id2 w id 100010 1980-03-31 10401 0.000839 100010 1980-03-31 10604 0.020140 100010 1980-03-31 …
user3576212
  • 3,255
  • 9
  • 25
  • 33
12
votes
5 answers

Could not find HDF5 installation for PyTables on M1 Mac

Running on M1 Mac, macOS Monterey 12.4, Python 3.10.3 pip install tables Collecting tables Using cached tables-3.7.0.tar.gz (8.2 MB) Installing build dependencies ... done Getting requirements to build wheel ... error error:…
Bn.F76
  • 783
  • 2
  • 12
  • 30
12
votes
1 answer

HDFStore with string columns gives issues

I have a pandas DataFrame myDF with a few string columns (whose dtype is object) and many numeric columns. I tried the following: d=pandas.HDFStore("C:\\PF\\Temp.h5") d['test']=myDF I got this…
uday
  • 6,453
  • 13
  • 56
  • 94
12
votes
2 answers

Reading a large table with millions of rows from Oracle and writing to HDF5

I am working with an Oracle database with millions of rows and 100+ columns. I am attempting to store this data in an HDF5 file using pytables with certain columns indexed. I will be reading subsets of these data in a pandas DataFrame and performing…
smartexpert
  • 2,625
  • 3
  • 24
  • 41
11
votes
3 answers

In PyTables, how to create nested array of variable length?

I'm using PyTables 2.2.1 w/ Python 2.6, and I would like to create a table which contains nested arrays of variable length. I have searched the PyTables documentation, and the tutorial example (PyTables Tutorial 3.8) shows how to create a nested…
plmcw
  • 113
  • 1
  • 7