Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data.
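As a rough illustration of the object-oriented interface described above, here is a minimal sketch that writes a small table to an HDF5 file and runs an in-kernel query on it. The `Bar` row layout, the `market` group, the `bars` table, and the `write_and_query` helper are all hypothetical names chosen for this example; only the `tables` calls themselves come from PyTables.

```python
import os
import tempfile

import tables


class Bar(tables.IsDescription):
    """Hypothetical row layout, used only for this illustration."""
    symbol = tables.StringCol(8)   # fixed-width byte string, 8 bytes
    close = tables.Float64Col()
    volume = tables.Int64Col()


def write_and_query(path):
    """Write three rows to a new HDF5 file, then select rows with close > 100."""
    sample = [("AAA", 99.5, 1000), ("BBB", 101.25, 2000), ("CCC", 150.0, 3000)]

    # Create the file, a group, and a table inside that group.
    with tables.open_file(path, mode="w", title="Demo file") as h5:
        group = h5.create_group("/", "market", "Market data")
        table = h5.create_table(group, "bars", Bar, "Sample bars")
        row = table.row
        for symbol, close, volume in sample:
            row["symbol"] = symbol
            row["close"] = close
            row["volume"] = volume
            row.append()
        table.flush()  # push buffered rows to disk

    # Reopen read-only and filter with a condition string evaluated by PyTables.
    with tables.open_file(path, mode="r") as h5:
        table = h5.root.market.bars
        hits = table.read_where("close > 100")
        # StringCol values come back as bytes, so decode them for display.
        return [s.decode() for s in hits["symbol"]]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_and_query(os.path.join(tmp, "demo.h5")))
```

The condition string passed to `read_where` is evaluated against the stored columns, so only matching rows are returned as a NumPy structured array rather than the whole table.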

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions

3 answers

Pandas HDF5 as a Database

I've been using python pandas for the last year and I'm really impressed by its performance and functionality; however, pandas is not a database yet. I've been thinking lately about ways to integrate the analysis power of pandas into a flat HDF5 file…

asked Mar 20 '14 at 02:55

prl900

5 answers

Saving dictionaries to file (numpy and Python 2/3 friendly)

I want to do hierarchical key-value storage in Python, which basically boils down to storing dictionaries to files. By that I mean any type of dictionary structure, that may contain other dictionaries, numpy arrays, serializable Python objects, and…

python python-3.x numpy hdf5 pytables

asked Aug 06 '13 at 02:52

Gustav Larsson

5 answers

Getting Pypy to recognize third party modules

Just a quick question, how do I get pypy to recognize third party modules that I have in Python? For instance, I get the following error. from tables import * ImportError: No Module named tables Which is basically saying that it cannot find my…

python pypy pytables

asked Jun 25 '12 at 18:38

jab

1 answer

Difference between HDF5 file and PyTables file

Is there a difference between HDF5 files and files created by PyTables? PyTables has two functions .isHDFfile() and .isPyTablesFile() suggesting that there is a difference between the two formats. I've done some looking around on Google and have…

python numpy hdf5 pytables

asked Nov 03 '11 at 22:13

dtlussier

1 answer

Floating Point Exception with Numpy and PyTables

I have a rather large HDF5 file generated by PyTables that I am attempting to read on a cluster. I am running into a problem with NumPy as I read in an individual chunk. Let's go with the example: The total shape of the array within the HDF5 file…

python numpy hdf5 pytables

asked Sep 30 '11 at 23:46

Tarun Chitra

2 answers

PyTables vs. SQLite3 insertion speed

I bought Kibot's stock data and it is enormous. I have about 125,000,000 rows to load (1000 stocks * 125k rows/stock [1-minute bar data since 2010-01-01], each stock in a CSV file whose fields are Date,Time,Open,High,Low,Close,Volume). I'm totally…

python sqlite pytables

asked May 21 '11 at 18:41

jdmarino

1 answer

How to concat multiple pandas dataframes into one dask dataframe larger than memory?

I am parsing tab-delimited data to create tabular data, which I would like to store in an HDF5. My problem is I have to aggregate the data into one format, and then dump into HDF5. This is ~1 TB-sized data, so I naturally cannot fit this into RAM.…

pandas hdf5 dask pytables bigdata

asked Oct 09 '16 at 20:18

ShanZhengYang

3 answers

Unable to save DataFrame to HDF5 ("object header message is too large")

I have a DataFrame in Pandas: In [7]: my_df Out[7]: Int64Index: 34 entries, 0 to 0 Columns: 2661 entries, airplane to zoo dtypes: float64(2659), object(2) When I try to save this to disk: store =…

python pandas hdf5 pytables

asked May 19 '13 at 21:09

Amelio Vazquez-Reina

1 answer

Merging two tables with millions of rows in Python

I am using Python for some data analysis. I have two tables, the first (let's call it 'A') has 10 million rows and 10 columns and the second ('B') has 73 million rows and 2 columns. They have 1 column with common ids and I want to intersect the two…

python join merge pandas pytables

asked Jan 30 '13 at 21:51

user2027051

1 answer

pytables writes much faster than h5py. Why?

I noticed that writing .h5 files takes much longer if I use the h5py library instead of the pytables library. What is the reason? This is also true when the shape of the array is known beforehand. Further, I use the same chunksize and no compression…

python h5py pytables

asked Sep 16 '19 at 09:03

adku1173

1 answer

Query HDF5 in Pandas

I have following data (18,619,211 rows) stored as a pandas dataframe object in hdf5 file: date id2 w id 100010 1980-03-31 10401 0.000839 100010 1980-03-31 10604 0.020140 100010 1980-03-31 …

python datetime pandas hdf5 pytables

asked May 26 '14 at 05:53

user3576212

5 answers

Could not find HDF5 installation for PyTables on M1 Mac

Running on M1 Mac, macOS Monterey 12.4, Python 3.10.3 pip install tables Collecting tables Using cached tables-3.7.0.tar.gz (8.2 MB) Installing build dependencies ... done Getting requirements to build wheel ... error error:…

homebrew hdf5 pytables

asked Jul 19 '22 at 00:26

Bn.F76

1 answer

HDFStore with string columns gives issues

I have a pandas DataFrame myDF with a few string columns (whose dtype is object) and many numeric columns. I tried the following: d=pandas.HDFStore("C:\\PF\\Temp.h5") d['test']=myDF I got this…

python-3.x pandas pytables

asked Apr 10 '14 at 20:56

uday

2 answers

Reading a large table with millions of rows from Oracle and writing to HDF5

I am working with an Oracle database with millions of rows and 100+ columns. I am attempting to store this data in an HDF5 file using pytables with certain columns indexed. I will be reading subsets of these data in a pandas DataFrame and performing…

python pandas hdf5 pytables

asked Dec 16 '13 at 18:50

smartexpert

3 answers

In PyTables, how to create nested array of variable length?

I'm using PyTables 2.2.1 w/ Python 2.6, and I would like to create a table which contains nested arrays of variable length. I have searched the PyTables documentation, and the tutorial example (PyTables Tutorial 3.8) shows how to create a nested…

python pytables

asked Mar 20 '11 at 01:12

plmcw
