Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. It is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing and searching very large amounts of data.
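For a rough idea of what the object-oriented interface looks like, here is a minimal sketch assuming PyTables 3.x imported as `tables`. The file name `readings.h5`, the `Reading` column layout and the helper `build_and_query` are made up for illustration. It defines a table schema, appends a batch of rows, and runs a query that PyTables evaluates chunk by chunk:

```python
import os
import tempfile

import numpy as np
import tables as tb


class Reading(tb.IsDescription):
    # Each class attribute becomes a typed column of the HDF5 table.
    sensor = tb.StringCol(16)  # fixed-width byte string (16 bytes)
    time = tb.Int64Col()
    value = tb.Float64Col()


def build_and_query(path=None, n=1000):
    """Create a table, append n rows in one batch, then run an in-kernel query."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "readings.h5")
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "readings", Reading, "Example readings")
        # Build all rows as one NumPy structured array matching the table dtype.
        rows = np.empty(n, dtype=table.dtype)
        rows["sensor"] = np.where(np.arange(n) % 2 == 0, b"a", b"b")
        rows["time"] = np.arange(n)
        rows["value"] = np.arange(n) * 0.5
        table.append(rows)
        table.flush()
        # read_where evaluates the condition chunk by chunk (via numexpr),
        # so the full table does not have to be loaded into memory first.
        return table.read_where('(sensor == b"a") & (value > 490.0)')
```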

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
11 votes, 1 answer

Efficient way of inputting large raster data into PyTables

I am looking for an efficient way to load a 20 GB raster data file (GeoTIFF) into PyTables for further out-of-core computation. Currently I am reading it as a NumPy array using GDAL, and writing the NumPy array into PyTables using the…
user3235542
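One common pattern for this kind of load is to copy the raster in horizontal blocks into an extendable, compressed array, so that only one block is in memory at a time. The sketch below assumes PyTables 3.x; `read_block` is a hypothetical stand-in for a windowed GDAL read that just fabricates data, and the block size, file name and zlib compression level are arbitrary choices:

```python
import os
import tempfile

import numpy as np
import tables as tb


def read_block(row_start, nrows, ncols):
    """Hypothetical stand-in for a windowed raster read.

    With GDAL this would be something like band.ReadAsArray(0, row_start, ncols, nrows);
    here it just fabricates a block whose values equal the row number.
    """
    rows = np.arange(row_start, row_start + nrows, dtype=np.float32)
    return np.repeat(rows[:, None], ncols, axis=1)


def raster_to_earray(height, width, block_rows=256, path=None):
    """Copy a height x width raster into an extendable, compressed array block by block."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "raster.h5")
    filters = tb.Filters(complevel=5, complib="zlib")
    with tb.open_file(path, mode="w") as h5:
        # shape=(0, width): the first dimension starts empty and grows with each append.
        earr = h5.create_earray("/", "raster", atom=tb.Float32Atom(),
                                shape=(0, width), filters=filters,
                                expectedrows=height)
        for start in range(0, height, block_rows):
            n = min(block_rows, height - start)
            # Only the current block is held in memory.
            earr.append(read_block(start, n, width))
        return earr.shape, float(earr[height - 1, 0])
```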
11 votes, 5 answers

How to store wide tables in pytables / hdf5

I have data coming from a csv which has a few thousand columns and ten thousand (or so) rows. Within each column the data is of the same type, but different columns have data of different types*. Previously I have been pickling the data from numpy…
acrophobia
11 votes, 1 answer

HDFStore.append(string, DataFrame) fails when string column contents are longer than those already there

I have a Pandas DataFrame stored via an HDFStore that essentially stores summary rows about test runs I am doing. Several of the fields in each row contain descriptive strings of variable length. When I do a test run, I create a new DataFrame with a…
ultra909
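The usual cause is that the width of a string column is fixed when the table is first created, so longer strings in later appends do not fit. A minimal sketch of reserving space up front with the `min_itemsize` argument of `HDFStore.append` (the file name, the column names and the 200-character width are assumptions for the example):

```python
import os
import tempfile

import pandas as pd


def append_runs(path=None):
    """Append two batches of run summaries whose 'desc' strings differ in length."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "runs.h5")
    first = pd.DataFrame({"run": [1], "desc": ["short"]})
    later = pd.DataFrame({"run": [2],
                          "desc": ["a considerably longer description of the second run"]})
    with pd.HDFStore(path, mode="w") as store:
        # The width of a string column is fixed when the table is created, so
        # reserve room (200 characters, an arbitrary choice) on the first append.
        store.append("runs", first, min_itemsize={"desc": 200})
        # Without the reservation above, this append would raise a ValueError.
        store.append("runs", later)
        return store.select("runs")
```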
10 votes, 2 answers

Is there a way to get a numpy-style view to a slice of an array stored in a hdf5 file?

I have to work on large 3D cubes of data. I want to store them in HDF5 files (using h5py or maybe pytables). I often want to perform analysis on just a section of these cubes. This section is too large to hold in memory. I would like to have a numpy…
Caleb
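With PyTables, indexing an on-disk array with slices reads only the selected hyperslab from the file and returns it as an ordinary in-memory NumPy array (a copy, not a live view of the file). A sketch assuming PyTables 3.x; `make_cube`, `section_mean` and the file `cube.h5` are hypothetical names:

```python
import os
import tempfile

import numpy as np
import tables as tb


def make_cube(shape=(64, 64, 64), path=None):
    """Write a 3-D cube plane by plane; plane z is filled with the value z."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "cube.h5")
    with tb.open_file(path, mode="w") as h5:
        cube = h5.create_carray("/", "cube", atom=tb.Float64Atom(), shape=shape)
        for z in range(shape[0]):
            cube[z] = np.full(shape[1:], z, dtype=np.float64)
    return path


def section_mean(path, zs, ys, xs):
    """Read only the requested hyperslab from disk and reduce it."""
    with tb.open_file(path, mode="r") as h5:
        section = h5.root.cube[zs, ys, xs]  # in-memory NumPy copy of just this section
    return float(section.mean())
```

If even the section is too large for memory, the same indexing can be applied in a loop over smaller sub-blocks and the partial results combined.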
10 votes, 1 answer

Indexing and Data Columns in Pandas/PyTables

http://pandas.pydata.org/pandas-docs/stable/io.html#indexing I'm really confused about the concept of data columns in pandas HDF5 I/O. Plus, there's very little to no information about it to be found by googling either. Since I'm diving into…
user1265125
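Roughly, columns listed in `data_columns` are stored as separate, queryable HDF5 columns, while the remaining columns are packed together into a values block and cannot be used in a `where` clause. A minimal sketch with pandas' `HDFStore` (which requires PyTables); the file name, the toy frame and the helper `query_data_columns` are made up:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def query_data_columns(path=None):
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "dc.h5")
    df = pd.DataFrame({
        "A": np.arange(10),
        "B": np.arange(10) * 10.0,
        "C": list("abcdefghij"),
    })
    with pd.HDFStore(path, mode="w") as store:
        # "A" and "C" become individually queryable columns; "B" is packed
        # together with any other non-data columns into a values block.
        store.append("df", df, data_columns=["A", "C"])
        # Only the index and data columns may appear in a where clause.
        return store.select("df", where="A > 5 & C == 'h'")
```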
10 votes, 3 answers

PyTables dealing with data with size many times larger than size of memory

I'm trying to understand how PyTables manages data whose size is greater than the memory size. Here is a comment in the PyTables code (link to GitHub): # Nodes referenced by a variable are kept in `_aliveNodes`. # When they are no longer referenced, they…
Gill Bates
9 votes, 1 answer

How should python dictionaries be stored in pytables?

pytables doesn't natively support python dictionaries. The way I've approached it is to make a data structure of the form: tables_dict = { 'key' : tables.StringCol(itemsize=40), 'value' : tables.Int32Col(), } (note that I ensure that…
tdc
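A minimal sketch of the key/value layout from the question, assuming PyTables 3.x: one row per dictionary item, with keys encoded to bytes because `StringCol` stores fixed-width byte strings (keys longer than 40 bytes will not fit). The helpers `save_dict`, `load_dict` and `lookup` and the file `mapping.h5` are hypothetical:

```python
import os
import tempfile

import tables as tb

# Column layout taken from the question: a 40-byte key and a 32-bit integer value.
tables_dict = {
    "key": tb.StringCol(itemsize=40),
    "value": tb.Int32Col(),
}


def save_dict(d, path=None):
    """Write a str -> int dict as one table row per item."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "mapping.h5")
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "mapping", tables_dict)
        row = table.row
        for k, v in d.items():
            row["key"] = k.encode("utf-8")  # StringCol holds bytes, not str
            row["value"] = v
            row.append()
        table.flush()
    return path


def load_dict(path):
    with tb.open_file(path, mode="r") as h5:
        data = h5.root.mapping.read()
    return {k.decode("utf-8"): int(v) for k, v in zip(data["key"], data["value"])}


def lookup(path, key):
    """Fetch one value with an in-kernel query instead of loading the whole table."""
    with tb.open_file(path, mode="r") as h5:
        hits = h5.root.mapping.read_where("key == k",
                                          condvars={"k": key.encode("utf-8")})
    return int(hits["value"][0]) if len(hits) else None
```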
9 votes, 1 answer

Using pytables, which is more efficient: scipy.sparse or numpy dense matrix?

When using pytables, there's no support (as far as I can tell) for the scipy.sparse matrix formats, so to store a matrix I have to do some conversion, e.g. def store_sparse_matrix(self): grp1 = self.getFileHandle().createGroup(self.getGroup(),…
tdc
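One common conversion is to store the CSR components (`data`, `indices`, `indptr`) as three 1-D arrays plus the matrix shape. CSR needs nnz values, nnz column indices and (rows + 1) row pointers, so it is much smaller than the dense matrix when few entries are non-zero. A sketch assuming PyTables 3.x and SciPy; the group name `sparse`, the file name and the helper names are made up:

```python
import os
import tempfile

import numpy as np
import scipy.sparse as sp
import tables as tb


def store_csr(matrix, path=None, group_name="sparse"):
    """Store a matrix in CSR form as three 1-D arrays plus its shape."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "sparse.h5")
    m = sp.csr_matrix(matrix)
    with tb.open_file(path, mode="w") as h5:
        grp = h5.create_group("/", group_name)
        for part in ("data", "indices", "indptr"):
            h5.create_array(grp, part, getattr(m, part))
        grp._v_attrs.shape = np.array(m.shape)  # keep the shape as a group attribute
    return path


def load_csr(path, group_name="sparse"):
    with tb.open_file(path, mode="r") as h5:
        grp = h5.get_node("/", group_name)
        data, indices, indptr = (h5.get_node(grp, part).read()
                                 for part in ("data", "indices", "indptr"))
        shape = tuple(int(s) for s in grp._v_attrs.shape)
    return sp.csr_matrix((data, indices, indptr), shape=shape)
```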
9 votes, 1 answer

Concatenate two big pandas.HDFStore HDF5 files

This question is somewhat related to "Concatenate a large number of HDF5 files". I have several huge HDF5 files (~20GB compressed) which do not fit in RAM. Each of them stores several pandas.DataFrames of identical format and with indexes that…
Vladimir
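A sketch of streaming the frames in chunks from each source store into a new one, so that no file has to fit in RAM. It assumes every source holds a table-format frame under the same key (`df` here, an assumption) with identical columns; string columns may additionally need `min_itemsize` if their widths differ between files. The helpers `concat_stores` and `demo` are hypothetical:

```python
import os
import tempfile

import numpy as np
import pandas as pd


def concat_stores(src_paths, dst_path, key="df", chunksize=50_000):
    """Stream the table-format frame under `key` from each source into one store."""
    with pd.HDFStore(dst_path, mode="w") as dst:
        for src_path in src_paths:
            with pd.HDFStore(src_path, mode="r") as src:
                # With chunksize, select returns an iterator of DataFrames.
                for chunk in src.select(key, chunksize=chunksize):
                    dst.append(key, chunk)


def demo(n=1000):
    """Build two small table-format stores, concatenate them, and read the result."""
    tmp = tempfile.mkdtemp()
    srcs = []
    for i in range(2):
        p = os.path.join(tmp, f"part{i}.h5")
        df = pd.DataFrame({"x": np.arange(n) + i * n}, index=np.arange(n) + i * n)
        df.to_hdf(p, key="df", format="table", mode="w")
        srcs.append(p)
    out = os.path.join(tmp, "combined.h5")
    concat_stores(srcs, out, chunksize=300)
    return pd.read_hdf(out, "df")
```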
9 votes, 1 answer

Pandas as fast data storage for Flask application

I'm impressed by the speed of running transformations, loading data and ease of use of Pandas and want to leverage all these nice properties (amongst others) to model some large-ish data sets (~100-200k rows, <20 columns). The aim is to work with…
orange
8 votes, 6 answers

Pytables vs. CSV for files that are not very large

I recently came across Pytables and find it to be very cool. It is clear that they are superior to a csv format for very large data sets. I am running some simulations using python. The output is not so large, say 200 columns and 2000 rows. If…
Curious2learn
8 votes, 1 answer

HDF5 file grows in size after overwriting the pandas dataframe

I'm trying to overwrite a pandas DataFrame in an HDF5 file. Each time I do this, the file size grows even though the stored frame content is the same. If I use mode='w', I lose all other records. Is this a bug or am I missing something? import pandas df…
8 votes, 1 answer

Store and extract numpy datetimes in PyTables

I want to store numpy datetime64 data in a PyTables Table. I want to do this without using Pandas. What I've tried so far Setup In [1]: import tables as tb In [2]: import numpy as np In [3]: from datetime import datetime create data In [4]: data =…
MRocklin
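One way to do this without pandas is to store the `datetime64[ns]` values as their underlying int64 nanosecond counts and convert back after reading. A sketch assuming PyTables 3.x; the `Event` layout, the file name and the helper names are made up for illustration:

```python
import os
import tempfile

import numpy as np
import tables as tb


class Event(tb.IsDescription):
    when = tb.Int64Col()     # datetime64[ns] stored as int64 nanoseconds since the epoch
    value = tb.Float64Col()


def write_events(times, values, path=None):
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "events.h5")
    times = np.asarray(times, dtype="datetime64[ns]")
    with tb.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "events", Event)
        rows = np.empty(len(times), dtype=table.dtype)
        rows["when"] = times.view("int64")  # reinterpret the bits, no conversion
        rows["value"] = values
        table.append(rows)
    return path


def read_events(path):
    with tb.open_file(path, mode="r") as h5:
        data = h5.root.events.read()
    # int64 -> datetime64[ns] restores the original timestamps exactly.
    return data["when"].astype("datetime64[ns]"), data["value"]
```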
8 votes, 1 answer

PyTables read random subset

Is it possible to read a random subset of rows from HDF5 (via PyTables or, preferably, pandas)? I have a very large dataset with millions of rows, but only need a sample of a few thousand for analysis. And what about reading from a compressed HDF file?
Marigold
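With PyTables, `Table.read_coordinates` reads an arbitrary set of row numbers, so a random sample can be drawn as sorted random coordinates. Compressed files are read the same way, though every chunk containing a sampled row has to be decompressed. A sketch assuming PyTables 3.x; `make_table`, `sample_rows`, the file name and the `/data` node are hypothetical:

```python
import os
import tempfile

import numpy as np
import tables as tb


def make_table(n=100_000, path=None):
    """Write a compressed demo table with an integer id column and a float column."""
    if path is None:
        path = os.path.join(tempfile.mkdtemp(), "big.h5")
    data = np.zeros(n, dtype=[("id", "i8"), ("x", "f8")])
    data["id"] = np.arange(n)
    data["x"] = np.sqrt(np.arange(n))
    filters = tb.Filters(complevel=5, complib="zlib")
    with tb.open_file(path, mode="w") as h5:
        h5.create_table("/", "data", obj=data, filters=filters)
    return path


def sample_rows(path, k, node="/data", seed=None):
    """Read k distinct, randomly chosen rows without loading the whole table."""
    rng = np.random.default_rng(seed)
    with tb.open_file(path, mode="r") as h5:
        table = h5.get_node(node)
        # Sorting the coordinates keeps the reads moving forward through the file.
        coords = np.sort(rng.choice(table.nrows, size=k, replace=False))
        return table.read_coordinates(coords)
```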
8 votes, 2 answers

Large matrix multiplication in Python - what is the best option?

I have two boolean sparse square matrices of c. 80,000 x 80,000 generated from 12 MB of data (and am likely to have orders of magnitude larger matrices when I use GBs of data). I want to multiply them (which produces a triangular matrix - however I…
user7289