Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data.
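
As a rough illustration of the object-oriented interface described above, the sketch below creates a small table and runs an in-kernel query against it. The `Particle` layout, the file name `demo.h5`, and the `energy > 6.5` condition are hypothetical and chosen only for this example.

```python
import os
import tempfile

import tables  # PyTables


class Particle(tables.IsDescription):
    # Hypothetical record layout, used only for illustration.
    name = tables.StringCol(16)
    energy = tables.Float64Col()


def demo(path):
    """Write ten rows, then select some of them with an in-kernel query."""
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "particles", Particle, title="Demo")
        row = table.row
        for i in range(10):
            row["name"] = f"p{i}".encode()
            row["energy"] = float(i)
            row.append()
        table.flush()
        # The condition string is evaluated by PyTables, not row by row in Python.
        hits = table.read_where("energy > 6.5")
    return [float(e) for e in hits["energy"]]


path = os.path.join(tempfile.mkdtemp(), "demo.h5")  # hypothetical file name
selected = demo(path)
print(selected)
```

The query returns a NumPy structured array, so the selected rows can be handed to other NumPy-based tools without conversion.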

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
0 votes, 0 answers

How can Pandas read_csv() be forced not to stack rows with identical datatypes into an array?

I've converted a larger csv file to HDF5 format using Pandas read_csv(). When loading it back in for in-kernel queries using PyTables, I see the data structure has changed from a string and 4 float fields to "index": Int64Col(shape=(), dflt=0,…
Pebermynte Lars
  • 414
  • 3
  • 14
0 votes, 0 answers

Add some fields from PyTable to Pandas dataFrame

I have a PyTables table with 12 columns. I would like to create a Pandas DataFrame including only some of them, for processing. The table is quite long, with around 8-10 million rows. I have tried this, but for some reason the process seems to be…
codeKiller
  • 5,493
  • 17
  • 60
  • 115
0 votes, 0 answers

What is PyTables' external sorting algorithm?

I need to know which sorting algorithm PyTables uses to create completely sorted indices. The code on GitHub references both mergesort and quicksort, but I'm still unclear on how they're actually used. Any thoughts?
diogoaos
  • 400
  • 1
  • 14
0 votes, 1 answer

argsort on a PyTables array

I have a problem with NumPy's argsort. It creates an int64 array of the length of the input array in-memory. Since I'm working with very large arrays, this will blow the memory. I tested NumPy's argsort with a small PyTables' carray and it gives the…
diogoaos
  • 400
  • 1
  • 14
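
The memory concern in the question above can be seen on a tiny in-memory array: `numpy.argsort` returns one platform-sized integer index per input element (int64 on typical 64-bit builds), so the index array can be larger than the data being sorted. The array contents here are made up for illustration, and this sketch does not touch PyTables itself.

```python
import numpy as np

# Hypothetical sample data; float32 uses 4 bytes per element.
values = np.array([0.3, 0.1, 0.2], dtype=np.float32)

# argsort builds a full index array in memory, one integer per element.
order = np.argsort(values)

print(order.dtype)    # platform index type (np.intp)
print(values.nbytes)  # bytes used by the data
print(order.nbytes)   # bytes used by the index array
```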
0 votes, 1 answer

How do I truncate an EARRAY in an HDF5 file using pytables?

I have an HDF5 file containing a very large EARRAY that I would like to truncate in order to save disk space and process it more quickly. I am using the truncate method on the node containing the EARRAY. pytables reports that the array has been…
cxrodgers
  • 4,317
  • 2
  • 23
  • 29
0 votes, 0 answers

Python Large Data Set - PyTable HDF5 with Dataframe

I have a very large file saved as an HDF file. I also have some data in a Python dataframe that I would like to look up in the HDF file by a unique id. Is there an easy way to basically join them? I saw the example for the where function but it…
Craig
  • 55
  • 3
0 votes, 1 answer

Can't save pandas dataframe to HDF file

I have been trying for a while to save a pandas dataframe to an HDF5 file. I tried various different phrasings, e.g. df.to_hdf etc., but to no avail. I am running this in a Python virtual environment (see here). Even without the use of the VE it has the…
Jacques MALAPRADE
  • 963
  • 4
  • 13
  • 28
0 votes, 0 answers

Error importing tables in Python in Windows 7 64-bit

I'm trying to use the HDFStore command in Pandas and although I have PyTables installed, when I run the command: import pandas as pan; filename=('testfile.h5'); store=pan.HDFStore(filename) I get the following error: Traceback (most recent call…
0 votes, 0 answers

pytables, how to sort rows appended

In pytables (aka tables) you can append data according to the tutorial, and docs. How can I append unsorted data on the fly continuously received from a server to an existing sorted table? In other words, I need to append rows not only to the end…
Bad
  • 4,967
  • 4
  • 34
  • 50
0 votes, 1 answer

Accessing carray of pointcloud using pytables

I am having a hard time understanding how to access the data in a carray. http://carray.pytables.org/docs/manual/index.html I have a carray that I can view in a group structure using vitables, but how to open it and retrieve the data is beyond me.…
ashley
  • 1,535
  • 1
  • 14
  • 19
0 votes, 2 answers

How to install PyTables 2.3.1 with Anaconda, missing HDF5 library

I need to run an older version of PyTables, that is 2.3.1, in an Anaconda environment on Linux. But I cannot install it. conda install -n myenv pytables=2.3.1 fails finding the appropriate version. conda install -n myenv pytables=2 installs…
SmCaterpillar
  • 6,683
  • 7
  • 42
  • 70
0 votes, 1 answer

to make pydata handle string columns

I have a dataframe that has a few columns with floats and a few columns that are string. All columns have nan. The string columns have either strings or nan which appear to have a type float. When I try to 'df.to_hdf' to store the dataframe, I get…
uday
  • 6,453
  • 13
  • 56
  • 94
0 votes, 1 answer

How can I store an array or list of Strings in a PyTable?

For example, I have the following table description. In SpectrumL, I would like to store a spectrogram whose exact size I do not know yet. Similarly, I would like to store some tags (which will be strings) and their size will vary by record. …
mle
  • 289
  • 1
  • 2
  • 12
0 votes, 1 answer

How do I initialize a PyTables table column size?

I am doing a Monte Carlo calculation and I'd like to save the intermediate results to disk. Below is a basic version of my code. In my original version, I had a data aggregator object that would collect the results from each trajectory and then at…
craigim
  • 3,884
  • 1
  • 22
  • 42
0 votes, 1 answer

How to define column names using 'for loop' inside a class (IsDescription)

We know that if we need to define the column names of a table using pytables we can do it in the following way: class Project(IsDescription): alpha = StringCol(20); beta = StringCol(20); gamma = StringCol(20) where alpha, beta and gamma…
Zaman
  • 37
  • 8
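
One possible approach to the question above, offered as a sketch rather than as the accepted answer: PyTables also accepts a plain dictionary as a table description, so the columns can be built in a loop. The column names reuse those from the question; the file name `project.h5` and the table name `project` are assumptions.

```python
import os
import tempfile

import tables  # PyTables

names = ["alpha", "beta", "gamma"]

# Build the description in a loop; `pos` fixes the column order.
description = {}
for position, name in enumerate(names):
    description[name] = tables.StringCol(20, pos=position)

path = os.path.join(tempfile.mkdtemp(), "project.h5")  # hypothetical file name
with tables.open_file(path, mode="w") as h5:
    # A dict is one of the description types create_table accepts.
    table = h5.create_table("/", "project", description)
    column_names = list(table.colnames)

print(column_names)
```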