Questions tagged [pytables]

A Python library for working with extremely large hierarchical (HDF5) datasets.

PyTables is a package for managing hierarchical (HDF5) datasets, designed to cope efficiently and easily with extremely large amounts of data. PyTables is available as a free download.

PyTables is built on top of the HDF5 library, using the Python language and the NumPy package. It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code (generated using Cython), makes it a fast yet extremely easy-to-use tool for interactively browsing, processing, and searching very large amounts of data.

Links to get started:
- Documentation
- Tutorials
- Library Reference
- Downloads

617 questions
6 votes, 2 answers

Why does pandas convert unsigned int greater than 2**63-1 to objects?

When I convert a numpy array to a pandas data frame pandas changes uint64 types to object types if the integer is greater than 2^63 - 1. import pandas as pd import numpy as np x = np.array([('foo', 2 ** 63)], dtype = np.dtype([('string', np.str_,…
jamin
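For context on the threshold in this question: 2**63 - 1 is the largest value a signed 64-bit integer can hold, so 2**63 only fits in an unsigned 64-bit slot. The sketch below checks that with the standard-library `struct` module. It does not involve pandas or NumPy, and the helper names `fits_int64` and `fits_uint64` are made up for illustration.

```python
import struct

INT64_MAX = 2**63 - 1   # largest signed 64-bit value
UINT64_MAX = 2**64 - 1  # largest unsigned 64-bit value


def fits_int64(value):
    """Return True if value can be packed as a signed 64-bit integer."""
    try:
        struct.pack("<q", value)
        return True
    except struct.error:
        return False


def fits_uint64(value):
    """Return True if value can be packed as an unsigned 64-bit integer."""
    try:
        struct.pack("<Q", value)
        return True
    except struct.error:
        return False


# The value from the question, 2 ** 63, is one past the signed maximum.
value = 2**63
print(fits_int64(value), fits_uint64(value))
```

Any container that falls back to a signed 64-bit representation has no room for such a value, which is at least consistent with the behavior the question describes.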
6 votes, 1 answer

How to install pytables 3.2 on anaconda?

I use anaconda, and I cannot upgrade with conda update pytables it says "already installed". .... # All requested packages already installed. # packages in environment at C:\Anaconda: # pytables 3.1.1 …
Ewan
6 votes, 3 answers

How to deal with pandas column that has a list of dicts in every cell

I have a DataFrame that includes a column where every cell is made up of a list of dicts, and each list of dicts is of varying length (including 0). An example: df = pd.DataFrame({'ID' : [13423,294847,322844,429847], 'RANKS': [[{u'name': u'A',…
James
6 votes, 1 answer

Pip does not acknowledge Cython

I just installed pip and Python via home-brew on a fresh Mac OS installation. First of all, my pip is not installing dependencies at all - which forces me to re-run 'pip install tables' 3 times and every time it will tell me a dependency and I will…
FooBar
6 votes, 1 answer

Appending to HDFStore fails with "cannot match existing table structure"

The final solution was to use the "converters" parameter of read_csv and check every value before adding it to the DataFrame. In the end there were only 2 broken values in over 80GB of raw data. The parameter looks like this: converters={'XXXXX':…
FrozenSUSHI
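The excerpt is cut off before the author's converter is shown, so the following is not that code. It is a minimal sketch of what a validating converter could look like, assuming the column is supposed to hold floats. The function name `to_float_or_nan` and the sample values are hypothetical.

```python
import math


def to_float_or_nan(raw):
    """Return raw parsed as a float, or NaN when it cannot be parsed."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        # A broken cell becomes NaN instead of changing the column's type.
        return math.nan


# Hypothetical raw cell values, as strings the way a CSV reader yields them.
values = [to_float_or_nan(v) for v in ["1.5", "42", "n/a", ""]]
print(values)
```

A function like this could be passed to `pandas.read_csv` through its `converters` argument, keyed by column name, so that every chunk appended to the store keeps the same column type.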
6 votes, 1 answer

Renaming a table in pandas hdfstore

I am using pandas to join several huge csv files using HDFStore. I'm merging all the other tables to a base table, base. Right now I create a new table in the HDFStore for the output of each merge, which I call temp. Then I delete the old base…
Luke
6 votes, 1 answer

Check if key is in HDF5Store without path

Using pandas/pytables, a list of keys can be easily returned using store.keys(). >>> store.keys() ['/df_coord', '/metaFrame'] Using the standard dictionary check to see if a key exists, if 'df_coord' in store.keys():, returns false unless the / is…
riddley_w
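The mismatch described in this question comes from comparing a bare name against keys that are stored with a leading `/`. A minimal sketch of normalizing the name before the membership test, using the key list from the excerpt as plain strings (no store is opened here, and `normalize_key` and `has_key` are hypothetical helpers, not pandas functions):

```python
def normalize_key(key):
    """Prefix key with '/' unless it already starts with one."""
    return key if key.startswith("/") else "/" + key


def has_key(keys, key):
    """Check membership after normalizing key to the '/name' form."""
    return normalize_key(key) in keys


# Key list as shown in the question's store.keys() output.
keys = ["/df_coord", "/metaFrame"]

print(has_key(keys, "df_coord"))   # bare name
print(has_key(keys, "/df_coord"))  # name with leading slash
```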
6 votes, 2 answers

How do you create a compressed dataset in pytables that can store a Unicode string?

I'm using PyTables to store a data array, which works fine; along with it I need to store a moderately large (50K-100K) Unicode string containing JSON data, and I'd like to compress it. How can I do this in PyTables? It's been a long time since I've…
Jason S
6 votes, 1 answer

compressing array with pytables

I am trying to compress my array like this import numpy as np import tables from contextlib import closing FILTERS = tables.Filters(complib='zlib', complevel=5) data = np.zeros(10**7) with closing(tables.open_file('compressed', mode='w',…
qweqwegod
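As a rough point of comparison for the filter settings in this question, the sketch below applies zlib at level 5 to a zero-filled buffer using only the standard-library `zlib` module. It is not PyTables code and says nothing about how PyTables chunks or stores data. The buffer size is an assumption chosen to stay small (100,000 float64-sized zeros rather than the 10**7 in the question).

```python
import zlib

# 100,000 zero-valued float64 items occupy 800,000 zero bytes.
raw = bytes(8 * 10**5)

# Compress at level 5, the same level as complevel=5 in the question.
compressed = zlib.compress(raw, 5)
restored = zlib.decompress(compressed)

ratio = len(raw) / len(compressed)
print(len(raw), len(compressed), round(ratio, 1))
```

An all-zero buffer is a best case for zlib, so the ratio printed here should not be read as typical for real arrays.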
6 votes, 1 answer

Pandas HDFStore unload dataframe from memory

OK, I am experimenting with pandas to load a roughly 30GB csv file with 40 million+ rows and 150+ columns into HDFStore. The majority of the columns are strings, followed by numerical and dates. I have never really used numpy, pandas or pytables…
smartexpert
6 votes, 2 answers

ptrepack sortby needs 'full' index

I am trying to ptrepack a HDF file that was created with pandas HDFStore pytables interface. The main index of the dataframe was time but I made some more columns data_columns so that I can filter for data on-disk via these data_columns. Now I would…
K.-Michael Aye
6 votes, 2 answers

What is a better approach of storing and querying a big dataset of meteorological data

I am looking for a convenient way to store and to query huge amount of meteorological data (few TB). More information about the type of data in the middle of the question. Previously I was looking in the direction of MongoDB (I was using it for many…
Salvador Dali
6 votes, 1 answer

Append data to existing pytables table

I am new to PyTables and implemented a few basic techniques of inserting and retrieving data from a table in Pytables. However, I am not sure about how to insert data in an existing table of PyTables because all I read/get in the tutorial is…
khan
6 votes, 1 answer

HDFStore: table.select and RAM usage

I am trying to select random rows from a HDFStore table of about 1 GB. RAM usage explodes when I ask for about 50 random rows. I am using pandas 0-11-dev, python 2.7, linux64. In this first case the RAM usage fits the size of chunk with…
user17375
6 votes, 2 answers

Pytables table into pandas DataFrame

Lots of information on how to read a csv into a pandas dataframe, but what I have is a PyTables table and I want a pandas DataFrame. I've found how to store my pandas DataFrame to pytables... then I want to read it back, at this point it will…
Jim Knoll