Questions tagged [vaex]

Vaex is a Python library for lazy out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).
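
The grid statistics described above can be illustrated with a minimal plain-Python sketch. It is not Vaex's API or implementation; the function name `binned_mean` and its parameters are hypothetical. It only shows the idea of accumulating per-bin counts and sums in a single pass over chunks of rows, so the whole dataset never has to be in memory at once.

```python
def binned_mean(chunks, lo, hi, nbins):
    """Count rows and average y per x-bin on a regular grid over [lo, hi).

    `chunks` is any iterable of (xs, ys) pairs, e.g. a generator that
    reads one block of rows at a time (hypothetical input format).
    """
    counts = [0] * nbins
    sums = [0.0] * nbins
    width = (hi - lo) / nbins
    for xs, ys in chunks:
        for x, y in zip(xs, ys):
            if lo <= x < hi:  # rows outside the grid limits are ignored
                # Clamp to guard against floating-point rounding near `hi`.
                i = min(int((x - lo) / width), nbins - 1)
                counts[i] += 1
                sums[i] += y
    # Empty bins have no defined mean, so report None for them.
    means = [s / c if c else None for s, c in zip(sums, counts)]
    return counts, means


# Two small "chunks" standing in for blocks read from disk.
example_chunks = [
    ([0.5, 1.5, 2.5], [10.0, 20.0, 30.0]),
    ([0.7, 3.9, 4.0], [30.0, 5.0, 99.0]),
]
counts, means = binned_mean(example_chunks, 0.0, 4.0, 4)
print(counts, means)
```

Only the per-bin accumulators are kept between chunks, which is why this kind of statistic suits out-of-core processing.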

181 questions
1
vote
0 answers

VAEX MemoryError when reading CSV and converting to HDF5

I am trying to import a 30 GB CSV file and convert it to HDF5 through vaex with the following code. I read that setting convert to true would prevent an OutOfMemory error, but I continue to get the error after nearly 30 minutes of trying to…
rochimer
  • 87
  • 8
1
vote
1 answer

How to filter a vaex dataset by a list of numbers/categories

As an example, I have the following dataset (fake random data):

Index  category             value
1      dog                  5
2      cat                  22
3      Tasselled Wobbegong  44
4      cat                  66
5      Tasselled Wobbegong  5
6      dog                  23

I have this in a vaex dataframe. Now imagine I have 10,000…
Matan
  • 73
  • 2
  • 14
1
vote
2 answers

Vaex: Is there a way to split a single column into multiple columns

I have been trying to find a way to split text data (separated by spaces) in a single column into multiple columns. I can do it in pandas using the following code, but I would like to do the same with Vaex. I was looking at the Vaex API document,…
mh0189
  • 21
  • 3
1
vote
1 answer

reordering columns in vaex?

My question is how to reorder columns in vaex. For example, I want the 5th column at position 1 and the first column at position 5, etc. I know we can use the reindex method in pandas; is there a way to mimic that in vaex? Thanks for your help.
Hasham
  • 43
  • 3
1
vote
1 answer

Calculate log for a column with zeros in vaex

I am looking to calculate the log of one of my columns in Vaex. The problem is that some of the rows in my column have the value of 0. The following works if the column doesn't contain zero: df['log_axis'] = np.log(df['original_axis']) I have tried…
afriedman111
  • 1,925
  • 4
  • 25
  • 42
1
vote
2 answers

Unable to install vaex in anaconda

I tried to import vaex on my company computer, but the proxy is blocking the pip install in the Jupyter notebook. Is there an alternative way to install it given the proxy restriction? I also tried to install it from the command line, but I am getting this…
1
vote
1 answer

vaex apply does not work when using dataframe columns

I am trying to tokenize natural language for the first sentence in Wikipedia in order to find 'is a' patterns. n-grams of the tokens and leftover text would be the next step. "Wellington is a town in the UK." becomes "town is a attr_root in the…
Superdooperhero
  • 7,584
  • 19
  • 83
  • 138
1
vote
1 answer

vaex filter and bin a dataframe using mask from another series

I have a large arrow file with 14 million rows. In my app I select two columns and bin them using the count/binby functionality in Vaex. df.count( binby=axes, limits=limits, shape=(binnum,)*len(axes), delay=True ) Some of my columns act as…
afriedman111
  • 1,925
  • 4
  • 25
  • 42
1
vote
1 answer

How do I efficiently calculate the mean of nested subsets of Vaex dataframes?

I have a very large dataset comprising data for several dozen samples, and several hundred subsamples within each sample. I need to get mean, standard deviation, confidence intervals, etc. However, I'm running into a (suspected) massive performance…
1
vote
1 answer

Loading vaex dataframe to dash datatable

I am trying to load a Vaex dataframe into a Dash DataTable and am getting the following error: Invalid argument data passed into DataTable with ID "table". Expected an array. Was supplied type object. Tried the following, is it possible to load vaex dataframe…
AVM
  • 592
  • 2
  • 11
  • 25
1
vote
1 answer

Inner Join hdf5 dataframe vaex python

I need to compare two CSV files and do an inner join. I am using vaex, which is faster than pandas, but I got stuck at a certain point. My code was working with pandas, but it was slow. How can I inner join two HDF5 files and get the output as CSV? My code …
alok sharma
  • 35
  • 1
  • 7
1
vote
1 answer

Loading Dataframe from Parquet and calculating max explodes in RAM

I am new to Dask and exported a pandas Dataframe to Parquet with row groups: x.to_parquet(path + 'ohlcv_TRX-PERP_978627_rowgrouped.prq', row_group_size=1000) Then I tried to load it with Dask, which seems to work correctly(?): x =…
1
vote
1 answer

What is the Vaex command for pd.isnull().sum()?

Someone please give me a VAEX alternative for this code: df_train = vaex.open('../input/ms-malware-hdf5/train.csv.hdf5') total = df_train.isnull().sum().sort_values(ascending = False)
1
vote
0 answers

vaex: What's the equivalent to pandas first() or last()

In Pandas I'd do this: df.groupby('key').first() What's the equivalent in vaex? Is it possible to do something similar?
IsaacLevon
  • 2,260
  • 4
  • 41
  • 83
1
vote
1 answer

How to calculate the maximum of several columns through Vaex?

I want to calculate the maximum (axis=1) of several columns in a very large dataset efficiently. The code I use now is: df["ia_timestamp"] = df[labels].values.max(axis=1). Here df is the DataFrame in Vaex. I think the step taking "values"…
Channing
  • 11
  • 1