Questions tagged [vaex]

Vaex is a Python library for lazy out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).

181 questions
0
votes
1 answer

About python vaex merging columns to a new column while changing int to float

I am able to write a function that merges columns into a new column, but I fail to convert an int column to float before converting it to a string for merging. I would like the integers in the new merged column to end with ".00000". At the end I was trying…
Henry
  • 57
  • 6
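
A minimal plain-Python sketch of the formatting step this question describes: casting each value to float and rendering it with five decimal places before joining. It is not Vaex code; the function name `merge_as_text` and the `_` separator are hypothetical and chosen only for illustration.

```python
def merge_as_text(values, sep="_", decimals=5):
    """Join numeric values into one string, rendering each as a fixed-point float."""
    # float(v) turns ints into floats; the format spec pads to `decimals` places.
    return sep.join(f"{float(v):.{decimals}f}" for v in values)


# Two example rows mixing ints and floats (made-up data).
rows = [(1, 2.5), (10, 3)]
merged = [merge_as_text(row) for row in rows]
print(merged)
```

The cast to float before formatting is what makes an integer such as 1 come out as "1.00000" rather than "1".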
0
votes
0 answers

Python Vaex read txt file parse it and append it

I am dealing with text files containing space-separated values. File1.txt: text1 text2 text3 text4 ... I would like to read the file into a vaex df and concatenate it to a so-called TOTALDF vaex df. I want to use vaex and not pandas because the df is…
JFerro
  • 3,203
  • 7
  • 35
  • 88
0
votes
1 answer

python: How to concatenate pandas dataframes with VAEX

I would like to join thousands of dataframes into one VAEX dataframe. Following the documentation at https://vaex.readthedocs.io/en/latest/api.html?highlight=concat#vaex.concat, I do: df_vaex = vaex.DataFrame() for i,file in enumerate(files): …
JFerro
  • 3,203
  • 7
  • 35
  • 88
0
votes
1 answer

How do you check if all the values in a column in a dataframe exist in another column in another dataframe using Vaex?

I have a dataframe with 160,000 rows and I need to know whether these values exist in a column of a different dataframe that has over 7 million rows, using Vaex. I have tried doing this in pandas but it takes way too long to run. Once I run…
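
A minimal plain-Python sketch of the membership check this question asks about, assuming both columns fit in memory as ordinary sequences. It is not Vaex code; `missing_values` is a hypothetical helper, and the data is made up.

```python
def missing_values(small, large):
    """Return the values of `small` that do not occur anywhere in `large`."""
    # Building a set once makes each lookup O(1) on average,
    # instead of scanning the large column for every value.
    lookup = set(large)
    return [value for value in small if value not in lookup]


small_column = [3, 5, 9]
large_column = [1, 3, 5, 7]

missing = missing_values(small_column, large_column)
all_present = not missing
print(missing, all_present)
```

Hashing the larger column once is what keeps the check from scaling with the product of the two column lengths.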
0
votes
1 answer

How to read tsv file from vaex and output a pyarrow parquet file?

On these vaex and pyarrow versions: >>> vaex.__version__ {'vaex': '4.12.0', 'vaex-core': '4.12.0', 'vaex-viz': '0.5.3', 'vaex-hdf5': '0.12.3', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.8.0', 'vaex-ml': '0.18.0'} >>>…
alvas
  • 115,346
  • 109
  • 446
  • 738
0
votes
1 answer

How to merge HDF files created by Vaex

I have thousands of HDF files, which are around 30GB in total. The HDF files are created through Vaex, but the names and number of columns in each file are not the same. I want to combine them into a single HDF file and a single dataframe, is there any…
Terry
  • 1
0
votes
0 answers

Convert timestamp to datetime for a Vaex dataframe

I have a parquet file that I have loaded as a Vaex dataframe. The parquet file has a column for a timestamp in the format 2022-10-12 17:10:00+00:00. When I try to do any kind of analysis with my dataframe I get the following error. KeyError:…
Rbc.F
  • 55
  • 4
0
votes
2 answers

Convert huge csv to hdf5 format

I downloaded IBM's Airline Reporting Carrier On-Time Performance Dataset; the uncompressed CSV is 84 GB. I want to run an analysis, similar to Flying high with Vaex, with the vaex library. I tried to convert the CSV to an HDF5 file, to make it…
saibot_90
  • 39
  • 6
0
votes
0 answers

Is vaex.graphql operation NOT 'lazy' evaluated?

I am trying to run this vaex program to perform a graphql query. I believe the operation is not lazily-evaluated. I could confirm that the memory consumption for my python program increases continuously. import vaex import time from vaex.graphql…
0
votes
0 answers

RuntimeError: stride is not equal to 1 in unit tests

I am building an app using vaex v4.9.1 and python 3.9. I use the limits function to get the min and max combos for two axes like so: limits = df.limits(list(axes_val.values()), delay=False, selection=True) Just recently randomly all of my unit…
afriedman111
  • 1,925
  • 4
  • 25
  • 42
0
votes
1 answer

Vaex Dataframe - Groupby on a calculated field - throws error

I have the referenced vaex dataframe. The column "Amount_INR" is calculated from the other three attributes using the function: def convert_curr(x,y,z): c = CurrencyRates() return c.convert(x, 'INR', y, z) data_df_usd['Amount_INR'] =…
RameJ
  • 1
  • 1
0
votes
1 answer

Vaex TypeError: expected string or bytes-like object

I'm getting a TypeError: expected string or bytes-like object when I'm processing this dataset using Vaex python library. I've written the following code: import pyarrow as pa import vaex import re # Reading Data anime =…
0
votes
1 answer

Calculation on every dataset entry in Vaex

I wish to transform every column in a dataset so its entries are between 0 and 1 based on the min/max of a column. I get the min/max of each column with df.minmax(col_names) and then want to find the column width col_width = col_max - col_min. With…
afriedman111
  • 1,925
  • 4
  • 25
  • 42
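
A minimal plain-Python sketch of the min/max scaling this question describes, i.e. (x - col_min) / col_width with col_width = col_max - col_min. It is not Vaex code; `min_max_scale` and the sample table are hypothetical, and returning zeros for a constant column is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1] using its own min and max."""
    col_min, col_max = min(column), max(column)
    col_width = col_max - col_min
    if col_width == 0:
        # Constant column: every entry maps to 0.0 (an assumption, see lead-in).
        return [0.0 for _ in column]
    return [(x - col_min) / col_width for x in column]


# Made-up table stored as a dict of column name -> values.
table = {"col_a": [2, 10, 6], "col_b": [8, 4, 5]}
scaled = {name: min_max_scale(values) for name, values in table.items()}
print(scaled)
```

Each column is scaled independently, so the smallest entry of every column becomes 0.0 and the largest becomes 1.0.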
0
votes
1 answer

How to calculate the max row value for each column through Vaex

I have an application that uses a Pandas dataframe to calculate each min/max row value for each column. For example: col_a col_b col_c 2 8 7 10 4 3 6 5 1 calling df.max() produces col_a 10 col_b 8 col_c …
afriedman111
  • 1,925
  • 4
  • 25
  • 42
0
votes
2 answers

What did the HDF5 format do to the csv file?

I had a csv file of 33GB, but after converting it to HDF5 format the file size was drastically reduced to around 1.4GB. I used the vaex library to read my dataset and then converted this vaex dataframe to a pandas dataframe. This conversion of vaex dataframe to…
a.ydv
  • 31
  • 5