Questions tagged [vaex]

Vaex is a Python library for lazy out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
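
As a rough illustration of two ideas named above, reading data through a memory map instead of loading it, and accumulating a statistic on a grid of bins, here is a minimal sketch using only the Python standard library. It is not Vaex's API and says nothing about how Vaex is implemented; the function name `binned_mean` and the file layout (raw native-endian float64 values) are assumptions made for the example.

```python
import array
import mmap
import os
import tempfile


def binned_mean(path, lo, hi, n_bins):
    """Mean of the float64 values in `path`, per bin of a 1-D grid on [lo, hi).

    Hypothetical helper: the file is assumed to hold raw native float64 values.
    """
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # Typed view over the mapped file; the values are not copied into a list.
        with memoryview(mm) as raw, raw.cast("d") as values:
            for v in values:
                if lo <= v < hi:
                    # Clamp guards against rounding at the upper edge.
                    i = min(int((v - lo) / (hi - lo) * n_bins), n_bins - 1)
                    counts[i] += 1
                    sums[i] += v
    # Empty bins have no mean; report them as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        with open(path, "wb") as f:
            array.array("d", [0.5, 1.5, 1.0, 3.5]).tofile(f)
        print(binned_mean(path, 0.0, 4.0, 4))  # [0.5, 1.25, None, 3.5]
```

The operating system pages the file in on demand, so the script never holds the whole dataset in a Python list; only the per-bin counters live in memory.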

181 questions
1
vote
1 answer

How can I efficiently remove non-finite values from a Vaex DataFrame with many columns?

My data has values that are equal to positive and negative infinity. Vaex has functions to dropna, dropmissing and dropnan but not for removing non-finite values. My current approach is to iterate through each column of interest and overwrite…
Joe
  • 418
  • 4
  • 12
1
vote
1 answer

vaex column name change

Hi, I'm just getting started with Vaex in Python. I have a dataset with messy column names, and I'm trying to replace spaces with '_'. In pandas I'm able to do df.columns = df.columns.str.replace(' ', '_'), but in Vaex df_column =…
mqn
  • 53
  • 2
  • 5
1
vote
1 answer

Initialize Vaex Dataframe Column to a value

I want to initialize a column of my Vaex dataframe to the int value 0. I have the following: right_csv = "animal_data.csv" vaex_df = vaex.open(right_csv,dtype='object',convert=True) vaex_df["initial_color"] = 0 But this will throw an error for…
DNS_Jeezus
  • 289
  • 4
  • 17
1
vote
1 answer

Bridging exports and imports between dask and vaex

I am working jointly with vaex and dask for some analysis. In the first part of the analysis I do some processing with dask.dataframe, and my intention is to export the dataframe I computed into something vaex reads. I want to export the data into a…
1
vote
1 answer

Vaex Datetime comparison

I have a Vaex dataframe that reads from an HDF5 file. It has a date column which is read as a string. I converted it to datetime. However, I am not able to do any date comparisons. I can extract day, month, year, etc. from the date, so the conversion is…
sak
  • 111
  • 1
  • 11
1
vote
1 answer

vaex: Check for equality between two frames

Does vaex have any utility functions that help with checking for equality between two dataframes? For example: pandas has pandas.testing.assert_frame_equal to check if two frames hold the same columns and values, which is rather nice when writing…
sobek
  • 1,386
  • 10
  • 28
1
vote
1 answer

vaex binary installation in windows

Installing Python packages can be as frustrating as it gets. Maybe I am the only poor soul still stuck on Windows while the rest of the world lives happily on Mac and Linux. I am trying to install vaex in my venv environment on Windows.…
chapter3
  • 914
  • 2
  • 12
  • 21
1
vote
1 answer

How do I troubleshoot ValueError: array is of length %s, while the length of the DataFrame is %s?

I'm trying to follow the example in this notebook. As suggested in this GitHub thread, I've upped the ulimit to 9999. I've already converted the CSV files to HDF5. My code fails when trying to open a single HDF5 file into a dataframe: df =…
1
vote
2 answers

Python vaex how to create dataframe from a CSV file?

Why do I only get the last column? if __name__ == '__main__': # running remotely on Linux from Windows import vaex,pandas as pd df_pd = pd.read_csv('./a.csv') # contains 4 columns print(df_pd) print(list(df_pd.columns)) df = vaex.from_pandas(df_pd) # only last column #…
1
vote
3 answers

Pandas Filtering and convert to Date to datetime64ns

I am trying to figure out a problem, but so far I could not find any solution. I hope you can help. I have a DataFrame and I would like to convert str to datetime, but there are some invalid rows which I would like to filter out. Here are two…
1
vote
2 answers

Jupyter Pandas - dropping items which have average over a threshold

I have a data frame with items and their prices, something like this:
╔══════╦═════╦═══════╗
║ Item ║ Day ║ Price ║
╠══════╬═════╬═══════╣
║ A    ║ 1   ║ 10    ║
║ B    ║ 1   ║ 20    ║
║ C    ║ 1   ║ 30    ║
║ D    ║ 1   ║ 40    ║
║ A    ║ 2   ║ …
dalayr
  • 13
  • 4
0
votes
0 answers

Import HDFS data using Vaex

So I'm looking for alternatives to Spark for accessing a huge volume of data from HDFS, and I found Vaex. Is there any way to directly access data from HDFS using Vaex? Does anyone have an example line they could share? Thanks
0
votes
0 answers

vx.from_pandas(df).export_hdf5(path) giving KeyError while writing pandas df to HDF5 file

Software information: Vaex version: 4.1.0; Vaex was installed via: pip from source; OS: Windows Server 2016; Python version: 3.9.7. Additional information: Backend application while trying save the pandas df to HDF5 file using code…
0
votes
0 answers

vaex create a unique dataframe using a duplicated dataframe

I'm using the Vaex library. The input dataframe has duplicate rows. I need to create a new dataframe that uniquely identifies the group name and group hash, like the output below. Input dataframe: group_name, group_hash student, xxxx12313 student,…
WAEX
  • 115
  • 1
  • 9
0
votes
0 answers

refactoring code from pandas into vaex | loc was useful in pandas, however it cannot be used in vaex

I am struggling to get my code to work. It was written in pandas and I am now refactoring it to use Vaex; however, loc() doesn't exist in Vaex. Could anyone please help me with this? Idea: Aim to replace the missing values in the start_time column by…