Questions tagged [vaex]

Vaex is a Python library for lazy, out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).
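As a rough illustration of what "statistics on a grid" means, here is a minimal pure-Python sketch of a mean computed per bin on a one-dimensional grid. It is not Vaex's implementation or API: the function name `binned_mean` and the sample data are made up for illustration, and a real out-of-core library would stream over memory-mapped chunks rather than Python lists.

```python
def binned_mean(xs, values, x_min, x_max, n_bins):
    """Mean of `values` in each of `n_bins` equal-width bins of `xs` over [x_min, x_max)."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = (x_max - x_min) / n_bins
    for x, v in zip(xs, values):
        if not (x_min <= x < x_max):
            continue  # rows outside the grid limits are skipped
        # Clamp the index to guard against floating-point rounding at the upper edge.
        i = min(int((x - x_min) / width), n_bins - 1)
        sums[i] += v
        counts[i] += 1
    # Empty bins have no defined mean, so they are reported as NaN.
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]


# Hypothetical sample data: five rows, binned into four unit-width bins on [0, 4).
xs = [0.5, 1.5, 1.7, 3.2, 3.9]
values = [10.0, 20.0, 40.0, 5.0, 15.0]
means = binned_mean(xs, values, x_min=0.0, x_max=4.0, n_bins=4)
print(means)
```

A single pass accumulates only one running sum and one count per bin, so memory use depends on the grid size rather than on the number of rows.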

181 questions
0 votes, 1 answer

Specifying output directory in vaex.from_csv()

I am using Python's Vaex library in a Kaggle notebook to convert a .csv dataset to .hdf5 using the vaex.from_csv() method. I am unable to find a way to specify the output directory for the hdf5 file. The method creates the file in the same directory…
0 votes, 1 answer

Vaex join two datasets and filter

I would like to perform 2 operations on vaex dataframes: I have two vaex datasets: vaex_cpc having 159,541,409 observations and vaex_id.info with 117,081,595 observations. They both share a column called "docdb_family_id" and I would like to merge…
Lusian
0 votes, 1 answer

Vaex: The process cannot access the file because it is being used by another process

I am working on an application that uses Vaex for accessing data from a feather file. We are creating virtual columns in a dataframe that store Boolean values which are used to filter rows of data in the dataset. Every time a new filter is made a…
afriedman111
0 votes, 0 answers

Vaex: AttributeError: module 'vaex' has no attribute 'from_pandas'

I'm facing the following error when I run the script in a Linux environment. I would appreciate any help fixing this issue. Vaex: AttributeError: module 'vaex' has no attribute 'from_pandas' in the following environment. I haven't encountered the same…
mh0189
0 votes, 1 answer

Unexpected Output in Vaex Function

I have the following Vaex function I am trying to make: @vaex.register_function(on_expression=True) def getSumStatsByGroup(df, group, x): data = (df.groupby(by=group, agg={'Min' : vaex.agg.min(df[x]), 'Mean' : vaex.agg.mean(df[x]), 'Max' :…
rochimer
0 votes, 0 answers

Excel file loading in vaex

Is it possible to read Excel files directly in Vaex, as it does for CSV? We found the from_csv function for loading CSV files but couldn't find an equivalent method for Excel.
Vivek Menon M
0 votes, 1 answer

Vaex expression to select all rows

In Vaex, what expression can be used as a filter to select all rows? I wish to create a filter as a variable and pass that to a function. filter = True if x > 5: filter = y > 20 df['new_col'] = filter & z < 10 My wish is that if x <= 5 it will…
afriedman111
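One thing that may be worth checking in the snippet quoted in this question is operator precedence: in Python, `&` binds more tightly than `<`, so `filter & z < 10` is parsed as `(filter & z) < 10`. The sketch below shows this with plain Python values rather than Vaex expressions; the names `flag` and `z` are hypothetical stand-ins, and whether this is the actual cause of the asker's problem is an assumption, since the excerpt is truncated.

```python
import ast

# Hypothetical plain-Python stand-ins for the values in the question.
flag = False
z = 3

# Without parentheses, Python evaluates (flag & z) first, then compares to 10.
unparenthesized = flag & z < 10    # (False & 3) < 10  ->  0 < 10
# With parentheses, the comparison happens first, then the bitwise AND.
parenthesized = flag & (z < 10)    # False & True

# The parsed syntax tree confirms that the comparison is the outermost operation.
tree = ast.parse("flag & z < 10", mode="eval")
top_level = type(tree.body).__name__

print(unparenthesized, parenthesized, top_level)
```

Wrapping each comparison in parentheses before combining it with `&` avoids the ambiguity.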
0 votes, 1 answer

Vaex copy columns between dataframes

I have a dataframe that I performed a filter on and then added some virtual columns. I wish to add those columns back in with the original data frame. Here is my code. original_df = ... df = original_df.filter(f"my_col_{id}") df["new_col"] =…
afriedman111
0 votes, 2 answers

How to add new columns to vaex dataframe? Type Error

How do I add new columns to a Vaex dataframe? I tried to assign a list object to the dataframe, as is done in pandas, but received the following error: ValueError: [1, 1, 1, 1, 1, 1, 1] is not of string or Expression type, but…
0 votes, 0 answers

Big data scatterplot adding lines

I need a scatterplot for a dataset with 77M+ rows, plus adding lines like the plt.axlines. import pandas as pd import numpy as np import matplotlib.pyplot as plt df = pd.DataFrame({ 'x' :np.random.normal(0,1,77000000), 'y' :…
VYago
0 votes, 1 answer

How to handle date time formats using VAEX?

I am new to Vaex and couldn't find a solution to my specific question on Google, so I am asking here in the hope that someone can help. I am using Vaex to import data from a CSV file in my Dash Plotly app and then want to convert the Date column…
0 votes, 1 answer

How to load data from a connection string with vaex package?

If I have a table on my server and I am producing a connection string to it, how can I, using Vaex, load it to a dataframe? Here is what I am doing but with Pandas: from sqlalchemy import types, create_engine, text import pandas as pd import…
SteveS
0 votes, 1 answer

Is there an equivalent of `to_json` for Vaex dataframes?

I'm currently working on a Dash app to visualize large amounts of data. With scalability issues in mind, I'm trying to migrate from Pandas to the Vaex library to lazily load data and optimize recurrent scanning of the dataset (each time the user…
junsuzuki
0 votes, 1 answer

CatBoostError: catboost/libs/model/model.cpp:1716: Approx dimensions don't match: 92 != 89

I use the CatBoostModel by vaex. transactions_sample_merged is a 10000x10 DataFrame. Int64Index: 10000 entries, 0 to 9999 Data columns (total 10 columns): # Column Non-Null Count Dtype …
Irvin
0 votes, 3 answers

Extract dictionary value from column in data frame with Vaex

I applied the following command to my dataframe: df['date_article'] = df.pagePath.str.extract_regex(pattern='(?P/\d{4}/\d{2}/\d{2}/)') and this created the column…