Questions tagged [vaex]


Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas) that lets you visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).

181 questions
1 vote, 0 answers

What is the right way to run python vaex.ml.catboost.CatBoostModel.fit in parallel for several folds?

Description: I have Python code that sequentially calls vaex.ml.catboost.CatBoostModel.fit for 3 folds. It takes a lot of time, so I would like to run vaex.ml.catboost.CatBoostModel.fit in parallel. Problem: I get different results when I run…
Capacytron
1 vote, 1 answer

How to iterate with result of previous rows of same column?

Starting from a Data Frame with the columns A B D P: import numba import numpy as np import pandas as pd import vaex d = {'A':[0,1,2,3,4,5,6],'B':[30,35,32,35,31,39,37],'D':[12,10,13,19,12,21,13],'P':[3,3,3,3,3,3,3]} df =…
1 vote, 1 answer

Pandas: how to bin and group by without a categorical range of values

I have a large number of latitude and longitude values that I would like to bin together in order to display them on a heatmap (ipyleaflet only seems to allow 2000 or so points in the heatmap and this would also be much more efficient when using big…
Superdooperhero
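One way to bin coordinates in pandas without building categorical ranges is to snap each point to the corner of a fixed-size grid cell and count points per cell. This is a sketch; the column names `lat`/`lon`, the function name `bin_points` and the cell size are assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd


def bin_points(df, step):
    """Snap each point to the lower-left corner of a step x step degree cell and count points per cell."""
    binned = pd.DataFrame({
        "lat_bin": np.floor(df["lat"] / step) * step,
        "lon_bin": np.floor(df["lon"] / step) * step,
    })
    # One row per non-empty cell, sorted by (lat_bin, lon_bin)
    return binned.groupby(["lat_bin", "lon_bin"]).size().reset_index(name="count")


points = pd.DataFrame({"lat": [0.1, 0.2, 0.7, 1.3], "lon": [0.1, 0.4, 0.2, 1.1]})
cells = bin_points(points, step=0.5)
```

Adding `step / 2` to each bin value gives the cell centre, which can then be passed to the heatmap with `count` as the weight. In vaex, `df.count(binby=[df.lat, df.lon], limits=..., shape=...)` computes the same grid of counts without a groupby.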
1 vote, 2 answers

Oracle SQL: How to best go about counting how many values were in time intervals? Database query vs. pandas (or more efficient libraries)?

I currently have to wrap my head around programming the following task. Situation: suppose we have one column where we have time data (Year-Month-Day Hours-Minutes). Our program shall get the input (weekday, starttime, endtime, timeslot) and we want…
kallikles
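A minimal pandas sketch of the described input (weekday, start time, end time, timeslot), assuming Monday = 0, an end time that is exclusive, and slots counted from the start time. The function name `count_per_timeslot` is hypothetical. On the database side, the same weekday and time filter could be pushed into the query to reduce the data transferred; only the pandas side is shown here.

```python
import pandas as pd


def count_per_timeslot(timestamps, weekday, start, end, slot):
    """Count timestamps on `weekday` (Monday=0) with time of day in [start, end), per slot."""
    ts = pd.Series(pd.to_datetime(timestamps))
    ts = ts[ts.dt.dayofweek == weekday]
    # Time of day as an offset from midnight, e.g. 08:20 -> Timedelta('0 days 08:20:00')
    time_of_day = ts - ts.dt.normalize()
    start, end, slot = pd.Timedelta(start), pd.Timedelta(end), pd.Timedelta(slot)
    time_of_day = time_of_day[(time_of_day >= start) & (time_of_day < end)]
    # Snap each value down to the start of its slot, counting slots from `start`
    slot_start = start + ((time_of_day - start) // slot) * slot
    return slot_start.value_counts().sort_index()


counts = count_per_timeslot(
    ["2021-03-01 08:05", "2021-03-01 08:20", "2021-03-01 08:29",
     "2021-03-01 09:10", "2021-03-02 08:10"],
    weekday=0, start="08:00:00", end="09:00:00", slot="15min",
)
```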
1 vote, 1 answer

Memory error when merging two big dataframes

I could use some help. The main problem is to calculate the distance between two points with their latitude and longitude. We have divided Brazil into 33k hexagons, listed in the dataframe below: I've been trying to merge this dataframe with its…
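A merge of 33k hexagons with themselves produces about 1.1 billion rows, which is a likely cause of the memory error (the excerpt is cut off, so this is an assumption). A sketch that computes great-circle (haversine) distances on a spherical Earth in row chunks, so only a `chunk x n` block is in memory at once. The function names and the 6371 km radius are assumptions.

```python
import numpy as np

# Mean Earth radius in km; treating the Earth as a sphere is an approximation
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees (NumPy-broadcastable)."""
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def pairwise_distance_chunks(lat, lon, chunk=256):
    """Yield (offset, block) with block[i, j] = distance between point offset+i and point j."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    for start in range(0, len(lat), chunk):
        stop = start + chunk
        # (c, 1) against (1, n) broadcasts to a (c, n) block
        block = haversine_km(lat[start:stop, None], lon[start:stop, None],
                             lat[None, :], lon[None, :])
        yield start, block
```

With 33k points and `chunk=256`, each block holds about 256 x 33,000 float64 values (roughly 68 MB). What to keep from each block (nearest neighbours, pairs under a threshold, and so on) depends on the rest of the question.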
1 vote, 1 answer

interactive large plot with vaex

I am using Python 3.8 on Windows 10 and trying to make a plot with about 700M points in it (sound wave analysis). In "Interactive large plot with ~20 million sample points and gigabytes of data", Vaex was highly recommended. I am trying to use examples…
1 vote, 4 answers

Create Dataframe in Pandas - Out of memory error while reading Parquet files

I have a Windows 10 machine with 8 GB RAM and 5 cores. I have created a Parquet file compressed with gzip; the size of the file after compression is 137 MB. When I try to read the Parquet file through pandas, Dask and vaex, I get memory…
Rishim Mittal
1 vote, 0 answers

vaex ValueError: Could not find a class (AggSum_object), seems object is not supported

I got the following error while doing aggregation dfv = vaex.from_csv(_path + 'sample.csv') _monetary = dfv.groupby('CusUnique',agg=vaex.agg.sum('Trn_AMT')) which returns. "ValueError: Could not find a class (AggSum_object), seems object is not…
1 vote, 1 answer

Duplicate rows and change column value (python vaex)

I have this dataframe: I would like to duplicate all the rows where day_of_year == 140 and, in these duplicated rows, replace the day_of_year value with 148. That is, duplicate the rows and at the same time replace the day_of_year column and…
1 vote, 1 answer

vaex: filter a dataframe using a mask from another series

I want to use a mask from series x to filter out a vaex dataframe y. I know how to do this in pandas and numpy. In pandas it's like: import pandas as pd a = [0,0,0,1,1,1,0,0,0] b = [4,5,7,8,9,9,0,6,4] x = pd.Series(a) y =…
Asuralm
1 vote, 1 answer

Altair with Vaex

I am trying to use Vaex together with Altair, but I am having some trouble passing Vaex dataframes to Altair. When trying to make a simple line chart alt.Chart(df).mark_line().encode(alt.X('x'), alt.Y('y1')) I get an error saying that [the]…
shamalaia
1 vote, 1 answer

Renaming the columns in Vaex

I initially tried to read a 4 GB CSV file with pandas pd.read_csv, but my system runs out of memory (I guess) and the kernel restarts or the system hangs. So I tried using the vaex library to convert the CSV to HDF5 and do…
1 vote, 1 answer

How to use named selection for filtering in Vaex

I created 2 named selections, df.select(df.x >= 2, name='bigger') and df.select(df.x < 2, name='smaller'), and it's cool: I can use the selection parameter that many (i.e. statistical) functions offer, for example df.count('*', selection='bigger'), but is there…
mojzis
1 vote, 0 answers

50 million records from the oracle database to vaex using from_pandas

The code below is from the vaex documentation: pandas_df = pd.read_sql_query('SELECT * FROM MYTABLE', con=engine) df = vaex.from_pandas(pandas_df, copy_index=False) Description: I have more data than fits in RAM. But when I use the code above, it tries to pull…
1 vote, 1 answer

Vaex column doesn't evaluate

I have the following calculation: df.t = 100.0*((1.25/1023)*df.t - 0.5). Strangely, >>> df doesn't show the result, only the old values in that column. However, df.t shows the calculated values. So when I export the result to pandas with dfp = df.to_pandas_df(), it…