Questions tagged [vaex]

Vaex is a Python library for lazy, out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid at a rate of more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).

181 questions
3
votes
2 answers

Importing large CSV file using Dask

I am importing a very large CSV file (~680 GB) using Dask; however, the output is not what I expect. My aim is to select only some columns (6/50), and perhaps filter them (this I am unsure of because there seems to be no data?): import dask.dataframe…
Stackbeans
  • 273
  • 1
  • 16
3
votes
1 answer

Reading a Parquet file using Vaex

I'm trying to read some data into Python from a Parquet file using Vaex. This is the output I get using the vaex.open function. >>> import vaex >>> trade = vaex.open('trade.parquet') >>> trade Traceback (most recent call last): File "",…
mturkington
  • 119
  • 1
  • 5
3
votes
1 answer

How to add a new column from an array to a Vaex dataframe after filtering?

I have a data file 'for-filter.txt': a,b,c,d 1,2,3,4 2,6,7,8 -1,2,3,4 4,5,5,5 -2,3,3,3 The Vaex code I am using: import vaex as vx import numpy as np df_vaex = vx.from_csv('for-filter.txt') df_filter = df_vaex[df_vaex['a'] >…
Haha TTpro
  • 5,137
  • 6
  • 45
  • 71
3
votes
1 answer

Apply custom function to groupby in vaex

I want to apply some custom logic to each individual group obtained by groupby. This is easy to do in pandas. How can I apply a custom function to groups created by groupby in vaex? For example, suppose I want to find the min index and max index of…
MSS
  • 3,306
  • 1
  • 19
  • 50
3
votes
1 answer

Convert a Pandas dataframe with a date column to a Vaex dataframe

I am trying to do the following: load some data with string columns measurement_df = pd.read_csv('data/tag_measurements/all_measurements.csv') measurement_df.head(3) measurement_df >> prints . timestamp tag_1 tag_2 tag_3…
amirdel
  • 53
  • 1
  • 5
3
votes
1 answer

How to change the point style in a vaex interactive Jupyter bqplot plot_widget to make individual points larger and visible?

I am evaluating vaex for an interactive outlier selection use case described at: Large plot: ~20 million samples, gigabytes of data Basically, I have some individual points which are outliers, and I want to see them on a graph to manually select…
Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
3
votes
2 answers

How to do interactive 2D scatter plot zoom / point selection in Vaex?

I saw that it is possible to do it during the demo: https://youtu.be/2Tt0i823-ec?t=769 There, the presenter has a huge dataset, and can quickly zoom in by selecting a rectangle with the mouse. I also saw the "Interactive Widgets" section of the…
Ciro Santilli OurBigBook.com
  • 347,512
  • 102
  • 1,199
  • 985
3
votes
2 answers

Groupby and combine a dataframe using Vaex

I have a large .csv file with roughly 150M rows. I can still fit the entire data set into memory and use Pandas to groupby and combine. Example... aggregated_df = df.groupby(["business_partner", "contract_account"]).sum() In the above example the…
davidrpugh
  • 4,363
  • 5
  • 32
  • 46
2
votes
1 answer

Efficiently convert numpy matrix to Vaex DataFrame

I'm trying to turn my wide (100K+ columns) 2D numpy data into a Vaex Dataframe. I'm reading through the documentation, and I see two relevant functions: from_items from_arrays but both give me an entire column x, where each row is a numpy array.…
Dave Liu
  • 906
  • 1
  • 11
  • 31
2
votes
0 answers

Opening arrow files using vaex slower and using more memory than expected

I have multiple .arrow files, each about 1GB (total filesize is larger than my RAM). I tried to open all of them using vaex.open_many() to read them into a single dataframe, and saw that the memory usage was increasing by GBs, and it was taking…
Rayne
  • 14,247
  • 16
  • 42
  • 59
2
votes
1 answer

Error using vaex: blake3.__new__() got an unexpected keyword argument 'multithreading'

When I use vaex as follows: for i, df in enumerate(vaex.from_csv('cars.csv', convert=True,chunk_size=100_000)): print(df.info()) I get an error: blake3.__new__() got an unexpected keyword argument 'multithreading' What am I doing…
Stock
  • 21
  • 1
2
votes
0 answers

Is it possible to read_sql query using Vaex?

Pandas has read_sql to read the result of a query directly from a database. query = "select top 100 * from TABLE" df=pd.read_sql(query, redshift_conn) Can I do the same thing using Vaex? Vaex does not have to_sql, so I was converting the vaex dataframe…
2
votes
0 answers

ValueError: Merging datasets with unequal row counts vaex join python

I am trying to get the common data from two CSV files with different numbers of rows using vaex. When doing an inner join I get the error below. Ideally an inner join wouldn't need to check that the dataframes have the same row count. PS…
alok sharma
  • 35
  • 1
  • 7
2
votes
0 answers

VAEX groupby datetime column and another column

I have a timeseries of data and I'd like to use VAEX to manipulate it. I need to groupby an integer "species" column and then also bin by minute. I have tried air_df.groupby(by = [vaex.BinnerTime(air_df["DT"],resolution =…
2
votes
1 answer

Does a vaex data frame not support data generation?

I have a dataset with…
The_Third_Eye
  • 303
  • 3
  • 15