Questions tagged [vaex]

Vaex is a Python library for lazy out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).
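
The idea of computing a statistic on a grid over memory-mapped data can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not Vaex's own API or implementation; the function name binned_mean, the file layout (raw native float64 x, y pairs), and the demo values are all assumptions made up for the example.

```python
import mmap
import os
import tempfile
from array import array


def binned_mean(path, n_bins, lo, hi):
    """Mean of y per x-bin, for a file of native float64 (x, y) pairs.

    Hypothetical helper for illustration only; it is not part of Vaex.
    """
    counts = [0] * n_bins
    sums = [0.0] * n_bins
    # Map the file instead of reading it; the OS pages data in on demand.
    with open(path, "rb") as f, \
            mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm, \
            memoryview(mm) as raw, raw.cast("d") as values:
        # 'values' views the mapped bytes as doubles without copying them.
        for i in range(0, len(values) - 1, 2):
            x, y = values[i], values[i + 1]
            if lo <= x < hi:
                # Index of the equal-width bin on [lo, hi) that contains x.
                b = min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)
                counts[b] += 1
                sums[b] += y
    # Empty bins have no mean, so they are reported as None.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Demo data (made up): four (x, y) points written as raw float64 values.
with tempfile.NamedTemporaryFile(suffix=".bin", delete=False) as tmp:
    array("d", [0.5, 1.0, 0.7, 3.0, 1.5, 10.0, 5.0, 99.0]).tofile(tmp)
    demo_path = tmp.name

demo_means = binned_mean(demo_path, n_bins=2, lo=0.0, hi=2.0)
os.remove(demo_path)
print(demo_means)
```

A pure-Python loop like this is far slower than a real out-of-core engine; the sketch only shows why memory mapping lets a one-dimensional grid statistic be accumulated in a single pass without loading the whole file.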

181 questions
0 votes
0 answers

Dump SQL table to file and apply a custom function?

I have a situation where writing a PL/pgSQL function is too slow and cumbersome, and probably impossible because I need many Python modules. That's why I want to opt for Vaex or Dask. The plan: dump the SQL table to a file, then apply…
sten
  • 7,028
  • 9
  • 41
  • 63
0 votes
1 answer

ValueError: operand '!=' not supported for string comparison

I want to compare a value with a string. I did df = df[df.s1 != 'NON eq'] and got this error: ValueError: operand '!=' not supported for string comparison
biwia
  • 407
  • 1
  • 4
  • 10
0 votes
1 answer

vaex extract one column of str.split()

I want nearly the same as answered here for pandas, but I want to run it in vaex. As vaex does lazy copying, it would be okay for me to save (my two) columns of str.split into the vaex DataFrame. But there is nothing like expand=True.
Bastian Ebeling
  • 1,138
  • 11
  • 38
0 votes
0 answers

Clustering millions of large binary vectors?

I want to generate millions of large binary vectors (10_000 ... 100_000 bits). Then I want to cluster them by overlap (AND). After that I want to reorder the vectors according to the clustering and save the result for later. SciPy has a clustering method…
sten
  • 7,028
  • 9
  • 41
  • 63
0 votes
1 answer

In the Python Vaex library, how can I replace values of columns with allowed custom values of those columns?

I have a dictionary whose keys are column names and whose values are lists of allowed values for those columns. How can I replace values that do not occur in the dictionary list with '0'? FinalCat_ is the list of column names. CombinedCat is Vaex…
0 votes
1 answer

vaex: How to limit number of cores/threads/processes?

How can one limit the number of cores/threads/processes that are being used by vaex? Some operations have a boolean parallel switch, but I don't see a way to have more fine-grained control (which is important on larger shared servers). Code snippet…
kuropan
  • 774
  • 7
  • 18
0 votes
1 answer

Can't open HDF5 file bigger than memory... ValueError

I have many NYC taxi .csv files from nyc.gov, one .csv per year-month. I grab about 15 of the CSVs and make HDF5 files from them: import h5py import pandas as pd import os import glob import numpy as np import vaex from tqdm import tqdm_notebook as…
314mip
  • 383
  • 1
  • 4
  • 13
0 votes
0 answers

Displaying full integers instead of scientific notation when printing out Vaex HDF5 data

my code: myfile = vaex.open('myfile.hdf5') myfile['customer_id'] output: Length: 4,259,376 dtype: int64 (column) 0 9.4618e+08 1 9.43324e+08 2 9.43325e+08 3 9.43333e+08 4 9.43333e+08 ... How can I change the output format…
SophieLD
  • 29
  • 6
0 votes
1 answer

Can featuretools be used on a vaex dataframe?

I'm trying to play with automated feature engineering. I've got it to work on raw dataframes, but I'm not sure how to do it on out-of-memory dataframes such as vaex. My purpose is to find a way to use automated feature engineering when data frame…
Lostsoul
  • 25,013
  • 48
  • 144
  • 239
0 votes
2 answers

convert csv to hdf5 by using vaex.from_csv Error: 'DataFrameArrays' object has no attribute 'dtype'

I have a CSV file with more than 13 million rows that I want to convert to HDF5. I can run this code: df_chunk = vx.from_csv(r'df.csv', nrows=20_000_000) but if I run the following code: df_chunk.export(r'df.hdf5') I get an error: AttributeError:…
SophieLD
  • 29
  • 6
0 votes
1 answer

Vaex Dataframe and Expression: Filter every nth row (Python)

I have some pretty big HDF files (10e9 rows, about 100 GB) containing [X,Y,Z,Sensor_0,...,Sensor_n] values. For processing I am using vaex, which gives me nice and fast results. However, I am struggling with the following issue: I haven't found a way…
AM_Guy
  • 63
  • 10
0 votes
1 answer

Can we load .txt files to vaex?

I have a folder of .txt files that is 52.6 GB in size. The .txt files are located in various subfolders. Each subfolder has a unique label ("F", "G", etc.) and contains many .txt files. I need to combine all the .txt files of each unique…
shadow kh
  • 101
  • 2
0 votes
2 answers

Extract and combine data from 3 large tsv/csv files

I have 3 big TSV files with the following structure: file1: id,f1,f2,name,f3 file2: id,f4,blah1,f5 file3: id,f5,f6,blah2 I want to create a new file that is extracted from the others: result: id,name,blah1,blah2 Currently I can't because…
sten
  • 7,028
  • 9
  • 41
  • 63
0 votes
1 answer

Vaex unable to open hdf5 created by pandas

I am getting this error: OSError: Could not open file: test/pd.hdf5, did you install vaex-hdf5? Is the format supported? Yes, I have installed vaex-hdf5. Here is a screenshot of the HDF5 file I am attempting to open in vaex, opened in pandas: Any help is…
Hairy
  • 393
  • 1
  • 10
0 votes
1 answer

ModuleNotFoundError: No module named 'vaex.remote'

I was trying to install the vaex application from Anaconda Navigator, but it fails to launch with an error: ModuleNotFoundError: No module named 'vaex.remote'. Everything is installed, and I even reinstalled everything, with no better results: ~$…
mrgou
  • 1,576
  • 2
  • 21
  • 45