Questions tagged [vaex]

Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory is wasted).

181 questions
2
votes
1 answer

vaex - create a dataframe from a list of lists

In Vaex's docs, I cannot find a way to create a dataframe from a list of lists. In pandas I would simply do pd.DataFrame([['A',1,3], ['B',2,4]]). How can this be done in Vaex?
shamalaia
  • 2,282
  • 3
  • 23
  • 35
2
votes
0 answers

Converting a sparse matrix to HDF5 takes too much time even in Vaex, and memory crashes

I have a dataframe that contains text data and numerical features. I have vectorized the text data, and I plan to concatenate it with the remaining numerical data for running machine learning algorithms. I have vectorized the text data using TF-IDF as shown…
P H
  • 294
  • 1
  • 3
  • 16
2
votes
0 answers

groupby on a very large (+10 GB) dataset with Python libraries: pandas, vaex and dask

I have more than 10 GB of transaction data. I used Dask to read the data, select the columns I am interested in, and group by the columns I wanted. All this was incredibly fast, but computing wasn't working well and debugging was hard. I then decided…
amiraghrs
  • 21
  • 2
2
votes
1 answer

Performance Tips for using Vaex

I am using Vaex and looking for performance tips. My use-case is as follows: I have a large dataframe - let's call it large_df (only a few columns but tens of millions of rows, and in production the dataset will be >10x as large). One of the columns…
Josh Reback
  • 529
  • 5
  • 16
2
votes
1 answer

Plot large data with vaex

I've been struggling to create a plot of a CSV with millions of lines. I am trying to use the vaex module but I'm stuck: import vaex # converts and reads large csv into hdf5 format df = vaex.open("mydir/cov2.csv", …
Ricardo Guerreiro
  • 497
  • 1
  • 4
  • 17
2
votes
2 answers

vaex: shift column by n steps

I'm preparing a big multivariate time series data set for a supervised learning task and I would like to create time shifted versions of my input features so my model also infers from past values. In pandas there's the shift(n) command that lets you…
sobek
  • 1,386
  • 10
  • 28
2
votes
0 answers

How to json normalize columns in vaex?

Given a nested json, is there a way to load and flatten it in vaex? This is a way to do it in pandas: import pandas as pd from pandas.io.json import json_normalize df = pd.read_json(input_file) df = pd.concat([df, json_normalize(df['eventData'])],…
scc
  • 10,342
  • 10
  • 51
  • 65
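vaex has no JSON normalizer of its own, so one hedged approach is to flatten in pandas first (using the top-level `pd.json_normalize`, which replaced the deprecated `pandas.io.json` import) and then convert; the records below are hypothetical stand-ins for the question's `eventData` column:

```python
import pandas as pd

# Hypothetical nested records shaped like the question's input.
records = [
    {'id': 1, 'eventData': {'type': 'click', 'meta': {'x': 10}}},
    {'id': 2, 'eventData': {'type': 'scroll', 'meta': {'x': 20}}},
]
df = pd.DataFrame(records)

# json_normalize flattens nested dicts, joining keys with dots;
# concat the result back onto the non-nested columns.
flat = pd.concat(
    [df.drop(columns='eventData'), pd.json_normalize(df['eventData'].tolist())],
    axis=1,
)
print(flat.columns.tolist())
```

The flattened frame can then be passed to `vaex.from_pandas(flat)`.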
2
votes
1 answer

Workflow for modifying an hdf5 file in vaex

As sort of follow on to my previous question [1], is there a way to open a hdf5 dataset in vaex, perform operations and then store the results to the same dataset? I tried the following: import vaex as vx vxframe = vx.open('somedata.hdf5') vxframe…
sobek
  • 1,386
  • 10
  • 28
2
votes
1 answer

Columns not showing in Hdf5 file

I have a large data set (1.3 billion rows) that I want to visualize with Vaex. Since the data set was very big in CSV (around 130 GB in 520 separate files), I merged them into an HDF5 file with the pandas dataframe.to_hdf function (format: table, appended for…
Olca Orakcı
  • 372
  • 3
  • 12
1
vote
1 answer

Most efficient way of computing pairwise cosine similarity for large DataFrame

I have a 300,000-row pd.DataFrame comprised of multiple columns, out of which one is a 50-dimensional numpy array of shape (1,50), like so: ID Array1 1 [2.4252 ... 5.6363] 2 [3.1242 ... 9.0091] 3 …
Johnny
  • 117
  • 10
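The standard trick, shown here on a smaller stand-in matrix: normalize the rows once, after which a single matrix product yields every pairwise cosine similarity:

```python
import numpy as np

# Stand-in for the 300,000 x 50 embedding matrix (smaller here).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))

# Normalize each row to unit length; dot products of unit vectors
# are exactly the cosine similarities.
Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
sim = Xn @ Xn.T
print(sim.shape)
```

The full result is O(n^2) memory, so for 300k rows compute it in row blocks (`sim_block = Xn[i:j] @ Xn.T`) rather than all at once.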
1
vote
1 answer

very large JSON handling in Python

I have a very large JSON file (~30 GB, 65e6 lines) that I would like to process using some dataframe structure. This dataset of course does not fit into memory, and therefore I ultimately want to use an out-of-memory solution like dask or vaex. I…
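If the file is newline-delimited JSON, one hedged sketch is to stream it in bounded chunks with pandas; each chunk could then be exported to HDF5/Arrow and the pieces opened together with vaex afterwards (the file below is a tiny invented stand-in):

```python
import json
import os
import tempfile
import pandas as pd

# Tiny newline-delimited JSON file standing in for the ~30 GB input.
path = os.path.join(tempfile.mkdtemp(), 'big.jsonl')
with open(path, 'w') as f:
    for i in range(5):
        f.write(json.dumps({'id': i, 'value': i * 2}) + '\n')

# Stream in chunks: only chunksize records are in memory at a time.
total = 0
for chunk in pd.read_json(path, lines=True, chunksize=2):
    total += len(chunk)
print(total)
```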
1
vote
1 answer

Multi-columns filter VAEX dataframe, apply expression and save result

I want to use VAEX for lazy work with my dataframe. After a quick start with exporting a big CSV and some simple filters and extract(), I have an initial df for my work with 3 main columns: cid1, cid2, cval1. Each combination of cid1 and cid2 is a workset with…
Jahspear
  • 151
  • 11
1
vote
1 answer

An accurate progress bar for loading files and transforming data using Vaex and Pandas

I am looking for a method to include a progress bar that shows the remaining time when loading a file with Vaex (big data files) or transforming big data with pandas. I have checked this thread…
1
vote
1 answer

Columns not recognized when importing HDF5 file

I am trying to import an HDF5 file in Python. I do not have details on how the file was written, therefore I tried vaex and pandas to open it. How can I specify my columns so that they are recognized? I tried to check the structure of the file…
luki
  • 111
  • 5
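HDF5 is only a container format, and different writers lay columns out differently (vaex expects its own column layout, while files written by pandas' to_hdf use the PyTables format), so inspecting the file with h5py usually answers why columns aren't recognized. A sketch with an invented file:

```python
import os
import tempfile
import h5py
import numpy as np

# Write a small file, then walk it the way you would an unknown layout.
path = os.path.join(tempfile.mkdtemp(), 'unknown.h5')  # hypothetical file
with h5py.File(path, 'w') as f:
    f.create_dataset('table/columns/x/data', data=np.arange(5))

# visititems calls back on every group and dataset in the file.
datasets = []
with h5py.File(path, 'r') as f:
    f.visititems(lambda name, obj: datasets.append(name)
                 if isinstance(obj, h5py.Dataset) else None)
print(datasets)
```

Once the dataset paths are known, the columns can be read with h5py directly, or re-exported into whatever layout the target library expects.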
1
vote
1 answer

How to write a large .txt file to a csv for Big Query dump?

I have a dataset that is 86 million rows x 20 columns with a header, and I need to convert it to a CSV in order to dump it into Big Query (adding multiple tags from that). The logical solution is reading the .txt file with pd.read_csv, but I don't…
birdman
  • 249
  • 1
  • 13
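A chunked rewrite keeps memory bounded regardless of row count; a sketch with an invented tab-separated stand-in (the real file's delimiter is an assumption):

```python
import os
import tempfile
import pandas as pd

tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, 'data.txt')
dst = os.path.join(tmpdir, 'data.csv')

# Tiny tab-separated stand-in for the 86M-row file.
with open(src, 'w') as f:
    f.write('a\tb\n1\tx\n2\ty\n3\tz\n')

# Stream: only chunksize rows are in memory; write the header once,
# then append each subsequent chunk.
first = True
for chunk in pd.read_csv(src, sep='\t', chunksize=2):
    chunk.to_csv(dst, mode='w' if first else 'a', header=first, index=False)
    first = False

with open(dst) as f:
    n_lines = sum(1 for _ in f)
print(n_lines)  # header + 3 data rows
```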