Questions tagged [vaex]

Vaex is a Python library for lazy Out-of-Core DataFrames (similar to Pandas), to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, standard deviation, etc., on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).

181 questions
0 votes, 1 answer

Vaex - NameError: Column or variable 'example_string' does not exist

I've recently started using vaex for its great potential on large datasets. I'm trying to apply the following function: def get_columns(v: str, table_columns: List, pref: str = '', suff: str = '') -> List: return…
0 votes, 1 answer

Vaex datetime error: unknown variables or column

I have a vaex.dataframe.DataFrame called df holding a time column called timestamp of type string. I convert the column to datetime as follows: import numpy as np from pandas.api.types import is_datetime64_any_dtype as is_datetime if not…
PeeteKeesel
0 votes, 1 answer

Vaex: str.replace depending on other column

I'm quite new to vaex ;) Problem: I'm importing a huge number of log files into vaex, each as a string in lowercase letters. After that I calculate the size of each string and store it in the column size. For every string I'm calculating and storing the most…
0 votes, 1 answer

Is there any way to get a "token" using a Google Cloud Java client?

I'm trying to implement this from https://vaex.io/docs/api.html: df = vaex.open('gs://vaex-data/airlines/us_airline_data_1988_2019.hdf5?token=MAGIC_GOOGLE_TOKEN') I have Java and { "type": "service_account", "project_id": "project_id", …
Capacytron
0 votes, 0 answers

ERROR:'AttributeError: can't set attribute' while using vaex

This is my code. Pretty straightforward: import vaex vaex_df = vaex.from_pandas(df, copy_index=False) vaex_df.export_hdf5('my_data.hdf5') However, I get the error: C:\ProgramData\Anaconda3\lib\site-packages\vaex\dataframe.py…
0 votes, 1 answer

Vaex error: AttributeError: 'pyarrow.lib.ChunkedArray' object has no attribute 'dtype'

I am using vaex in Python and am having a hard time printing the values of a column. If I create a dataframe from local data, it works: df = vaex.from_arrays(x=[1, 2, 3], y=[2, 3, 4]) df['inside'] = df.geo.inside_polygon(df2['x'], df2['y'], px,…
afriedman111
0 votes, 0 answers

vaex groupby agg sum of all columns for a larger dataset

I have a dataset consisting of 1,800,000 rows and 45 columns. The operation I am trying to perform is to group by one column and take the sum of the other columns. As a first step, I take data_df as my data frame, and all the columns are…
aziz shaw
0 votes, 0 answers

How can I read a SAS format datafile in Vaex without converting it to a pandas data frame first?

I was trying to load a 30 GB SAS-format data file in pandas, but there is not enough memory to do so. I then found a Python library called Vaex, which is supposed to analyze big datasets with no memory wasted. However, Vaex can only read data from…
Ze C.
0 votes, 2 answers

MainThread: Vaex: Error while Opening Azure Data Lake Parquet file

I tried to open a Parquet file on Azure Data Lake Gen2 storage with vaex, using a generated SAS URL (with the datetime limit and token embedded in the URL), by doing: vaex.open(sas_url) and I got the error ERROR:MainThread:vaex:error opening 'the path…
Temiloluwa
0 votes, 0 answers

What is the most pythonic way to relate two pandas DataFrames based on a key value?

So, at my workplace I use A LOT of Python (pandas), and the data keeps getting bigger and bigger: last month I was working with a few hundred thousand rows, weeks after that with a few million rows, and now I am working with…
Pedro Bzz
0 votes, 1 answer

Python pandas: is there a faster way to do an explode operation according to the requirement?

The code is as follows. The input dataframe is: import pandas as pd import numpy as np df = pd.DataFrame([('bird', 'Falconiformes', 2), ('bird', 'Psittaciformes', 4), ('mammal', 'Carnivora', 8), …
aziz shaw
0 votes, 2 answers

Concatenating two vaex dataframes causes column issues

I am facing some issues concatenating two vaex data frames. When I concat both data frames, the column names are ignored. First, I read a CSV file using vaex: >>> import vaex as vx >>> df = vx.read_csv("fl_name", header=None) >>> df.column_names …
alcarnielo
0 votes, 1 answer

Joining two dataframes using vaex

I am trying to join two data frames that were both imported with vaex. I think this should be simple, but I am having challenges with the vaex expressions. Here's what I did: vx_neighbors.join(vx_neighbours_df, on=['Neighbour', 'Year', 'day']) and I…
Kay
0 votes, 1 answer

Vaex: apply changes to selection

Using Vaex, I would like to make a selection of rows, modify the values of some columns on that selection and get the changes applied on the original dataframe. I can do a selection and make changes to that selection, but how can I get them ported…
Humberto
0 votes, 1 answer

register functions with additional arguments?

Is there a way to define a function with additional arguments? My function currently works in the following way: @vaex.register_function() def abc(field) : o = len( set(txt.str.split(' ')) ) return o df.func.field.abc() I want it…
sten