Questions tagged [vaex]

Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas)

Vaex is a Python library for lazy, out-of-core DataFrames (similar to pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count, and standard deviation on an N-dimensional grid for more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots, and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy, and lazy computations for best performance (no memory wasted).
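To make "statistics on an N-dimensional grid" concrete, here is a minimal plain-Python sketch of the idea in one dimension: values of `y` are averaged per bin of `x`. It does not use Vaex's API; the function name `binned_mean` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Sequence


def binned_mean(
    x: Sequence[float],
    y: Sequence[float],
    lo: float,
    hi: float,
    bins: int,
) -> List[Optional[float]]:
    """Mean of y in each of `bins` equal-width bins of x over [lo, hi).

    Hypothetical helper for illustration only; not part of Vaex.
    Empty bins yield None.
    """
    sums = [0.0] * bins
    counts = [0] * bins
    width = (hi - lo) / bins
    for xi, yi in zip(x, y):
        if lo <= xi < hi:
            # Clamp the index to guard against floating-point rounding at the edge.
            idx = min(int((xi - lo) / width), bins - 1)
            sums[idx] += yi
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


x = [0.5, 1.5, 2.5, 3.5, 0.7]
y = [1.0, 2.0, 3.0, 4.0, 3.0]
means = binned_mean(x, y, lo=0.0, hi=4.0, bins=4)
print(means)
```

A single pass over the rows with fixed-size accumulators is what lets this kind of statistic be computed without holding the whole dataset in memory.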

181 questions
0
votes
1 answer

Vaex convert csv to feather instead of hdf5

Does vaex provide a way to convert .csv files to .feather format? I have looked through documentation and examples and it appears to only allow conversion to .hdf5 format. I see that the dataframe has a .to_arrow() function but that looks like it…
afriedman111
  • 1,925
  • 4
  • 25
  • 42
0
votes
0 answers

Optimize reading from MS SQL server and writing to CSV file

I am trying to optimize a read/write task for a client using a Python script. They have to have the data in a csv file even though I suggested parquet files. My code reads from a sql database into a dataframe and from a dataframe I write it to a csv…
db0
  • 21
  • 2
0
votes
0 answers

ERROR reading vaex data. 'data' is not defined in vaex

I have been looking everywhere for an answer and can't figure it out. It used to work and now it doesn't. I have updated Vaex.... Here's the code: import vaex as vx data=vx.open("CyWcsv.csv") ## I have validated the file has been…
JMR
  • 1
  • 1
0
votes
0 answers

Python Vaex join dataframe where values of two columns do NOT match

Hi, I am wondering if there is a Vaex equivalent to the pandas join syntax below. Essentially I am trying to join a dataframe onto itself where values in column 1 match values in column 1 and values in column 2 do not match values in column…
Kristina
  • 11
  • 2
0
votes
0 answers

pandas/vaex: applying nunique on float columns

I am using the nunique method in both pandas and vaex to count and/or compare float values in aggregated groups. In pandas, though, when filtering out rows I use the np.isclose method to be able to set a relative tolerance and avoid errors in comparison but I…
euh
  • 319
  • 2
  • 11
0
votes
0 answers

vaex error during head of a large data frame

I am trying to use vaex as an alternative to pandas to merge extremely big data frames (100k rows + 176m rows) on a string column. The .join seems to work without any error and I can even check .shape of the result data frame but when I try to .head…
euh
  • 319
  • 2
  • 11
0
votes
0 answers

apply function to column out-of-memory in Python Polars

I have a large GIS dataset (167x25e6) that was generated from GeoJSON, converted via .csv and now to parquet. This is the first time that I really have to deal with out-of-memory dataframes and I am still trying to find out if Polars is the right option for my…
0
votes
0 answers

vaex and ipynb problems

I am new to vaex. I just started using it to speed up some groupby + agg.nunique operations on a ~40 million row DataFrame in a Jupyter notebook. It works much faster than pandas, and I am really excited to use it more often, but sometimes I experience weird…
euh
  • 319
  • 2
  • 11
0
votes
1 answer

Splitting list of strings in a column of vaex dataframe

There is a vaex dataframe with a column such as: df['col'] ['aa', ' NO'] ['aa', ' NO'] ['aa', ' NO'] ['aa', ' NO'] ['aa', ' NO'] I want to convert this one column to two columns as follows: df['col1', 'col2'] ['aa'], [' NO'] ['aa'], ['…
HMadadi
  • 391
  • 5
  • 22
0
votes
0 answers

how to split 'number' to separate columns in vaex DataFrame

In pandas we can split a 'number' column into multiple columns, such as in this example. Is it possible to do this in a vaex dataframe? In the simple example below, col_1 is empty! df = vaex.from_arrays(x = [10000, 100001, 100002, 100003, 100004, 100005, 100006,…
HMadadi
  • 391
  • 5
  • 22
0
votes
1 answer

Lambda function in concatenated dataframes in Vaex

I have multiple tar files, each containing multiple csv files. I want to open all csv files as a vaex dataframe and then make a new column with a lambda function, but I got the error below. How can I do it? def get_years_files(num_years): files…
HMadadi
  • 391
  • 5
  • 22
0
votes
2 answers

Correctly format timestamp in Vaex (Remove colon from UTC offset)

I have a dataframe in vaex and I'm having trouble with the timestamp format. I can't seem to correctly format the timestamp column. After researching the problem, I have come to the conclusion that I need to remove the colon in the UTC offset…
Rebecca James
  • 383
  • 2
  • 12
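As a rough illustration of the step this question describes (removing the colon from a UTC offset), here is a minimal sketch in plain Python for a single timestamp string. It is an assumption about what the asker needs, not a Vaex-specific solution; the helper name `strip_offset_colon` and the sample timestamp are hypothetical, and applying it to a Vaex column is not shown.

```python
import re
from datetime import datetime, timedelta

# Matches a trailing UTC offset such as "+01:00" or "-05:30".
_OFFSET_COLON = re.compile(r"([+-]\d{2}):(\d{2})$")


def strip_offset_colon(ts: str) -> str:
    """Turn a trailing '+HH:MM' / '-HH:MM' offset into '+HHMM' / '-HHMM'.

    Hypothetical helper for illustration; strings without such an
    offset are returned unchanged.
    """
    return _OFFSET_COLON.sub(r"\1\2", ts)


raw = "2021-03-01T12:30:00+01:00"  # sample value, not taken from the question
cleaned = strip_offset_colon(raw)

# With the colon removed, the offset can be parsed with the %z directive.
parsed = datetime.strptime(cleaned, "%Y-%m-%dT%H:%M:%S%z")
print(cleaned, parsed.utcoffset())
```

Anchoring the pattern to the end of the string keeps the colons in the time-of-day part untouched.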
0
votes
0 answers

How to split a file after read by VAEX in python

I want to split my .txt file by its first 4 digits and create columns in vaex. I do this with pandas and want to do the same as below with vaex. df= pd.read_csv(r"D:\Personal Projects\Testing_file.txt") df['First'] = df['address'].str[:4] df['Second'] =…
Manish
  • 1
0
votes
0 answers

Vaex .apply() method on Vaex data frame is giving incorrect output

I am trying to perform a .apply() method on a vaex data frame but it gives some error. Given below is the code executed - and the error message is as follows - This dataset has 10+ Million rows. I am trying to create one-hot encoded features…
0
votes
1 answer

Vaex: getValues / rows from a filtered dataframe

I am trying to filter a dataframe and get the rows that meet the filter criteria. I have been able to get the first object from the dataframe by using the select function first and then calling the evaluate function to get the value: …
afriedman111
  • 1,925
  • 4
  • 25
  • 42