Questions tagged [vaex]

Vaex is a Python library for lazy out-of-core DataFrames (similar to Pandas), used to visualize and explore big tabular datasets. It can calculate statistics such as mean, sum, count and standard deviation on an N-dimensional grid at more than a billion (10^9) objects/rows per second. Visualization is done using histograms, density plots and 3D volume rendering, allowing interactive exploration of big data. Vaex uses memory mapping, a zero-memory-copy policy and lazy computations for best performance (no memory wasted).
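A minimal sketch of what "statistics on an N-dimensional grid" means. It uses plain NumPy, not vaex, and the data and the 4 x 4 grid are made up for illustration. vaex computes the same kind of binned count, sum and mean lazily and out of core; here everything is eager and in memory.

```python
import numpy as np

# Made-up data standing in for two coordinate columns x, y and a value column v.
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 10_000)
y = rng.uniform(0, 1, 10_000)
v = rng.normal(size=10_000)

grid = [[0, 1], [0, 1]]  # limits of the 2D grid

# Number of rows falling into each cell of a 4 x 4 grid.
counts, xedges, yedges = np.histogram2d(x, y, bins=4, range=grid)

# Sum of v per cell: pass v as per-row weights.
sums, _, _ = np.histogram2d(x, y, bins=4, range=grid, weights=v)

# Mean of v per cell (an empty cell would give NaN instead of raising).
with np.errstate(invalid="ignore", divide="ignore"):
    means = sums / counts

print(counts)
print(means)
```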

181 questions
0 votes, 1 answer

XGBoost with vaex

I would like to clarify: does vaex.ml.sklearn allow out-of-core ML? I tried the examples from the documentation and saw that when I use a dataset from an HDF5 file (the evaluated dataset consumes ~3 GB of RAM), RAM usage during the XGBoost training process is around…
0 votes, 1 answer

Vaex: replace single character in column names

I have a dataset with a dot in a column name, e.g. name_1.0. I understand that vaex renames such columns to name_1_0. I would like to use .drop() on my data frame. However, it seems this is not possible with column names that contain a dot…
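This sketch covers the .drop() case. It assumes vaex's renaming rule is "every character that is not a letter, digit or underscore becomes an underscore". That rule matches the two examples on this page (name_1.0 becomes name_1_0, and column_2.0 becomes column_2_0), but it is an assumption: vaex may treat other cases, such as leading digits or clashing names, differently. sanitize_column_name is a hypothetical helper, not a vaex function.

```python
import re


def sanitize_column_name(name: str) -> str:
    """Hypothetical approximation of vaex's renaming: every character that is
    not a letter, digit or underscore becomes an underscore."""
    return re.sub(r"\W", "_", name)


original_columns = ["id", "name_1.0", "name_2.0"]

# Map each original column name to the name vaex is assumed to expose.
renamed = {col: sanitize_column_name(col) for col in original_columns}

# Names to pass to df.drop(...) for the columns we want to remove.
to_drop = [renamed[col] for col in ["name_1.0", "name_2.0"]]
print(to_drop)  # ['name_1_0', 'name_2_0']
```

You would then pass to_drop to df.drop(...). If the names do not match, df.get_column_names() shows the names vaex actually assigned.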
0 votes, 1 answer

Copy dataframe rows and replace them in the same dataframe

I have a dataframe with 2 records that have only a few values. I want to replace those records with copies of other records that have more values. Does anyone know how to do this in pandas or vaex? [image] I want to replace the values 148 for…
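This pandas sketch assumes the goal is to keep each row's key and overwrite only the data columns of the sparse rows with values copied from fuller rows. The column names, row positions and values are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: rows 0 and 1 are sparse, rows 2 and 3 are complete.
df = pd.DataFrame({
    "key": [101, 102, 103, 104],
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 5.0, 6.0],
})

targets = [0, 1]       # rows to overwrite
sources = [2, 3]       # rows to copy from, in matching order
cols = ["a", "b"]      # data columns to copy; "key" is left untouched

# .to_numpy() strips the index so values are assigned by position.
df.loc[targets, cols] = df.loc[sources, cols].to_numpy(copy=True)
print(df)
```

The .to_numpy() call matters. Assigning the DataFrame slice directly would align on index labels (0, 1 versus 2, 3) and write NaN instead. vaex has no pandas-style .loc row assignment. If the data fits in memory, one route is to round-trip through pandas with df.to_pandas_df() and vaex.from_pandas().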
0 votes, 1 answer

How can I use CUDA with vaex (a Python library)?

My code is as follows: df['O_ID'] = (df.apply(get_match_id, arguments=[df['pickup_longitude'], df['pickup_latitude']])).jit_cuda() When I first used this function, jit_cuda(), there was an error "No Module named cupy". But after I installed the…
Bardbo
0 votes, 1 answer

Trying to convert a csv to HDF5 and read it using vaex

I used this code to convert the CSV into HDF5 with a given chunk size: dv = vaex.from_csv('Wager-Win_April-Jul.csv', convert=True, chunk_size=5_000_000) but I get this error while executing the…
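Some context on what chunk_size controls, as far as I understand it. vaex.from_csv(..., convert=True, chunk_size=N) reads the CSV through pandas in pieces of N rows, writes each piece out, and then combines the pieces, so only one chunk needs to be in memory at a time. The snippet below is not vaex's implementation. It shows the same chunked-reading idea using pandas' own chunksize parameter on a tiny made-up CSV.

```python
import io

import pandas as pd

# Made-up CSV with 10 data rows.
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

chunk_sizes = []
total = 0
# Each iteration yields a DataFrame of at most 4 rows.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += int(chunk["b"].sum())  # work on one chunk at a time

print(chunk_sizes, total)  # [4, 4, 2] 90
```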
0 votes, 1 answer

Preserving datetime type when converting from CSV to HDF5 with vaex

I have a CSV file with a time column storing timestamps. After converting this file to HDF5 format using the vaex.from_csv() method, the values in the time column are strings. For example: df = vaex.open("data.csv.hdf5") time =…
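This sketch shows one likely fix. It assumes vaex.from_csv forwards extra keyword arguments to pandas.read_csv, which the pandas-based CSV reader does, but check your vaex version. If you tell pandas to parse the column as dates, it arrives as datetime64 instead of strings, so the converted HDF5 file should keep that type. The sample CSV is made up.

```python
import io

import pandas as pd

csv_text = "time,value\n2020-01-01 10:00:00,1\n2020-01-02 11:30:00,2\n"

# parse_dates turns the "time" column into datetime64 instead of strings.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(df.dtypes)
```

With vaex, the equivalent call would presumably be vaex.from_csv('data.csv', convert=True, parse_dates=['time']).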
0 votes, 1 answer

vaex groupby gives TypeError: unhashable type: 'Expression' when reading data from multiple hdf5 files

In Python, I open a data frame from multiple hdf5 files with vaex (vdf = vaex.open('test_*.hdf5')). Everything seems to work nicely, e.g. combining two columns to make a new one (vdf['newcol'] = vdf.x+vdf.y). But I cannot get vaex's groupby to work:…
Edgar
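Here is a plausible mechanism for "unhashable type: 'Expression'". It is an assumption: the question does not show where the hash happens. In Python, a class that defines __eq__ without also defining __hash__ gets __hash__ = None, so its instances cannot be dict keys or set members. Expression objects overload == to build new expressions. If some code path tries to hash the grouping key, passing vdf.x fails where the plain string 'x' would not. The toy Expr class below is hypothetical and only demonstrates the Python rule. The practical thing to try is passing column names as strings, e.g. vdf.groupby(by='x', ...).

```python
class Expr:
    """Toy stand-in for an expression object (hypothetical, not vaex's class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, other):
        # == builds a new expression instead of returning True/False.
        # Because __eq__ is defined and __hash__ is not, Python sets
        # Expr.__hash__ to None, which makes instances unhashable.
        return Expr(f"({self.name} == {other!r})")


def hash_error(obj):
    """Return the TypeError message raised by hash(obj), or None if it hashes."""
    try:
        hash(obj)
    except TypeError as exc:
        return str(exc)
    return None


print(hash_error(Expr("x")))  # a TypeError message about an unhashable type
print(hash_error("x"))        # None: a plain column name string hashes fine
```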
0 votes, 1 answer

404 when accessing the example() data of vaex

When calling vaex.example(), as documented on the vaex Sphinx docs home page (https://vaex.readthedocs.io/en/latest/), it serves up a 404: import vaex; df = vaex.example(); dfn = df[0:5][['x','y']]; dfn.describe() Downloading…
WestCoastProjects
0 votes, 1 answer

Can't install vaex on Python 3.7.5 and Ubuntu 18.04 because of LLVM?

I am trying to install the vaex package using sudo pip3 install vaex but get the following error: got version from file /tmp/pip-build-4ejf0kw2/llvmlite/llvmlite/_version.py {'version': '0.34.0', 'full': 'c5889c9e98c6b19d5d85ebdd982d64a03931f8e2'} …
SteveS
0 votes, 1 answer

Vaex Displaying Data

I have a 10.11 GB CSV file that I converted to HDF5 using dask. It contains a mixture of str, int and float values. When I try to read it with vaex, I just get numbers, as shown in the screenshot. Can someone please help me out? [screenshot]
0 votes, 1 answer

Why does vaex change column names that contain a period?

When using vaex, I came across an unexpected error: NameError: name 'column_2_0' is not defined. After some investigation, I found that in my data source (an HDF5 file) the column causing problems is actually called column_2.0, and that vaex renames…
Joe
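One likely reason for the renaming, which is an assumption and not confirmed here: vaex column names are also used inside string expressions that are parsed as Python code, and column_2.0 is not a valid Python identifier. The hypothetical helper below uses Python's ast module to show that column_2.0 does not parse as a plain name at all, while column_2_0 does.

```python
import ast


def parses_as_column_reference(text: str) -> bool:
    """Hypothetical helper: True if `text` parses as a single bare Python name."""
    try:
        tree = ast.parse(text, mode="eval")
    except SyntaxError:
        # "column_2.0" tokenizes as the name column_2 followed by the
        # number .0, which is not a valid expression.
        return False
    return isinstance(tree.body, ast.Name)


for name in ["column_2.0", "column_2_0"]:
    print(name, name.isidentifier(), parses_as_column_reference(name))
```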
0 votes, 1 answer

Can't read data using read_csv due to encoding errors

So, I am facing a huge issue. I am trying to read a CSV file that uses '|' as the delimiter. If I use utf-8 or utf-8-sig as the encoding, I get the following error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 0: invalid start…
Prakhar Rathi
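Byte 0xC0 can never appear in valid UTF-8, so the file is probably in a single-byte encoding such as latin-1 or cp1252. Which one is a guess, so check with whoever produced the file. The pure-Python sketch below reproduces the error on a made-up sample and then reads the sample with latin-1. latin-1 maps every byte to a character and therefore never fails to decode, though it can show wrong characters if the real encoding differs. With pandas, the equivalent would be pd.read_csv(path, sep='|', encoding='latin-1').

```python
import csv
import io

# Made-up sample: 0xC0 is "À" in latin-1 and cp1252, but invalid in UTF-8.
raw = b"name|city\n\xc0ngela|Lisboa\n"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc)  # 'utf-8' codec can't decode byte 0xc0 in position 10: invalid start byte

# latin-1 maps each byte 0x00-0xFF to one character, so decoding cannot fail.
text = raw.decode("latin-1")
rows = list(csv.reader(io.StringIO(text), delimiter="|"))
print(rows)
```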
0 votes, 1 answer

vaex Object dtype dtype('O') has no native HDF5 equivalent

I use vaex.from_csv() to convert a CSV file to HDF5: import vaex; vaex.from_csv("/Users/xxxx/development/vaex/dataAN/testdata1.csv", convert=True) and get IPython/core/interactiveshell.py:3331: DtypeWarning: Columns (53,55) have mixed types.Specify dtype…
heyuqi
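The DtypeWarning points at the cause. Columns 53 and 55 hold a mix of types (for example numbers and strings), so pandas stores them as generic Python objects (dtype('O')), which have no direct HDF5 counterpart. A common workaround, which is an assumption and not taken from the answer to this question, is to make such columns uniformly strings. You can do that on read with dtype=str (from_csv passes keyword arguments through to pandas.read_csv), or by casting afterwards, as in this pandas sketch with a made-up column.

```python
import pandas as pd

# Made-up column mixing ints and strings, so pandas infers object dtype.
df = pd.DataFrame({"code": [1, "A7", 3, "B2"]})
mixed_types = {type(v).__name__ for v in df["code"]}

# Cast every value to str so the column holds a single type.
df["code"] = df["code"].astype(str)
clean_types = {type(v).__name__ for v in df["code"]}

print(mixed_types, clean_types)
```

After this cast, every value in the column is a string, and vaex should be able to store a string-only column.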
0 votes, 1 answer

How to scale data to make area under the graph equal to 1

I made a function that can plot statistics for large arrays (10**8 elements) in less than 2 seconds. How can I scale the Y-axis so that the area under the graph equals 1? def dis(inp): import numpy as np import vaex import matplotlib.pyplot as plt if…
dereks
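To make the area under a histogram equal to 1, divide each bin count by (total count × bin width). That turns counts into a probability density. The NumPy sketch below does this by hand on made-up normal data and checks it against np.histogram(..., density=True), which applies the same normalisation. The same formula works for a count array produced by vaex, as long as the total is taken over the binned values only. Points outside the binning limits are not counted.

```python
import numpy as np

# Made-up data.
rng = np.random.default_rng(42)
data = rng.normal(size=100_000)

counts, edges = np.histogram(data, bins=50)
widths = np.diff(edges)

# Normalise by hand: counts / (total count * bin width).
pdf = counts / (counts.sum() * widths)
area = np.sum(pdf * widths)

# NumPy's built-in normalisation, for comparison.
pdf_builtin, _ = np.histogram(data, bins=edges, density=True)

print(area)  # 1.0 up to floating-point rounding
```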
0 votes, 2 answers

How can a specific cell be accessed in a vaex data frame?

vaex is a library similar to pandas that provides a dataframe class. I'm looking for a way to access a specific cell by row and column, for example: import vaex; df = vaex.from_dict({'a': [1,2,3], 'b': [4,5,6]}); df.a[0] # this works in pandas but not…
Ophir Yoktan