Does the Vaex DataFrame support data generation?

Question

I have a dataset with the following schema:

BIKE_ID	REGN_NUMBER	ENGINE_NUMBER	CHASSIS_NUMBER	BUYED_YEAR
1	XN67TY567	34567ABGN65	145089	2011
2	XN67TM567	34567ABGT65	145085	2011
3	XN67TM569	34567VBGT65	1450867	2013
.	.	.	.	.
.	.	.	.	.
2870763	XN56RTMN	34786VHGT65	14501236	2016

Now I would like to generate data from 2,870,764 up to about 32,870,764, i.e. around 30 million rows. In pandas, we can use the method below:

val = 2870764
df3['POLICY_ID'] = range(val ,val+30000000)

But since this is a huge amount of data, pandas can't generate it. Is there any way to solve this problem in Vaex?

When I try the same assignment in Vaex, it raises an error: ValueError: range(2870764, 5870764) is not of string or Expression type, but <class 'range'>

Could anyone suggest whether this can be done in Vaex?

score 2 · Accepted Answer · answered Mar 12 '22 at 23:21

Yes, vaex has a function called vrange that does exactly what you're looking for, with no memory usage.

Example:

import vaex

df = vaex.example()
df

This is a dataframe with 330,000 rows (the size of the example dataset at the time of writing). We can generate a new column, POLICY_ID, using vaex.vrange:

df["POLICY_ID"] = vaex.vrange(0, len(df))

vrange docs: https://vaex.io/docs/api.html#vaex.vrange
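
As a rough illustration of why a lazily evaluated range needs almost no memory, here is a small sketch using Python's built-in range with the numbers from the question. It is only an analogy for the idea behind a virtual range column, not Vaex's implementation, and the names val, n_rows and policy_ids are chosen here for illustration.

```python
import sys

val = 2870764          # first new ID, taken from the question
n_rows = 30_000_000    # number of rows to generate

# A range object stores only start, stop and step, so its size does not
# grow with the number of values it describes.
policy_ids = range(val, val + n_rows)

print(len(policy_ids))                # how many IDs the range describes
print(policy_ids[0], policy_ids[-1])  # individual values are computed on demand
print(sys.getsizeof(policy_ids))      # size in bytes of the range object itself
```

The values are never materialized as a list or array; each one is computed from the start and step only when it is requested.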
