
I have a dataset with the following schema:

BIKE_ID REGN_NUMBER ENGINE_NUMBER CHASSIS_NUMBER BUYED_YEAR
1 XN67TY567 34567ABGN65 145089 2011
2 XN67TM567 34567ABGT65 145085 2011
3 XN67TM569 34567VBGT65 1450867 2013
. . . . .
. . . . .
2870763 XN56RTMN 34786VHGT65 14501236 2016

Now I would like to generate IDs from 2,870,764 (28,70,764) up to 32,870,764 (3,28,70,764), i.e. around 30 million rows. In pandas we can use the following:

val = 2870764
df3['POLICY_ID'] = range(val, val + 30000000)
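For reference, a quick check of the bounds used above (a small sketch; `n_rows` and `bytes_needed` are names introduced here for illustration). A Python `range` is itself lazy, so the cost only appears once the values are materialised as a column; the last line estimates that cost assuming an int64 column.

```python
# First new ID, taken from the question (the last existing BIKE_ID is 2,870,763)
val = 2870764
n_rows = 30_000_000  # hypothetical name: number of IDs to generate

# A Python range stores only start/stop/step, not the values themselves
ids = range(val, val + n_rows)

print(ids[0], ids[-1], len(ids))

# Rough size of the values once materialised as an int64 column (8 bytes each)
bytes_needed = n_rows * 8
print(bytes_needed / 1e6, "MB")
```

The stop value is exclusive, so the last generated ID is 32,870,763.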

But since the data is so large, pandas can't generate it. Is there any way to do this in Vaex instead?

However, Vaex throws an error: ValueError: range(2870764, 5870764) is not of string or Expression type, but <class 'range'>

Could anyone suggest whether this can be done in Vaex?

SeaBean
The_Third_Eye

1 Answer


Yes, vaex has a function called vrange that does exactly what you're looking for: it creates a virtual column equivalent to numpy.arange, so the values take up no memory.

Example:

import vaex

# Load the example dataset that ships with vaex
df = vaex.example()
df  # in a notebook, this displays the DataFrame

This loads a DataFrame with 330,000 rows (the example dataset at the time of writing). We can add a new column, POLICY_ID, using vaex.vrange:

df["POLICY_ID"] = vaex.vrange(0, len(df))

vrange docs: https://vaex.io/docs/api.html#vaex.vrange