I have this dataframe

dataframe

I would like to duplicate all the rows where day_of_year == 140 and, in the duplicated rows, set the day_of_year column to 148.

That is, the original rows stay as they are, and each copy gets day_of_year = 148.

I am using vaex

Can someone help me?

marc_s
1 Answer

Let's start with some fake data:

>>> import vaex
>>> import numpy as np
>>> x = [0, 1, 2, 140, 140, 140, 4, 4]
>>> df = vaex.from_arrays(x=x)
>>> df['y'] = df.x**2
>>> df
  #    x      y
  0    0      0
  1    1      1
  2    2      4
  3  140  19600
  4  140  19600
  5  140  19600
  6    4     16
  7    4     16

Now we build a filtered dataframe containing only the rows with x == 140, and replace x in it with a different value. Note that we do not assign to the data in place but define the new column with where, since the data is considered immutable in Vaex.

>>> df_replace = df[df.x==140]
>>> df_replace['x'] = (df_replace.x==140).where(148, -1)
>>> df_replace
  #    x      y
  0  148  19600
  1  148  19600
  2  148  19600

Note that the virtual column y still uses the previous x values, so it does not change.

Now we only need to concatenate them:

>>> df_new = df.concat(df_replace)
>>> df_new
#    x    y
0    0    0
1    1    1
2    2    4
3    140  19600
4    140  19600
...  ...  ...
6    4    16
7    4    16
8    148  19600
9    148  19600
10   148  19600
Maarten Breddels