I have a few functions that use the pandas.DataFrame.update method, and I'm trying to move to Dask for these datasets, but Dask's pandas-like DataFrame API doesn't implement update. Is there an alternative way to get the same result in Dask?
Here are the functions I have that use update:
- Forward fills data with last known value
df.update(df.filter(like='/').mask(lambda x: x == 0).ffill(axis=1))
input
id  .. .. ..(some cols)  1/1/20  1/2/20  1/3/20  1/4/20  1/5/20  1/6/20  ....
1                        10      20      0       40      0       50
2                        10      30      30      0       0       50
.
.
output
id  .. .. ..(some cols)  1/1/20  1/2/20  1/3/20  1/4/20  1/5/20  1/6/20  ....
1                        10      20      20      40      40      50
2                        10      30      30      30      30      50
.
.
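A minimal, self-contained pandas reproduction of the forward-fill example above may be useful as a baseline when checking a Dask version. It is only a sketch: the `other` column is a hypothetical stand-in for the elided columns, and the names `filled` and `date_cols` are introduced here for illustration.

```python
import pandas as pd

# Sample frame mirroring the input table above.
df = pd.DataFrame({
    "id": [1, 2],
    "other": ["a", "b"],  # hypothetical stand-in for "(some cols)"
    "1/1/20": [10, 10],
    "1/2/20": [20, 30],
    "1/3/20": [0, 30],
    "1/4/20": [40, 0],
    "1/5/20": [0, 0],
    "1/6/20": [50, 50],
})

# Select the date columns, turn zeros into NaN, then forward fill along each row.
filled = df.filter(like="/").mask(lambda x: x == 0).ffill(axis=1)
date_cols = list(filled.columns)

# Write the filled values back into the original frame in place.
df.update(filled)

print(df[date_cols])
```

Depending on the pandas version, the date columns may come back as integers or floats, so comparisons against the expected output are safer by value than by dtype.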
- Replaces values in a dataframe with values from another dataframe based on an id/index column
def replace_names(df1, df2, idxCol='id', srcCol='name', dstCol='name'):
    df1 = df1.set_index(idxCol)
    df1[dstCol].update(df2.set_index(idxCol)[srcCol])
    return df1.reset_index()

df_new = replace_names(df1, df2)
input
df1
id name ...
123 city a
456 city b
789 city c
789 city c
456 city b
123 city a
.
.
.
df2
id name ...
123 City A
456 City B
789 City C
.
.
.
output
id name ...
123 City A
456 City B
789 City C
789 City C
456 City B
123 City A
.
.
.
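The name-replacement example above can likewise be reproduced in plain pandas with the sample rows shown. This is a sketch under one stated change: it updates a copy of the column and assigns it back, rather than calling `update` directly on `df1[dstCol]`, because the in-place form may not propagate to `df1` when pandas' copy-on-write behaviour is active. The sample frames contain only the `id` and `name` columns from the tables.

```python
import pandas as pd

def replace_names(df1, df2, idxCol="id", srcCol="name", dstCol="name"):
    df1 = df1.set_index(idxCol)
    # Update a copy of the column, then assign it back explicitly so the
    # result does not depend on in-place modification through df1[dstCol].
    col = df1[dstCol].copy()
    col.update(df2.set_index(idxCol)[srcCol])
    df1[dstCol] = col
    return df1.reset_index()

# Sample data mirroring the df1 and df2 tables above.
df1 = pd.DataFrame({
    "id": [123, 456, 789, 789, 456, 123],
    "name": ["city a", "city b", "city c", "city c", "city b", "city a"],
})
df2 = pd.DataFrame({
    "id": [123, 456, 789],
    "name": ["City A", "City B", "City C"],
})

df_new = replace_names(df1, df2)
print(df_new)
```

Note that `df2` must have unique values in the id column for this to work, since the lookup is aligned to the (possibly repeated) ids in `df1`.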