I want to select rows from a dask dataframe based on a list of indices. How can I do that?
Example: Let's say I have the following dask dataframe.
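import pandas as pd
import dask.dataframe
dict_ = {'A': [1, 2, 3, 4, 5, 6, 7], 'B': [2, 3, 4, 5, 6, 7, 8], 'index': ['x1', 'a2', 'x3', 'c4', 'x5', 'y6', 'x7']}
pdf = pd.DataFrame(dict_)
pdf = pdf.set_index('index')
# Split the pandas dataframe into a dask dataframe with two partitions
ddf = dask.dataframe.from_pandas(pdf, npartitions=2)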
Furthermore, I have a list of indices that I am interested in, e.g.
indices_i_want_to_select = ['x1', 'x3', 'y6']
From this, I would like to generate a dask dataframe containing only the rows specified in indices_i_want_to_select.