I would like to append data to a published Dask dataset
from a queue (such as Redis). Other Python programs would then fetch the latest data (e.g. once per second or minute) and run some further operations on it.
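
To make it concrete, this is roughly the write side I have in mind (only a sketch; the scheduler address, the Redis list name `ticks` and the columns are made up, and I assume producers `RPUSH` JSON-encoded rows onto the list):

```python
import json
import time

import pandas as pd
import redis
import dask.dataframe as dd
from dask.distributed import Client

client = Client("tcp://scheduler:8786")   # placeholder scheduler address
r = redis.Redis()                         # producers RPUSH JSON rows onto "ticks"

# Publish an (empty) dataset once; columns are made up for the example.
empty = pd.DataFrame({"symbol": pd.Series(dtype="object"),
                      "price": pd.Series(dtype="float64")})
client.publish_dataset(latest=dd.from_pandas(empty, npartitions=1))

while True:
    # Drain everything currently sitting in the Redis list.
    raw = r.lrange("ticks", 0, -1)
    if raw:
        r.ltrim("ticks", len(raw), -1)    # remove only what was just read
        batch = pd.DataFrame([json.loads(m) for m in raw])
        current = client.get_dataset("latest")
        updated = dd.concat([current, dd.from_pandas(batch, npartitions=1)])
        # Republish under the same name so readers pick up the new version.
        client.unpublish_dataset("latest")
        client.publish_dataset(latest=updated)
    time.sleep(0.1)                       # crude poll interval
```

I suppose I would also have to repartition or persist the dataset from time to time, since every append adds another partition to the graph.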
- Would that be possible?
- Which append interface should be used? Should I load the data into a `pd.DataFrame` first, or is it better to use some text importer?
- What append speeds can be expected? Is it possible to append, say, 1k-10k rows per second?
- Are there other good suggestions for exchanging large, rapidly updating datasets within a Dask cluster?
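
For the reading side mentioned above, I picture the other programs doing something like this (again only a sketch, with the same made-up names as before):

```python
import time
from dask.distributed import Client

client = Client("tcp://scheduler:8786")       # same placeholder address as above

while True:
    ddf = client.get_dataset("latest")        # fetch the currently published version
    latest_mean = ddf["price"].mean().compute()  # stand-in for the real follow-up work
    print(latest_mean)
    time.sleep(1.0)                           # fetch roughly once per second
```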
Thanks for any tips and advice.