I am an ML programmer building a high-frequency trading model. Once it is in production, I need to capture data directly from the market. I am using InfluxDB, a time series database (TSDB), but I don't know how to capture data in real time. I know there is a design pattern called Change Data Capture (CDC) that might apply here. Can I use CDC with InfluxDB-Python? Is it better to use InfluxDB together with Debezium, or on its own?
In InfluxDB 1.x, you can use subscriptions to capture changed data: InfluxDB forwards a copy of every write it receives to the subscription's destination endpoints. More on subscriptions is here.
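A minimal sketch of the receiving side, assuming a subscription created with something like `CREATE SUBSCRIPTION "cdc_sub" ON "market"."autogen" DESTINATIONS ALL 'http://listener-host:9090'` (the subscription name, database, host and port are all hypothetical). InfluxDB then POSTs line protocol to that address. The handler accepts POSTs on any path and uses a deliberately simplified parser that ignores escaping and quoted strings with spaces, so field values are kept as raw strings.

```python
"""Minimal HTTP listener for writes forwarded by an InfluxDB 1.x subscription."""
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_line(line):
    """Split one line-protocol line into (measurement, tags, fields, timestamp).

    Simplified on purpose: assumes no escaped characters and no spaces,
    commas or '=' inside tag values or string fields. Field values are
    returned as raw strings (e.g. "187.3", "100i").
    """
    head, fields_part, *rest = line.split(" ")
    measurement, *tag_pairs = head.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in fields_part.split(","))
    timestamp = int(rest[0]) if rest else None
    return measurement, tags, fields, timestamp


class SubscriptionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Each forwarded write batch arrives as a newline-separated line-protocol body.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode("utf-8")
        for line in body.splitlines():
            if line.strip():
                print(parse_line(line))  # replace with your own change handler
        # Mirror InfluxDB's /write endpoint, which answers 204 No Content.
        self.send_response(204)
        self.end_headers()


def serve(port=9090):
    """Block forever, handling subscription writes on the given port."""
    HTTPServer(("0.0.0.0", port), SubscriptionHandler).serve_forever()
```

Calling `serve()` starts the listener; the port must match the subscription's destination and be reachable from the InfluxDB host.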
In InfluxDB 2.x, you can use Kapacitor to stream the data. For more information, see here.
However, InfluxDB Cloud and InfluxDB OSS 2.6 do not have subscription APIs and do not support Kapacitor stream tasks; you can still use stream tasks by writing data directly to Kapacitor.
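Writing data directly to Kapacitor means POSTing line protocol to Kapacitor's own HTTP write endpoint (`/kapacitor/v1/write`, default port 9092) instead of to InfluxDB. The sketch below assumes a Kapacitor instance on `localhost` and a stream task declared with `dbrp "market"."autogen"`; the database and retention-policy names are placeholders.

```python
"""Sketch: send line protocol straight to Kapacitor instead of InfluxDB."""
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_kapacitor_write(lines, db, rp, base_url="http://localhost:9092"):
    """Build a POST request for Kapacitor's /kapacitor/v1/write endpoint."""
    query = urlencode({"db": db, "rp": rp})
    body = "\n".join(lines).encode("utf-8")
    return Request(
        f"{base_url}/kapacitor/v1/write?{query}",
        data=body,
        method="POST",
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


def send(lines, db="market", rp="autogen"):
    """Write the lines and return the HTTP status code Kapacitor answers with."""
    with urlopen(build_kapacitor_write(lines, db, rp)) as resp:
        return resp.status
```

The `db` and `rp` query parameters are what Kapacitor uses to route points to stream tasks, so they must match the task's `dbrp` declaration.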

To capture real-time data feeds, I would use WebSocket clients such as https://pypi.org/project/websocket-client and/or a Pusher client (https://pusher.com/), as long as the data source provides a suitable feed. If it doesn't, I'd have to build one myself on top of the provider's REST API.
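A minimal sketch of the WebSocket route using `websocket-client` and InfluxDB-Python (the `influxdb` package for InfluxDB 1.x). The feed URL, the JSON message shape and the `market` database are hypothetical; `to_point()` has to be adapted to whatever schema your provider actually sends.

```python
"""Sketch: stream ticks from a WebSocket feed into InfluxDB 1.x.

Assumes a hypothetical feed that sends JSON messages like
{"symbol": "AAPL", "price": 187.3, "size": 100, "ts": 1700000000000}
with ts in epoch milliseconds.
"""
import json

FEED_URL = "wss://feed.example.com/trades"  # hypothetical endpoint


def to_point(message):
    """Convert one raw JSON message into an InfluxDB-Python point dict."""
    tick = json.loads(message)
    return {
        "measurement": "trades",
        "tags": {"symbol": tick["symbol"]},
        "fields": {"price": float(tick["price"]), "size": int(tick["size"])},
        "time": tick["ts"],
    }


def run(feed_url=FEED_URL):
    # Third-party packages: pip install websocket-client influxdb
    import websocket
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="market")

    def on_message(ws, message):
        # "ms" tells InfluxDB the integer timestamps are in milliseconds.
        client.write_points([to_point(message)], time_precision="ms")

    def on_error(ws, error):
        print("feed error:", error)

    ws = websocket.WebSocketApp(feed_url, on_message=on_message, on_error=on_error)
    ws.run_forever()  # blocks until the connection closes
```

Writing one point per message keeps the sketch short; at tick rates typical of high-frequency trading you would normally batch writes instead.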
A Python-based CDC component sitting between the data source and InfluxDB (as the target database) is also possible.
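One possible building block for such a component, sketched here as an assumption rather than a full CDC tool like Debezium: an in-process buffer that decouples the feed from the database and writes in batches, flushing when either a size limit or a latency limit is reached. `BatchingWriter`, `sink`, `max_batch` and `max_delay_s` are hypothetical names; the sink could wrap InfluxDB-Python's `write_points`.

```python
"""Sketch: a small batching buffer between a market feed and InfluxDB."""
import time


class BatchingWriter:
    """Collect points and hand them to `sink` (a callable taking a list) in batches.

    Example sink: lambda pts: client.write_points(pts, time_precision="ms")
    """

    def __init__(self, sink, max_batch=500, max_delay_s=0.05, clock=time.monotonic):
        self.sink = sink
        self.max_batch = max_batch
        self.max_delay_s = max_delay_s
        self.clock = clock  # injectable so the flush logic can be tested
        self.buffer = []
        self.first_ts = None  # clock reading when the oldest buffered point arrived

    def add(self, point):
        if not self.buffer:
            self.first_ts = self.clock()
        self.buffer.append(point)
        # Flush when the batch is full or the oldest point has waited too long.
        too_old = self.clock() - self.first_ts >= self.max_delay_s
        if len(self.buffer) >= self.max_batch or too_old:
            self.flush()

    def flush(self):
        """Send everything buffered so far, if anything."""
        if self.buffer:
            batch, self.buffer = self.buffer, []
            self.sink(batch)
```

Because the delay check only runs when a new point arrives, a quiet market can leave points sitting in the buffer; a real component would add a timer thread (with a lock) and call `flush()` on shutdown.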
