What would be the best way to save timeseries data to an external file system for further processing on another platform?
By best I mean:
- As little coding as possible
- As much resilience to heavy load as possible
- Integrated with TB (or using TB tools or workarounds inside TB)
The solution could be near realtime or batch, since the final destination is for analysis and does not require up-to-the-minute information.
I have thought of a few ways, but I'd like some advice or thoughts on which one would be most convenient. And of course, I will appreciate any other ideas I haven't thought of.
For the near realtime mode:
1. In the rule chain, save the data with a custom node placed after the Save Timeseries node.
a. Could this affect performance significantly? It would add a file-system write for every call to the node.
b. Is there any code example of such a node? (I put a rough sketch of what I mean after this list.)
2. Use the REST API Call node to POST the timeseries data to an endpoint and generate the file there, at the endpoint.
a. Does this have any chance of being performant?
b. Is there any code example of how to do it? (See the receiver sketch after this list.)
3. Use the Kafka node to forward the timeseries data to a Kafka server and from there to its final destination.
a. This introduces another tier (Kafka), which requires a different kind of expertise and more resources.
b. Has anybody worked with this node successfully? And would you mind sharing an example? (A consumer sketch also follows the list.)
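For 1b, this is roughly what I have in mind: a minimal, untested sketch assuming the `TbNode` interface of the TB rule engine (method signatures vary a bit between TB versions), with a hard-coded target path just for illustration. The synchronous write in `onMsg` is exactly what worries me in 1a.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import org.thingsboard.rule.engine.api.TbContext;
import org.thingsboard.rule.engine.api.TbNode;
import org.thingsboard.rule.engine.api.TbNodeConfiguration;
import org.thingsboard.rule.engine.api.TbNodeException;
import org.thingsboard.server.common.msg.TbMsg;

// Sketch of a custom rule node that appends each telemetry message to a file.
// It would be placed right after the Save Timeseries node in the rule chain.
public class TbSaveToFileNode implements TbNode {

    // Hard-coded for the sketch; a real node would take this from its configuration.
    private static final Path TARGET = Path.of("/var/tb-export/telemetry.jsonl");

    @Override
    public void init(TbContext ctx, TbNodeConfiguration configuration) throws TbNodeException {
        // Nothing to configure in this sketch.
    }

    @Override
    public void onMsg(TbContext ctx, TbMsg msg) {
        try {
            // msg.getData() carries the telemetry payload as a JSON string.
            String line = msg.getOriginator().getId() + "\t" + msg.getData() + System.lineSeparator();
            Files.writeString(TARGET, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            ctx.tellSuccess(msg);
        } catch (Exception e) {
            ctx.tellFailure(msg, e);
        }
    }

    @Override
    public void destroy() {
    }
}
```

A real node would also need the `@RuleNode` annotation and a configuration class to show up in the rule chain editor, and probably buffered or asynchronous writes to keep the blocking I/O off the rule engine threads.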
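For 2b, the TB side would just be the REST API Call node pointed at the endpoint; the receiving end could be as small as this untested sketch using the JDK's built-in HTTP server (the port, path, and file location are all made up):

```java
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

import com.sun.net.httpserver.HttpServer;

// Minimal receiver for the REST API Call node: appends every POSTed telemetry
// payload as one line of a JSON-lines file.
public class TelemetryFileEndpoint {
    public static void main(String[] args) throws Exception {
        Path target = Path.of("/var/tb-export/telemetry.jsonl"); // hypothetical location
        HttpServer server = HttpServer.create(new InetSocketAddress(8085), 0);
        server.createContext("/telemetry", exchange -> {
            try (InputStream in = exchange.getRequestBody()) {
                String line = new String(in.readAllBytes(), StandardCharsets.UTF_8)
                        + System.lineSeparator();
                synchronized (TelemetryFileEndpoint.class) { // serialize concurrent writes
                    Files.writeString(target, line,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
            exchange.sendResponseHeaders(200, -1); // -1 = empty response body
            exchange.close();
        });
        server.start();
    }
}
```

My doubt for 2a is whether one HTTP round trip per telemetry message holds up under heavy load, or whether messages would need to be batched first.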
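For 3b, I picture the consumer on the far side of Kafka looking something like this (untested; the broker address, topic, and group id are placeholders that would have to match the Kafka node configuration):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Consumes the telemetry topic fed by the TB Kafka node and appends
// each record to a local file for the downstream platform to pick up.
public class TelemetryKafkaSink {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "tb-file-export");          // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Path target = Path.of("/var/tb-export/telemetry.jsonl"); // hypothetical location
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("tb-telemetry")); // topic set in the Kafka node config
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    try {
                        Files.writeString(target, record.value() + System.lineSeparator(),
                                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                    } catch (Exception e) {
                        e.printStackTrace(); // keep consuming on write errors
                    }
                }
            }
        }
    }
}
```

The Kafka node itself only needs the topic and broker list, so most of the work, and the extra expertise from 3a, sits in running the Kafka cluster and this consumer.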
For the batch mode:
1. Find out which tables TB uses inside Cassandra, and code a script to extract the information directly from the database.
a. This would require good knowledge of both the TB data model inside Cassandra and of Cassandra itself to write the scripts, so it doesn't look like a very natural/integrated way of solving the problem. (There is a sketch of what such a script might look like after this list.)
2. Create a rule chain that triggers a query at certain time intervals to retrieve a time period of TB-saved timeseries data, and then use one of the near-realtime options to save the file in one operation. (A REST-API-based sketch of this also follows below.)
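To show what I mean in 1a, this is the kind of script I'd rather not have to write and maintain: an untested sketch with the DataStax Java driver, assuming the default thingsboard keyspace and the ts_kv_cf table (the exact table and column layout depends on the TB version, so the schema would have to be checked first):

```java
import java.net.InetSocketAddress;

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;

// Reads raw timeseries rows straight out of TB's Cassandra keyspace.
public class TsKvExport {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("localhost", 9042))
                .withLocalDatacenter("datacenter1")   // assumed datacenter name
                .withKeyspace("thingsboard")          // default TB keyspace
                .build()) {
            // Column layout (entity_id, key, ts, bool_v/str_v/long_v/dbl_v)
            // differs between TB versions; verify against your schema first.
            ResultSet rs = session.execute(
                    "SELECT entity_id, key, ts, dbl_v, long_v, str_v FROM ts_kv_cf LIMIT 100");
            for (Row row : rs) {
                System.out.printf("%s %s %d%n",
                        row.getUuid("entity_id"), row.getString("key"), row.getLong("ts"));
            }
        }
    }
}
```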
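For option 2, instead of querying Cassandra directly, the periodic job could pull a time window through TB's own telemetry REST API and dump the response to a file. An untested sketch follows; the device id, token, and telemetry keys are placeholders, and the endpoint path should be double-checked against your TB version's Swagger UI:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Pulls one time window of telemetry through the TB REST API and saves the raw JSON.
public class TelemetryBatchExport {
    public static void main(String[] args) throws Exception {
        // All of these values are placeholders.
        String base = "http://localhost:8080";
        String deviceId = "REPLACE-WITH-DEVICE-UUID";
        String jwt = "REPLACE-WITH-JWT-TOKEN";
        long endTs = System.currentTimeMillis();
        long startTs = endTs - 3_600_000L; // last hour

        String url = base + "/api/plugins/telemetry/DEVICE/" + deviceId
                + "/values/timeseries?keys=temperature&startTs=" + startTs + "&endTs=" + endTs;

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("X-Authorization", "Bearer " + jwt)
                .GET()
                .build();

        // Write the raw JSON response straight to the export file.
        client.send(request, HttpResponse.BodyHandlers.ofFile(Path.of("export.json")));
    }
}
```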