
We are using Talend Open Studio to transfer data from Cassandra to SQL. While reading data with a Talend job, we sometimes see data loss, and we cannot find any error for it. Even the Cassandra system/debug logs show very limited information. Is there any setting we can configure in Cassandra or in Talend Open Studio to avoid this data loss?

Note: We process about 5M records/hour and lose approximately 1% of the data. This is not a consistent issue but an intermittent one.

BjMangat
  • Seems to me that there could be many points (network, OS, Java, RAM/CPU) which could lead to this problem. It would be helpful to find out if Talend is actually the problem or if the data loss is happening beforehand. I cannot see that you can rule that out. You should put some thoughts into finding out where the data is leaking. – tobi6 Nov 01 '17 at 13:03
  • Interesting question, but very broad, and not a specific programming issue. Any answer can at best be a guess given the limited information. – Andrew Nov 01 '17 at 13:04
  • I have checked CPU, RAM, and OS; everything was fine, but some data was still missing during transfer. – BjMangat Nov 02 '17 at 10:36

1 Answer


In this sort of situation I have written Java routines within Talend that post out to Elasticsearch. Depending on the Talend version you have, Elasticsearch support ships with Talend, and it makes log-based analysis of large datasets very easy using Elasticsearch and Kibana. The key is to log both successes and failures from a tJavaRow component via Java routines, which makes tracking down missing rows far easier.
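As a rough illustration of that approach, here is a minimal sketch of a Talend routine a tJavaRow could call for each row. The class name, field names, index URL, and JSON layout are all assumptions for the example, not something from the original answer; real code should also check the HTTP response and handle failures.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical routine: logs per-row transfer status to Elasticsearch so
// missing rows can later be found in Kibana by comparing reads vs. writes.
public class TransferLogger {

    // Build a small JSON document describing one row's outcome.
    public static String buildLogEntry(String rowKey, boolean success, String error) {
        return String.format(
            "{\"rowKey\":\"%s\",\"success\":%b,\"error\":\"%s\",\"ts\":%d}",
            rowKey, success, error == null ? "" : error, System.currentTimeMillis());
    }

    // POST the document to an Elasticsearch index endpoint
    // (e.g. http://eshost:9200/transfer-log/_doc — URL is an assumption).
    public static void post(String esUrl, String json) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(esUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(json.getBytes(StandardCharsets.UTF_8));
        }
        conn.getResponseCode(); // drain the response; real code should verify 2xx
        conn.disconnect();
    }
}
```

Inside the tJavaRow you would then call something like `TransferLogger.post(esUrl, TransferLogger.buildLogEntry(input_row.id, true, null))` on the success path, and the same with `success = false` plus the exception message in a reject/error flow.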

Ptyler649