
I am loading data from a SQL Server database table into a MonetDB table using a Spark job. The source table has 100,000 records. I move these records directly into the MonetDB table, with no filters or joins at all. But once the job completes, I see 279,997 records in the target MonetDB table: records are being duplicated in the target table.

We created a simple ETL job to move one table to another and are not sure what went wrong. Can someone please help us with this?

Regards, NarsimhaReddy

1 Answer


This does not ring a bell for me. The most likely explanation would be stale data in your table from an earlier attempt.
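To illustrate the stale-data hypothesis: if an earlier run left rows behind and the job then appends rather than overwrites, the target count ends up larger than the source count. A minimal sketch, using Python's stdlib sqlite3 as a stand-in for MonetDB (all table and column names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")

# Hypothetical source data standing in for the SQL Server table
source = [(i, f"row{i}") for i in range(5)]

# An earlier, partial attempt left some rows behind in the target
conn.executemany("INSERT INTO target VALUES (?, ?)", source[:3])

# The job then appends all source rows on top of the stale ones
conn.executemany("INSERT INTO target VALUES (?, ?)", source)
count_append = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count_append)  # 8: 3 stale rows + 5 freshly loaded rows

# Idempotent alternative: empty the table before loading
conn.execute("DELETE FROM target")
conn.executemany("INSERT INTO target VALUES (?, ?)", source)
count_clean = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
print(count_clean)  # 5: exactly the source row count
```

If your ETL writes in an append mode and a previous run failed partway through or completed earlier, this pattern alone could account for an inflated count.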

Which version of MonetDB are you using? And which library does your ETL tool use to access MonetDB? (I don't know anything about Spark.)

Maybe you can investigate

  1. whether the table indeed started out empty;
  2. whether the additional rows are duplicates of existing source rows or simply garbage;
  3. exactly which statements your ETL tool executes; the system table sys.queue might be helpful there;
  4. whether you can recreate those statements in a standalone SQL script (executed through mclient) that reproduces the issue;
  5. whether the ETL tool correctly detects any errors returned by the statements it executes, or whether it may have swallowed a helpful error message;
  6. whether the ETL tool can be made to log somewhere the number of rows it believes it has created in MonetDB.
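For point 2, once you have pulled samples of the source and target rows, classifying the extras is straightforward. A minimal sketch (the row tuples below are hypothetical; in practice you would fetch them from the two databases):

```python
from collections import Counter

# Hypothetical row samples; in practice fetched from the
# source (SQL Server) and target (MonetDB) tables.
source_rows = [(1, "a"), (2, "b"), (3, "c")]
target_rows = [(1, "a"), (2, "b"), (1, "a"), (3, "c"), (2, "b")]

counts = Counter(target_rows)
# Rows that occur more than once in the target are duplicates
duplicates = {row: n for row, n in counts.items() if n > 1}
# Rows in the target that never occur in the source are garbage
garbage = [row for row in counts if row not in set(source_rows)]

print(duplicates)  # {(1, 'a'): 2, (2, 'b'): 2}
print(garbage)     # []
```

Whether the extras turn out to be exact duplicates or unrelated garbage points to very different causes (retried/repeated inserts versus a broken write path), so this check is worth doing early.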

I'm sorry I cannot give you a more specific answer; this is going to require some digging.