0

I have set up a Fivetran connector to connect to a PostgreSQL database in an EC2 server and snowflake. The connection seems to work (no error), but the data is not really updated.

On the EC2 server, every day a script will pull down the latest dump of our app production database and restore it on the EC2 server, and then the Fivetran connector is expected to sync the database to snowflake. But the data after the first setup date is not synced with the snowflake. Could FiveTran be used in such a setup? If so, do you know what may be the issue of the sync failing?

simonhb1990
  • 143
  • 3
  • 11
  • So what do the Fivetran logs show is happening? Is it running on schedule? Is it not finding any data to load? – NickW Mar 17 '22 at 23:33
  • @NickW It seems that no data is loaded. I suspected it is because the database is restored every day so the logs are always empty. Here is the related log: `"data" : { "extract_time_s" : 1, "extract_volume_mb" : 0, "process_time_s" : 0, "process_volume_mb" : 0, "load_time_s" : 0, "load_volume_mb" : 0, "total_time_s" : 25 }` – simonhb1990 Mar 18 '22 at 21:40
  • Yes, using an incremental load method based on logs is unlikely to work in your scenario. The XMIN method may work - though it appears not to handle deletions. Why are you using a replica of your DB as your source rather than the actual prod db? – NickW Mar 18 '22 at 21:56
  • @Nick There are some restrictions in the industry I'm working in. We have to de-identify some data before we use it. Do you have good recommendation how we should handle this situation? – simonhb1990 Mar 22 '22 at 15:14
  • Can you not de-identify the data as part of the pipeline from your Prod DB to Snowflake? If not, why not use Fivetran from the Prod DB to the backup; de-identify the data in the backup; move the data from the backup DB to Snowflake using Fivetran – NickW Mar 22 '22 at 16:36
  • Unfortunately, we have to de-identify the data. Isn't this the same as what we are doing right now? Restoring the backup database from the Prod DB and moving the data from backup DB to the snowflake using Fivetran? Maybe I missed something here. – simonhb1990 Mar 22 '22 at 20:38
  • No. I’m suggesting using Fivetran to move data directly from your Prod DB to Snowflake and de-identify it as part if the Fivetran pipeline (bypassing the need for a backup DB). If your de-identification process cannot be implemented in Fivetran, then use Fivetran to keep your backup DB in sync with Prod, rather than restoring it. You would then be able to use Fivetran to move data from the backup DB to Snowflake – NickW Mar 22 '22 at 22:39

1 Answers1

0

Could FiveTran be used in such a setup?

Yes, but it's not ideal.

If so, do you know what may be the issue of the sync failing?

It's hard to answer this question without more context, however: Fivetran uses logging to replicate your DB (WAL in the case of PostgreSQL), so if you restore the DB every single day Fivetran will loose track of the changes and will need to re-sync the whole database.

The point made by NickW is completely valid, why not replicate from the DB? I assume the answer is along the lines of the data you need to modify. You can use column blocking and/or hashing to prevent sensible data from being transfered, or to obfuscate it before it's flushed to Snowflake.

MikeMajara
  • 922
  • 9
  • 23