
I'm migrating about 2.5 TB of data from an on-prem Microsoft SQL Server to Amazon S3 using AWS DMS. The data consists of 6 tables, two of which are 1.3 TB and 1 TB respectively.

The migration of the 1.3 TB table took 26 hours. However, the CSV data in S3 after the migration is only about 200 GB.

Is such a significant reduction in size normal when migrating from SQL Server to CSV files in S3, or could there have been a loss of data?

Shitij Mathur
  • 1
    You talk about moving to an AWS database, but then a CSV; what does the CSV file have to do with things here? If you're saying you've exported your table to a CSV file, then it is likely to be smaller; your CSV file has no indexes, no constraints, etc. Though exporting to a CSV and then into another RDBMS seems less than ideal, if I am honest. – Thom A Apr 11 '21 at 20:30
  • 1
    Also, this question doesn't appear to be about programming at all, so it isn't on topic for Stack Overflow. – Thom A Apr 11 '21 at 20:32
  • @Larnu I'm migrating the data to Amazon's S3 bucket storage. With S3 as the target, AWS DMS writes the data as .csv files by default. I'm going to build ML models on this data, so I'm not migrating it to an RDBMS on AWS. Reference: https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Target.S3.html – Shitij Mathur Apr 11 '21 at 20:39
  • Only you can verify your data. I would start with a rowcount, either by configuring Athena to read the table or just streaming it onto an EC2 instance and running `wc`. – Parsifal Apr 12 '21 at 12:34
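
The row-count check suggested in the last comment could be sketched roughly as follows. This is only a minimal illustration, not a DMS feature: `count_csv_records` is a hypothetical helper, the in-memory strings stand in for the CSV objects downloaded from the bucket, and it assumes the files have no header row. It uses Python's `csv` module instead of `wc -l`, because a quoted text column that contains a line break would otherwise be counted as two rows.

```python
import csv
import io
from typing import Iterable, TextIO


def count_csv_records(streams: Iterable[TextIO]) -> int:
    """Count CSV records across several file-like objects.

    Hypothetical helper. csv.reader is used instead of counting newlines,
    so a quoted field containing a line break still counts as one record.
    Assumes the files carry no header row.
    """
    total = 0
    for stream in streams:
        # Skip completely empty lines, which csv.reader yields as [].
        total += sum(1 for row in csv.reader(stream) if row)
    return total


if __name__ == "__main__":
    # Two tiny in-memory "files" standing in for the migrated CSV objects.
    part1 = io.StringIO('1,"plain text"\n2,"text with\na line break"\n')
    part2 = io.StringIO("3,more\n")
    # Compare the printed number with SELECT COUNT(*) on the source table.
    print(count_csv_records([part1, part2]))
```

If the count matches the source table's row count, the smaller size is more likely explained by the storage format (no indexes, no page overhead, no padding) than by missing rows.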

0 Answers