If my raw data is in CSV format and I store it in the Bronze layer as Delta tables, I end up with four layers: Raw, Bronze, Silver, and Gold. Which approach should I consider?
Please refer to this link: https://www.databricks.com/glossary/medallion-architecture – Sharma Jan 11 '23 at 11:12
1 Answer
This is a bit of an open question. That said, I would normally recommend retaining the "raw" data in CSV: storing it is usually cheap relative to the value of being able to re-process it if there are problems, or to use it for data audit and traceability.
I would normally compress the raw files after processing, perhaps bundling them into a tarball, and then move the archives to colder, cheaper storage.
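
As a rough illustration of the compress-and-tarball step, here is a minimal sketch using only the Python standard library. The function name `archive_raw_files`, the directory layout, and the `batch_name` argument are hypothetical, and it assumes the raw CSVs sit on a locally accessible filesystem and have already been loaded into Bronze.

```python
import tarfile
from pathlib import Path


def archive_raw_files(raw_dir: Path, archive_dir: Path, batch_name: str) -> Path:
    """Bundle every CSV in raw_dir into one gzip-compressed tarball.

    Hypothetical helper: the originals are deleted only after the archive
    has been re-opened and its contents checked.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_path = archive_dir / f"{batch_name}.tar.gz"

    csv_files = sorted(raw_dir.glob("*.csv"))
    if not csv_files:
        raise FileNotFoundError(f"no CSV files found in {raw_dir}")

    # Write all files into a single gzip-compressed tar archive.
    with tarfile.open(archive_path, "w:gz") as tar:
        for csv_file in csv_files:
            tar.add(csv_file, arcname=csv_file.name)

    # Re-open the archive and confirm every file made it in.
    with tarfile.open(archive_path, "r:gz") as tar:
        archived = set(tar.getnames())
    expected = {csv_file.name for csv_file in csv_files}
    if archived != expected:
        raise RuntimeError(f"archive is missing files: {expected - archived}")

    # Only now is it safe to remove the uncompressed originals.
    for csv_file in csv_files:
        csv_file.unlink()

    return archive_path
```

The move to colder storage is left out of the sketch because it depends on the platform; on cloud object stores it is often handled by a storage tier change or lifecycle rule rather than by application code.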

Chris