With the Databricks Lakehouse platform, it is possible to create 'tables', or to be more specific, Delta tables, using a statement such as the following:
DROP TABLE IF EXISTS People10M;
CREATE TABLE People10M
USING parquet
OPTIONS (
path "/mnt/training/dataframes/people-10m.parquet",
header "true"
);
What I would like to know is: what exactly happens behind the scenes when you create one of these tables? What exactly is a table in this context? The data is actually contained in files in the data lake (the data storage location) that Delta Lake runs on top of, right? Are tables some kind of abstraction that allows us to access the data stored in these files using something like SQL?
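To make the question concrete, here is a minimal sketch of how I assume the table definition could be inspected, and how the same files could be queried without going through the table at all. It assumes the People10M table from the statement above exists and that the path is readable from the cluster; I have not confirmed what the output looks like.

```sql
-- Show the metadata recorded for the table (e.g. provider, location, schema)
DESCRIBE TABLE EXTENDED People10M;

-- Query the Parquet files directly by path, bypassing the table name
SELECT *
FROM parquet.`/mnt/training/dataframes/people-10m.parquet`
LIMIT 10;
```

If a table is only metadata pointing at files, I would expect both routes to read the same underlying data.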
What does the USING parquet portion of this statement do? Are Parquet tables different from CSV tables in some way? Or does this just depend on the format of the source data?
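For comparison, this is a sketch of what I assume the CSV equivalent would look like. The table name People10MCsv and the path are hypothetical placeholders (I do not have a CSV copy of this dataset), and as far as I understand, header and inferSchema are options of the CSV reader rather than of Parquet.

```sql
-- Hypothetical table name and placeholder path, for illustration only
DROP TABLE IF EXISTS People10MCsv;
CREATE TABLE People10MCsv
USING csv
OPTIONS (
  path "/path/to/people-10m.csv",  -- placeholder, not a real location
  header "true",                   -- treat the first line as column names
  inferSchema "true"               -- let the reader guess column types
);
```

Is the only difference between the two statements which reader is used to interpret the files at the given path?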
Any links to material that explains this idea would be appreciated. I want to understand this in depth from a technical point of view.