First lets see how our delta table transaction log looks like.
DESCRIBE DETAIL <delta_table>;
Output:
+-------+-------------------+----------------+--------------------+--------------------+--------------------+----+-----------------+--------------------+-----------+-----------------+-------------+--------------------+------------+--------------------+
|version| timestamp| userId| userName| operation| operationParameters| job| notebook| clusterId|readVersion| isolationLevel|isBlindAppend| operationMetrics|userMetadata| engineInfo|
+-------+-------------------+----------------+--------------------+--------------------+--------------------+----+-----------------+--------------------+-----------+-----------------+-------------+--------------------+------------+--------------------+
| 0|2023-03-16 01:34:22|3281063801330285|abc@g...|CREATE TABLE AS S...|{isManaged -> tru...|null|{432383257327813}|0316-012541-6spdz4dm| null|WriteSerializable| true|{numFiles -> 5, n...| null|Databricks-Runtim...|
+-------+-------------------+----------------+--------------------+--------------------+--------------------+----+-----------------+--------------------+-----------+-----------------+-------------+--------------------+------------+--------------------+
We can see that each transaction has a version
number and timestamp
associate with it. These are the following ways to query our older data:
- Using Timestamp:
SELECT * FROM <delta_table>
TIMESTAMP AS OF "2023-03-16 01:34:22";
- Using Version number:
SELECT * FROM <delta_table>
VERSION AS OF <version_number>;
(or)
SELECT * FROM <delta_table>@v<version_number>;
In your case, the query should like:
SELECT c_first_name
FROM customer_table VERSION AS OF 0
WHERE c_customer_sk = 1;