Databricks open sourced the Delta Lake project in April 2019 (the open-source Delta Lake project still lacks some functionality available on the Databricks platform, such as data skipping). Details: Delta Lake, Docs, GitHub Repo
Delta is not a file format - it is a storage layer on top of Parquet data files plus metadata (transaction log) files in JSON format.
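As a rough illustration (the file names below are made up), a Delta table directory is just Parquet data files plus a _delta_log folder holding the JSON transaction log:

/path/to/my/table/
  _delta_log/
    00000000000000000000.json      <- transaction log / table metadata (JSON)
    00000000000000000001.json
  part-00000-xxxx.snappy.parquet   <- data files (Parquet)
  part-00001-xxxx.snappy.parquet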
It doesn't delete files automatically. A VACUUM operation must be run to delete older files that are no longer referenced (not active).
So without running VACUUM, you can time travel infinitely, since all data files remain available. On the other hand, if you run VACUUM with a 30-day retention, you can only access the last 30 days of data.
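As a minimal sketch (assuming a SparkSession named spark is in scope and the table path is illustrative), VACUUM with a 30-day (720-hour) retention can be run through the DeltaTable API:

Scala:
import io.delta.tables.DeltaTable

// Deletes files that are no longer referenced by the table and are
// older than the retention threshold (720 hours = 30 days)
val deltaTable = DeltaTable.forPath(spark, "/path/to/my/table")
deltaTable.vacuum(720)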
Yes, it solves querying across dataset versions. Each version can be identified by a timestamp. Sample queries to access data for a specific version:
Scala:
val df = spark.read
  .format("delta")
  .option("timestampAsOf", "2020-10-01")
  .load("/path/to/my/table")
Python:
df = spark.read \
  .format("delta") \
  .option("timestampAsOf", "2020-10-01") \
  .load("/path/to/my/table")
SQL:
SELECT count(*) FROM my_table TIMESTAMP AS OF "2010-10-01"
SELECT count(*) FROM my_table TIMESTAMP AS OF date_sub(current_date(), 1)
SELECT count(*) FROM my_table TIMESTAMP AS OF "2010-10-01 01:30:00.000"
(Note: I am using open-source Delta Lake in production for multiple use cases.)