To recap, your DVC project's default remote is a local directory (/tmp/dvc-storage).
All your data files are in /tmp/dvc-storage, so that's where you could point your file explorer, but this type* of DVC remote (local directory) is not meant for direct human handling. The files have been renamed and reorganized in the same way as in the project cache.
Basically, the directory structure (let's call it the space dimension) AND the data versions (the time dimension) are flattened into a content-addressable data store. This is why you see all those two-letter directories containing long hex file names (similar to Git's object storage).
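To make the "flattening" concrete, here is a toy content-addressable store in Python. It is only a sketch of the idea, not DVC's actual code: the store() helper and the sample CSV contents are made up for illustration, and I'm assuming MD5 as the hash with the first two hex characters used as the directory name.

```python
import hashlib
import tempfile
from pathlib import Path


def store(data: bytes, root: Path) -> Path:
    """Save data under a path derived from its MD5 digest (hypothetical helper)."""
    digest = hashlib.md5(data).hexdigest()
    # First two hex characters become the directory, the rest the file name.
    target = root / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return target


root = Path(tempfile.mkdtemp())
v1 = store(b"id,label\n1,cat\n", root)   # a file as it looked at one commit
v2 = store(b"id,label\n1,dog\n", root)   # the same file after an edit
dup = store(b"id,label\n1,cat\n", root)  # an identical copy elsewhere in the project
```

Both versions of the file live side by side under names that depend only on their contents, and the duplicate maps to the object that already exists. The original file names and folders are not in the store at all; DVC keeps that mapping in the .dvc files and dvc.lock that you commit to Git.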
By default, nothing is deleted from the cache (or remote storage) during regular dvc operations. The data store is append-only for the most part. This way you can git checkout and dvc checkout (or dvc pull) the data for a previous project version (past Git repo commit).
You'd have to specifically garbage-collect certain data from the cache or storage locations using dvc gc, and even then it's designed to try to preserve data you might need in the future.
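As a rough mental model of garbage collection (again a sketch, not how dvc gc is implemented): objects survive if they are referenced by any of the commits you ask to keep. The collect_garbage() function, the short hashes, and the commit names below are all hypothetical.

```python
def collect_garbage(objects, refs_by_commit, keep):
    """Return the objects that survive when only the commits in `keep` matter."""
    in_use = set()
    for commit in keep:
        in_use |= refs_by_commit[commit]
    # Anything not referenced by a kept commit is dropped.
    return objects & in_use


# Hypothetical object hashes referenced by two Git commits.
refs_by_commit = {
    "commit_a": {"aa11", "bb22"},
    "commit_b": {"aa11", "cc33"},
}
cache = {"aa11", "bb22", "cc33", "dd44"}  # dd44 is not referenced anywhere

keep_all = collect_garbage(cache, refs_by_commit, ["commit_a", "commit_b"])
keep_latest = collect_garbage(cache, refs_by_commit, ["commit_b"])
```

Keeping both commits removes only the unreferenced object, while keeping just the latest one also drops data that only the older commit used. That is roughly the trade-off you choose with dvc gc scope options such as --workspace or --all-commits.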
Note that dvc add does not affect remote storage; it only works with the local cache. You need to dvc push and dvc pull to sync the data cache with a DVC remote.
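If it helps, the relationship between the cache and the remote can be pictured as two sets of objects. This is a simplified sketch with hypothetical function names and a made-up hash; the real commands work from the .dvc files in your workspace and have many more options.

```python
def add(data_hash, cache):
    """Like dvc add in this toy model: only the local cache changes."""
    cache.add(data_hash)


def push(wanted, cache, remote):
    """Upload wanted objects that are cached locally but missing remotely."""
    to_send = (wanted & cache) - remote
    remote |= to_send
    return to_send


def pull(wanted, cache, remote):
    """Download wanted objects that the remote has but the cache lacks."""
    to_fetch = (wanted & remote) - cache
    cache |= to_fetch
    return to_fetch


cache, remote = set(), set()
add("aa11", cache)
remote_after_add = set(remote)        # still empty: add never touches the remote

sent = push({"aa11"}, cache, remote)  # now the remote has the object

fresh_cache = set()                   # e.g. a new clone on another machine
fetched = pull({"aa11"}, fresh_cache, remote)
```

The point is that the remote only ever changes on push, and a fresh clone only gets the data on pull.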
Regarding the Studio UI, I'm not sure where you see that path, but it's correct (as is hopefully clearer now). You'd get the same from dvc get --show-url, so maybe reading that reference helps.
* Note that DVC remotes can integrate with cloud versioning on Amazon S3, Azure Blob Storage, and Google Cloud Storage (probably more in the future). This means that if you use one of those types and enable this feature, you'll see the same directory structure as in your project folder (not the obfuscated cache structure). Cloud-versioned remotes are easier to handle directly (although that may still not be ideal).