I started playing with streaming on my Databricks Community Edition workspace, but after a few minutes of producing test events I ran into a problem. I suspect it is related to small temporary files produced during the streaming process. I would like to find and remove them, but I can't find where they are stored. The exception is:
com.databricks.api.base.DatabricksServiceException: QUOTA_EXCEEDED: You have exceeded the maximum number of allowed files on Databricks Community Edition. To ensure free access, you are limited to 10000 files and 10 GB of storage in DBFS. Please use dbutils.fs to list and clean up files to restore service. You may have to wait a few minutes after cleaning up the files for the quota to be refreshed. (Files found: 11492);
I also ran a shell script to count the files in each folder, but I could not find anything suspicious: mostly lib, usr and other folders containing system or Python files are there, and nothing that looks like it was produced by my streaming. This is the script I used:
find / -maxdepth 2 -mindepth 1 -type d | while read dir; do
printf "%-25.25s : " "$dir"
find "$dir" -type f | wc -l
done
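Since the error message points to dbutils.fs, the same per-folder count could also be taken over DBFS instead of the driver's local filesystem, which is all the script above scans. Below is a minimal sketch of that idea, not a confirmed fix. The names count_files and files_per_top_level_dir are hypothetical helpers, and the sketch assumes that the listing function (for example dbutils.fs.ls in a Python notebook) returns entries exposing a path attribute and an isDir() method.

```python
def count_files(path, ls):
    """Recursively count the files under `path`.

    `ls` is any function that lists one directory level and returns
    entries with a `.path` attribute and an `.isDir()` method
    (assumption: dbutils.fs.ls behaves this way in a Python notebook).
    """
    total = 0
    for entry in ls(path):
        if entry.isDir():
            # Descend into subdirectories and add their file counts.
            total += count_files(entry.path, ls)
        else:
            total += 1
    return total


def files_per_top_level_dir(root, ls):
    """Return {directory path: file count} for each directory directly under `root`."""
    return {
        entry.path: count_files(entry.path, ls)
        for entry in ls(root)
        if entry.isDir()
    }


# Possible usage inside a Databricks notebook (not runnable elsewhere,
# because dbutils only exists in the notebook environment):
#   counts = files_per_top_level_dir("dbfs:/", dbutils.fs.ls)
#   for path, n in sorted(counts.items(), key=lambda kv: -kv[1]):
#       print(n, path)
```

Passing the listing function as a parameter keeps the helpers independent of the notebook, so they can be tried against a fake listing first.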
Where can I find the reason for the too many files problem? Maybe it's not connected to streaming at all?
To be clear, I have not uploaded many custom files to /FileStore.