I have a requirement to develop an application in Python. The Python application will interact with any database and execute SQL statements against it. It can also connect to a Databricks instance and query the tables there.
The requirement is that the Python application must be platform independent, so it is designed to run its Spark-specific code only when it is running on Databricks; when it runs on a standalone node, that code is skipped. The Python programs interact with Azure Blob Storage to access some files and folders. The application is packaged as a wheel and deployed on either a Standalone Node or Databricks.
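A minimal sketch of the runtime check described above, assuming that Databricks Runtime sets the `DATABRICKS_RUNTIME_VERSION` environment variable on cluster nodes (a common heuristic, not something the original application is confirmed to use). `is_databricks` and `run_spark_part` are hypothetical names. Importing `pyspark` lazily keeps the wheel importable on a standalone node that has no Spark installed.

```python
import os


def is_databricks() -> bool:
    """Return True when the process appears to run on a Databricks cluster.

    Assumption: Databricks Runtime sets DATABRICKS_RUNTIME_VERSION on its
    nodes, while a standalone node does not.
    """
    return "DATABRICKS_RUNTIME_VERSION" in os.environ


def run_spark_part() -> str:
    """Hypothetical entry point for the Spark-specific part of the wheel."""
    if not is_databricks():
        return "skipped"  # standalone node: never touch Spark
    # Imported only here, so pyspark is not required on a standalone node.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    return spark.version
```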
The issue is with custom logging. I have implemented custom logging in the Python application, and there are two scenarios depending on where the application runs:
- Standalone Node
- Databricks Cluster
If the code runs on a Standalone Node, the custom log is first written to a local OS folder and, after the application completes successfully or fails, it is moved to Azure Blob Storage. If for some reason the log file cannot be moved to Azure storage, it is still available in the local file system of the Standalone Node.
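The standalone-node flow described above can be sketched as follows. The `upload` callable is a hypothetical stand-in for the actual Azure Blob Storage upload (for example one built on the `azure-storage-blob` package), and `run_with_log_upload` is an illustrative name. The local file is removed only after the upload succeeds, so a failed upload leaves it on disk.

```python
import logging
import os
import sys
import tempfile
from typing import Callable


def run_with_log_upload(
    job: Callable[[logging.Logger], None],
    upload: Callable[[str], None],
    log_dir: str,
) -> str:
    """Run `job` while logging to a local file, then try to upload that file.

    `upload` is a hypothetical callable standing in for the Azure Blob Storage
    upload. The local file is deleted only after a successful upload.
    """
    os.makedirs(log_dir, exist_ok=True)
    fd, log_path = tempfile.mkstemp(suffix=".log", dir=log_dir)
    os.close(fd)  # FileHandler reopens the path itself
    # One logger per run, named after the file, so handlers never pile up.
    logger = logging.getLogger("app." + os.path.basename(log_path))
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    try:
        job(logger)
    except Exception:
        logger.exception("job failed")
        raise
    finally:
        logger.removeHandler(handler)
        handler.close()  # flush and release the file before uploading it
        try:
            upload(log_path)
            os.remove(log_path)  # only reached if the upload succeeded
        except Exception as exc:
            # Keep the local copy so it can be recovered later.
            print(f"log upload failed ({exc}); kept {log_path}", file=sys.stderr)
    return log_path
```

This pattern relies on the local disk outliving the process, which is exactly what does not hold on a Databricks node.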
If the same approach is followed on Databricks and the application fails to upload the log file to Blob Storage, the log cannot be recovered, because the local OS storage of a Databricks node is volatile. I tried writing the log to DBFS instead, but it does not allow appending to a file.
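One way to work around the missing append support is to never append at all: keep the formatted records in memory and periodically overwrite the whole file. The sketch below assumes the destination (for example a path under the `/dbfs` mount) accepts a full sequential write with mode `"w"`; `RewriteFileHandler` and `flush_every` are hypothetical names. Each flush rewrites the entire log, which is acceptable for modest log sizes, and at most `flush_every - 1` records can be lost if the cluster terminates before the next flush.

```python
import logging


class RewriteFileHandler(logging.Handler):
    """Buffer formatted records and rewrite the whole target file on flush.

    Assumption: the target path supports overwriting a file in one sequential
    write, even though it does not support appending to it.
    """

    def __init__(self, path: str, flush_every: int = 50) -> None:
        super().__init__()
        self.path = path
        self.flush_every = flush_every
        self._lines: list[str] = []
        self._pending = 0  # records not yet written to the target file

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self._lines.append(self.format(record))
            self._pending += 1
            if self._pending >= self.flush_every:
                self.flush()
        except Exception:
            self.handleError(record)

    def flush(self) -> None:
        self.acquire()  # the handler lock is reentrant, so emit() may call this
        try:
            if not self._lines:
                return
            # Overwrite instead of append: the file always holds the full log.
            with open(self.path, "w", encoding="utf-8") as fh:
                fh.write("\n".join(self._lines) + "\n")
            self._pending = 0
        finally:
            self.release()

    def close(self) -> None:
        self.flush()  # write whatever is still buffered
        super().close()
```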
Is there a way to get the application logs from Databricks? Can Databricks record my job execution and store its logs? As mentioned, the Python application is deployed as a wheel and contains very little Spark code.
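A complementary, low-effort option is to send the same log records to standard error in addition to the custom file. The sketch below assumes that on Databricks, output written to the driver's stderr appears in the cluster's driver logs and, if cluster log delivery is configured, is copied to the configured storage location; verify this against your workspace's cluster settings. `configure_logging` is a hypothetical helper, and the `stream` parameter exists only so the output can be captured in a test.

```python
import logging
from typing import Optional, TextIO


def configure_logging(name: str = "app", stream: Optional[TextIO] = None) -> logging.Logger:
    """Attach a stderr handler to the application logger (idempotent).

    Assumption: on Databricks, the driver's stderr ends up in the driver
    logs, so these records survive even if the blob upload fails.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Avoid adding a second copy of the handler if this is called again.
    if not any(h.name == "app-stderr" for h in logger.handlers):
        handler = logging.StreamHandler(stream)  # None means sys.stderr
        handler.name = "app-stderr"
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

On a standalone node the extra handler is harmless: the records simply also appear on the console next to the local log file.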