
I have a requirement to develop an application in Python. The application will interact with any database and execute SQL statements against it. It can also interact with a Databricks instance and query the tables in Databricks.

The requirement is that the Python application should be platform independent. So the application is developed in such a way that it triggers the Spark-specific code within the application only when it runs on Databricks; if it runs on a standalone node, that code is skipped. The application also interacts with Azure Blob Storage to access some files/folders. The Python application is deployed on the standalone node/Databricks as a wheel.
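For context, one common way to do this kind of detection is to check for the `DATABRICKS_RUNTIME_VERSION` environment variable, which Databricks Runtime sets on its cluster nodes. A minimal sketch (the function name is mine):

```python
import os

def running_on_databricks() -> bool:
    # Databricks Runtime sets this variable on its cluster nodes;
    # on a standalone node it is absent.
    return "DATABRICKS_RUNTIME_VERSION" in os.environ

if running_on_databricks():
    # Spark-specific path: a session already exists on Databricks.
    from pyspark.sql import SparkSession
    spark = SparkSession.builder.getOrCreate()
else:
    # Standalone path: plain database connections, no Spark imports.
    pass
```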

The issue here is with custom logging. I have implemented custom logging in the Python application, and there are two scenarios based on where the application is run:

  1. Standalone Node
  2. Databricks Cluster.

If the code is run on the standalone node, the custom log is initially written to a local OS folder, and after the application completes (successfully or not) it is moved to Azure Blob Storage. If for some reason the move to Azure Storage fails, the log file is still available in the local file system of the standalone node.
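The standalone flow is essentially the pattern below; a minimal sketch using the azure-storage-blob SDK (the log path, container, and credential handling are placeholders, not the actual implementation):

```python
import logging
from azure.storage.blob import BlobClient  # pip install azure-storage-blob

LOG_PATH = "/tmp/app.log"  # placeholder local path

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler(LOG_PATH))

def upload_log(conn_str: str, container: str, blob_name: str) -> None:
    # Upload the finished log file; if this raises, the file simply
    # stays behind on the standalone node's local disk.
    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name
    )
    with open(LOG_PATH, "rb") as fh:
        blob.upload_blob(fh, overwrite=True)
```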

If the same approach is followed on Databricks and the application fails to upload the log file to Blob Storage, we cannot recover it, as the Databricks OS storage is volatile. I tried to write the log to DBFS, but it doesn't allow appending.
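For clarity, this is roughly what I ran into on Databricks (the paths are placeholders); overwriting a whole file works, but appending does not:

```python
import shutil

# Appending through the /dbfs FUSE mount fails, since DBFS does not
# support random/append writes:
with open("/dbfs/logs/app.log", "a") as fh:
    fh.write("another log line\n")  # fails on Databricks

# Rewriting the whole file does work, so the local log can be mirrored
# to DBFS, but a standard appending FileHandler cannot be used.
shutil.copyfile("/tmp/app.log", "/dbfs/logs/app.log")
```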

Is there a way to get the application logs from Databricks? Is there a possibility that Databricks can record my job execution and store the logs? As I mentioned, the Python application is deployed as a wheel and contains very limited Spark code.

shankar

1 Answer


Is there a way to get the application logs from Databricks? Is there a possibility that Databricks can record my job execution and store the logs?

I think you are able to do that now, but once the cluster is shut down (to minimize cost), the logs will be gone. Thank you for sharing that logs in DBFS cannot be appended to; I was not aware of that.
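For completeness, Databricks does offer cluster log delivery, which periodically copies driver and worker logs to a DBFS destination that outlives the cluster. A sketch of the relevant fragment of a cluster spec for the Clusters/Jobs API (the runtime version, node type, and destination below are placeholders):

```python
# Fragment of a cluster spec for the Databricks Clusters/Jobs API.
# With cluster_log_conf set, Databricks periodically delivers driver
# and worker logs to the destination, where they survive shutdown.
new_cluster = {
    "spark_version": "11.3.x-scala2.12",  # placeholder runtime
    "node_type_id": "Standard_DS3_v2",    # placeholder node type
    "num_workers": 2,
    "cluster_log_conf": {
        "dbfs": {"destination": "dbfs:/cluster-logs"}
    },
}
```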

Is your standalone application open to the internet? If yes, then maybe you can explore the option of writing the logs to Azure Event Hubs. You can write to Event Hubs from both Azure Databricks and the standalone application, and then write that to Blob Storage etc. for further visualization. This tutorial should get you started: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-python-get-started-send
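As a starting point, a minimal sketch of sending a log line to Event Hubs with the azure-eventhub package from that tutorial (the connection string and hub name are placeholders; in practice you would reuse the producer and batch many lines per send):

```python
from azure.eventhub import EventHubProducerClient, EventData  # pip install azure-eventhub

CONN_STR = "<EVENT_HUBS_CONNECTION_STRING>"  # placeholder
EVENTHUB_NAME = "app-logs"                   # placeholder

def send_log_line(line: str) -> None:
    # One producer per call keeps the sketch short; a real logger would
    # keep the producer open and batch records for efficiency.
    producer = EventHubProducerClient.from_connection_string(
        CONN_STR, eventhub_name=EVENTHUB_NAME
    )
    with producer:
        batch = producer.create_batch()
        batch.add(EventData(line))
        producer.send_batch(batch)

send_log_line("application started")
```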

HTH

HimanshuSinha