I have a Python data load service. One of its steps refreshes multiple Oracle materialized views. We have noticed that the service often gets stuck at this step, and the issue goes away after the pod is restarted. I want to configure a command-based OpenShift liveness probe for it. The purpose is to detect whether the service has been stuck at this step for more than, say, x hours; if so, the probe fails and the pod is restarted. The service is not reachable over HTTP.
We log extensively in the script that runs here. Is there a way to poll the latest OpenShift deployment log and look for certain messages?
Example:
#msg1
print("Refreshing materialized views")
.
.
.
#msg2
print("materialized view refreshed")
msg1 marks the start of the potentially problematic step. My intent is to write a command that polls the log and looks for msg2, which marks completion (exit status 0). If it doesn't find msg2 for more than, say, 5 hours, it must return a non-zero exit status, causing the probe to fail.
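To make the intent concrete, here is a rough sketch of the check I have in mind. It is not working code from the service and rests on assumptions: the script would also write its log to a file inside the container (the path `/tmp/data_load.log` is made up), and every log line would start with an ISO-8601 timestamp followed by a space. The names `refresh_is_stuck` and `main` are hypothetical.

```python
from datetime import datetime, timedelta

START_MSG = "Refreshing materialized views"   # msg1
DONE_MSG = "materialized view refreshed"      # msg2
MAX_AGE = timedelta(hours=5)                  # the "x hours" threshold


def refresh_is_stuck(lines, now, max_age=MAX_AGE):
    """Return True if the latest msg1 has no later msg2 and is older than max_age."""
    started_at = None
    for line in lines:
        if START_MSG in line:
            # Assumed line format: "<ISO timestamp> <message>"
            started_at = datetime.fromisoformat(line.split(" ", 1)[0])
        elif DONE_MSG in line:
            # The refresh completed, so nothing is pending any more.
            started_at = None
    return started_at is not None and now - started_at > max_age


def main(argv):
    """Return the exit status for the probe: 0 = healthy, 1 = stuck."""
    # Hypothetical default path; the real location is not decided yet.
    log_path = argv[1] if len(argv) > 1 else "/tmp/data_load.log"
    try:
        with open(log_path) as fh:
            return 1 if refresh_is_stuck(fh, datetime.now()) else 0
    except FileNotFoundError:
        # No log file yet: treat the service as healthy.
        return 0

# As a probe command, the script would end with:
#     import sys; sys.exit(main(sys.argv))
```

The idea would be to run this as the exec command of the liveness probe, so a non-zero exit status marks the pod as unhealthy. It reads a file inside the container rather than the deployment log itself, which is the part I am unsure about.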
How can I implement this? Is this the best way to do it?