1

I have the following sql statement to query the databricks pipeline event logs and it works.
I tried to rewrite it into a python code, but I failed.

Could somebody provide me any advice? Many thanks!!

SELECT timestamp, details:user_action:action, details:user_action:user_name
FROM event_log_raw 
WHERE event_type = 'user_action'

Please Note the details column here is string type, not struct nor array

The following solutions I tried didn't work
df is a spark dataFrame generated from the table event_log_raw

df.filter(df.event_type == 'user_action').select("timestamp", "details:user_action:action", "details:user_action:user_name") 
df.filter(df.event_type == 'user_action').select("timestamp", "details.user_action.action", "details.user_action.user_name") 
Alex Ott
  • 80,552
  • 8
  • 87
  • 132
Lilybone
  • 13
  • 3

1 Answers1

1

Instead of using select you need to use selectExpr because the string like details:user_action:action is a JSON path expression for extracting data from the JSON string (doc):

df.filter(df.event_type == 'user_action')
  .selectExpr("timestamp", "details:user_action:action", "details:user_action:user_name")
Alex Ott
  • 80,552
  • 8
  • 87
  • 132