
I would like to read events from Event Hubs using Databricks. The events are in JSON format, but they can have different schemas (this is important, because the solutions I found pass a fixed schema to `from_json(jsonStr, schema)`, which I cannot do in my use case). When I use `.withColumn('Value', col('value').cast(StringType()))`, the DataFrame returns JSON output with backslashes: "{\"time\": 1432826855000,\"host\":...... .

I found a solution (How to prevent spark sql with kafka from adding backslash to JSON string in dataframe), but in the Delta Live Tables framework we create streaming tables by returning a DataFrame, so I can't use that solution.

Should I use non-PySpark functions in the ETL process, such as in How to remove backslash from decoded JSON string? Will that be efficient when streaming from Event Hubs to bronze?
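For context, a minimal sketch of the kind of DLT bronze table being described: reading from Event Hubs through its Kafka-compatible endpoint and keeping the payload as a raw string, with no schema applied. The names `EH_NAMESPACE`, `EH_NAME`, and `EH_CONN_STR` are placeholders, not values from the question, and this assumes a Databricks runtime where `spark` and `dlt` are available.

```python
# Hypothetical DLT bronze table; placeholder connection details.
import dlt
from pyspark.sql.functions import col

@dlt.table(name="bronze_events")
def bronze_events():
    return (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers",
                "EH_NAMESPACE.servicebus.windows.net:9093")
        .option("subscribe", "EH_NAME")
        .option("kafka.security.protocol", "SASL_SSL")
        .option("kafka.sasl.mechanism", "PLAIN")
        .option("kafka.sasl.jaas.config",
                'kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule '
                'required username="$ConnectionString" password="EH_CONN_STR";')
        .load()
        # Keep the payload as a raw JSON string; no from_json/schema needed
        # at the bronze layer when schemas vary per event.
        .withColumn("value", col("value").cast("string"))
    )
```

This is only a sketch of the setup the question describes; it requires a Databricks/DLT environment to run.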

asked by repcak

1 Answer


You shouldn't worry about those backslashes: they are just the visual representation of your string when you display the data, because the string has `"` characters embedded in it. Internally, the data is stored without backslashes, like: {"time": 1432826855000,"host":.......
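The same effect can be shown in plain Python, without Spark: the backslashes only appear when the string is rendered with its embedded quotes escaped, while the stored value itself contains none.

```python
import json

# The raw JSON string, as it is actually stored in the DataFrame column.
raw = '{"time": 1432826855000, "host": "example"}'

# An escaped rendering, like what display()/show() prints: the embedded
# double quotes get backslash-escaped, but only in this representation.
escaped = json.dumps(raw)
print(escaped)

# The underlying data is unchanged and parses directly.
parsed = json.loads(raw)
print(parsed["time"])

assert "\\" not in raw      # no backslashes in the stored string
assert "\\" in escaped      # backslashes exist only in the escaped view
```

So downstream parsing (e.g. `from_json` on silver tables, or `json.loads` here) works on the raw string as-is; no cleanup of backslashes is needed.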

answered by Alex Ott