I have JSON files with the following structure:
{"names": [
  {"name": "John", "lastName": "Doe"},
  {"name": "John", "lastName": "Marcus"},
  {"name": "David", "lastName": "Luis"}
]}
I want to read several such JSON files and remove duplicate entries based on the "name" field inside the "names" array. I tried
df.dropDuplicates(Array("names.name"))
but it didn't work.
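
For reference, here is a minimal sketch of the result I am after, assuming Spark 2.2 or later running in local mode. As far as I understand, dropDuplicates only matches top-level column names, so the sketch first flattens the array with explode and then deduplicates on the resulting top-level "name" column. The object name DedupNames, the helper dedupByName, and the inline sample string are hypothetical and only stand in for my real job and files.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object DedupNames {

  // Hypothetical helper: turn each element of the "names" array into its own
  // row, promote the struct fields to top-level columns, then deduplicate.
  def dedupByName(df: DataFrame): DataFrame =
    df.select(explode(col("names")).as("person")) // one row per array element
      .select("person.*")                         // columns: lastName, name
      .dropDuplicates("name")                     // "name" is now top-level

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, just for trying this out.
    val spark = SparkSession.builder()
      .appName("dedup-names")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Inline sample standing in for the real files; with files on disk this
    // would be something like spark.read.json("some/dir/*.json").
    val sample = Seq(
      """{"names":[{"name":"John","lastName":"Doe"},{"name":"John","lastName":"Marcus"},{"name":"David","lastName":"Luis"}]}"""
    )
    val df = spark.read.json(sample.toDS())

    dedupByName(df).show()

    spark.stop()
  }
}
```

With the sample above this should leave one row for "John" and one for "David". Which "John" row survives is not defined by dropDuplicates, so if a particular lastName has to be kept, an explicit ordering (for example a window function) would be needed instead.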