I have a DataFrame df with this schema:
root
 |-- id: long (nullable = true)
 |-- a: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- _VALUE: string (nullable = true)
 |    |    |-- _href: string (nullable = true)
 |    |    |-- type: string (nullable = true)
How can I modify the DataFrame so that column a contains only the _href
values, without _VALUE and type?
Is it possible?
I've tried something like this, but it's wrong:
df=df.withColumn('a', 'a._href')
For example, this is my data:
+---+---------------------------------------------------------------------+
| id|a                                                                    |
+---+---------------------------------------------------------------------+
| 17|[[Gwendolyn Tucke,http://facebook.com],[i have , http://youtube.com]]|
| 23|[[letter, http://google.com],[hihow are you , http://google.co.il]] |
+---+---------------------------------------------------------------------+
but I want it to look like this:
+---+---------------------------------------------+
| id|a                                            |
+---+---------------------------------------------+
| 17|[[http://facebook.com],[ http://youtube.com]]|
| 23|[[http://google.com],[http://google.co.il]] |
+---+---------------------------------------------+
P.S. I don't want to use pandas at all.