How do I solve this query in Hive and Spark?

Question

Write a Hive SQL query that takes the input below and displays the output shown.

id     name            dob
-------------------------
1       anjan          10-16-1989

output:

id     name            dob
-------------------------
1       a              10-16-1989
1       n              10-16-1989
1       j              10-16-1989
1       a              10-16-1989
1       n              10-16-1989

Also solve the same scenario in Spark and display the same output as above.

score 0 · Answer 1 · answered Jan 09 '18 at 09:43

Assuming you have a dataframe (name it data) that comes from Hive like this:

+---+-----+----------+
| id| name|       dob|
+---+-----+----------+
|  1|anjan|10-16-1989|
+---+-----+----------+

You can define a user-defined function (UDF) in Spark that transforms a string into an array of single-character strings:

import org.apache.spark.sql.functions.{explode, udf}
val toArray = udf((name: String) => name.toArray.map(_.toString))
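
The function wrapped by the UDF can be tried on its own, without Spark, to see what it returns. The sketch below is only an illustration: `splitChars` is a hypothetical name, and the null guard is an assumption added here, since the original one-liner would throw a NullPointerException if the name column contained a null.

```scala
// Plain Scala version of the function wrapped by the UDF (no Spark needed).
// splitChars is a hypothetical name used only for this illustration.
def splitChars(name: String): Array[String] =
  Option(name)                      // assumption: treat a null name as an empty array
    .map(_.toArray.map(_.toString)) // same expression as in the UDF body
    .getOrElse(Array.empty[String])

val letters = splitChars("anjan")
println(letters.mkString("[", ", ", "]")) // prints [a, n, j, a, n]
```

If the null guard is wanted in Spark too, the same body could be passed to udf in place of the original lambda.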

With that in place, we can apply it to the name column:

val df = data.withColumn("name", toArray(data("name")))

+---+---------------+----------+
| id|           name|       dob|
+---+---------------+----------+
|  1|[a, n, j, a, n]|10-16-1989|
+---+---------------+----------+

Now we can use the explode function on the name column, which produces one row per array element:

df.withColumn("name", explode(df("name")))

+---+----+----------+
| id|name|       dob|
+---+----+----------+
|  1|   a|10-16-1989|
|  1|   n|10-16-1989|
|  1|   j|10-16-1989|
|  1|   a|10-16-1989|
|  1|   n|10-16-1989|
+---+----+----------+
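
To check the expected rows locally without a cluster, the two steps can be modelled with ordinary Scala collections. This is a rough sketch and not the Spark API: `Person` and `explodeName` are hypothetical names, and flatMap stands in for the array-then-explode combination.

```scala
// Plain-Scala model of the array-then-explode steps; this is not the Spark API.
// Person and explodeName are hypothetical names used only for this illustration.
case class Person(id: Int, name: String, dob: String)

def explodeName(rows: Seq[Person]): Seq[Person] =
  rows.flatMap { p =>
    // one output row per character, with id and dob copied unchanged
    p.name.toSeq.map(c => p.copy(name = c.toString))
  }

val exploded = explodeName(Seq(Person(1, "anjan", "10-16-1989")))
exploded.foreach(p => println(s"${p.id} ${p.name} ${p.dob}"))
```

For the sample row this yields five records, one per letter of the name, each keeping id 1 and the original dob.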

