I have a DataFrame in the format below.

| Occupation | wa_rating | Genre |
| --- | --- | --- |
| engineer | 935 | Musical |

Now I want to divide the wa_rating column of this DataFrame by totalRating.

But when I run

resultDF = joinedDF.select(col("wa_rating")/totalRating)

I get the following error:

unsupported literal type class java.util.ArrayList
Ayush Goyal

1 Answer

Your totalRating variable is most likely a list, for example [100], and a column cannot be divided by a list. This reproduces your error:

resultDF = joinedDF.select(col("wa_rating")/[100])

but this does not:

resultDF = joinedDF.select(col("wa_rating")/100)

Check that totalRating is an actual number (a float or integer). If it's a list containing a number, simply extract the number from it.

EDIT:

From your comments, we now know that your totalRating is a list. You can transform it to a number with:

totalRating = joinedDF3.groupBy().sum("Rating").collect()[0][0]
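
The extraction above can be sketched a little more defensively. This is a minimal illustration, not the only way to do it: `scalar_from_collect`, `divide_by_total` and the `wa_rating_share` alias are hypothetical names introduced here, and the code assumes `joinedDF` and `joinedDF3` are the DataFrames from the question, with the column names shown there.

```python
def scalar_from_collect(rows):
    """Return the single value held in the result of DataFrame.collect().

    collect() returns a Python list of Row objects. A global aggregate such
    as groupBy().sum("Rating") yields one Row with one field, so rows[0][0]
    is the number itself.
    """
    if not isinstance(rows, list) or len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("expected exactly one row with exactly one column")
    return rows[0][0]


def divide_by_total(joinedDF, joinedDF3):
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql.functions import col

    rows = joinedDF3.groupBy().sum("Rating").collect()  # a list of Row objects
    totalRating = scalar_from_collect(rows)             # a plain Python number
    # Dividing a column by a plain number is supported; dividing by a list is not.
    return joinedDF.select((col("wa_rating") / totalRating).alias("wa_rating_share"))
```

Calling `divide_by_total(joinedDF, joinedDF3)` would then return a DataFrame with a single column holding each rating divided by the total. The explicit shape check is a design choice: it fails early with a readable message if the aggregate ever returns something other than one value.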
VinceP
  • I created totalRating as follows: `totalRating = joinedDF3.groupBy().sum("Rating").collect()`. Is there a way to create a list instead of this number? – Ayush Goyal Aug 08 '19 at 12:30
  • Ayush, if you want your command to work, `totalRating` needs to be a number, not a list. Your expression `totalRating = joinedDF3.groupBy().sum("Rating").collect()` returns a list, as I suspected in my answer. You don't want that. You want to extract the number inside the list `totalRating`, as suggested in this [answer](https://stackoverflow.com/questions/47812526/pyspark-sum-a-column-in-dataframe-and-return-results-as-int?rq=1). – VinceP Aug 08 '19 at 13:18
  • Your `totalRating` needs to be `totalRating = joinedDF3.groupBy().sum("Rating").collect()[0][0]` – VinceP Aug 08 '19 at 13:23
  • Cool, please accept my answer if you are happy with it – VinceP Aug 08 '19 at 13:46