-1

All credit goes to the Kaggle course on pandas. Here is the output of dataset.head(): (image omitted)

Here is the task: We'd like to host these wine reviews on our website, but a rating system ranging from 80 to 100 points is too hard to understand - we'd like to translate them into simple star ratings. A score of 95 or higher counts as 3 stars, a score of at least 85 but less than 95 is 2 stars. Any other score is 1 star.

Also, the Canadian Vintners Association bought a lot of ads on the site, so any wines from Canada should automatically get 3 stars, regardless of points.

Create a series star_ratings with the number of stars corresponding to each review in the dataset.

Here is the solution:

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

star_ratings = reviews.apply(stars, axis='columns')

My question: when this solution applies the function to each row, does it check the condition against every column in that row and act on all of them? The code doesn't say to operate only on the 'points' column.

lakuzama

2 Answers

1

There are multiple conditions. The first checks each row's "country" value, while the other two check its "points" value. The function receives the whole row, but it only reads the fields it names (row.country and row.points), so the other columns are ignored. The else branch gives the fallback result when none of the conditions is met. That said, it is better practice in pandas to use np.select, which is vectorized and therefore runs faster:

import numpy as np
star_ratings = np.select([(reviews.country == 'Canada') | (reviews.points >= 95), (reviews.points >= 85)],  # conditions
                         [3, 2],  # results, one per condition, in the same order
                         1)       # default result (like your else)

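np.select takes three arguments: the list of conditions, the list of results (in the same order as the conditions), and the default result used when no condition matches. Note that np.select returns a NumPy array, so wrap it in pd.Series(..., index=reviews.index) if you need a Series. See the numpy.select documentation for more.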
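As a rough, self-contained sketch of the np.select approach: the small `reviews` frame below is made-up sample data standing in for the Kaggle dataset, and the names `conditions` and `choices` are only illustrative. The first matching condition wins, so the Canada-or-95+ test has to come before the 85+ test.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the Kaggle `reviews` frame.
reviews = pd.DataFrame({
    "country": ["Canada", "US", "Italy", "France"],
    "points": [82, 96, 88, 80],
})

# Conditions are evaluated for the whole column at once; the first True wins.
conditions = [
    (reviews["country"] == "Canada") | (reviews["points"] >= 95),
    reviews["points"] >= 85,
]
choices = [3, 2]  # one result per condition, in the same order

# np.select returns a NumPy array, so wrap it to get a Series aligned with reviews.
star_ratings = pd.Series(
    np.select(conditions, choices, default=1),
    index=reviews.index,
    name="star_ratings",
)
print(star_ratings.tolist())  # [3, 3, 2, 1]
```

The Canadian wine gets 3 stars despite its 82 points, and the 80-point wine falls through to the default of 1.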

David Erickson
1

After creating a sample dataframe, you can reuse the function you already wrote; a single .apply() call is all you need to get the required result.

Note: This is a sample dataset. You can use your own wine dataset instead of creating one in the second line of the code.

import pandas as pd
wine = pd.DataFrame({"country": ["Canada", "US", "Aus"], "points": [85, 99, 45]})

def stars(row):
    if row.country == 'Canada':
        return 3
    elif row.points >= 95:
        return 3
    elif row.points >= 85:
        return 2
    else:
        return 1

wine["stars"] = wine.apply(stars, axis=1)

Explanation: .apply() applies a given function along one axis of a pandas dataframe. Because we want to call it once per row, we pass the additional parameter axis=1. The default, axis=0, calls the function once per column instead.
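To see what the function actually receives, here is a small sketch using the same sample frame. The helper `inspect_row` and the list `seen` are hypothetical names added only for illustration. Each call gets one row as a pandas Series indexed by column name, and the function touches only the fields it asks for.

```python
import pandas as pd

wine = pd.DataFrame({"country": ["Canada", "US", "Aus"], "points": [85, 99, 45]})

seen = []  # records what each call receives

def inspect_row(row):
    # `row` is a pandas Series whose index holds the column names.
    seen.append((type(row).__name__, list(row.index)))
    return row["points"]  # only this field is read; "country" is ignored

result = wine.apply(inspect_row, axis=1)
print(seen[0])          # ('Series', ['country', 'points'])
print(result.tolist())  # [85, 99, 45]
```

So nothing is checked "for every column": the row carries all columns, but a condition such as row.points >= 95 looks at the points value alone.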

srishtigarg