How do I write a function that takes one row and returns a list of 2-dimension tuples

Question

So I am working on this dataset.

I want to take one row and return a list of 2-dimensional tuples. For example, row 0 should return: [('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)], so that every genre of the movie is paired with the same IMDb score.

This is from a school project and I can't think of a way that this could be done. Can anyone help me?

I'm sorry for the lack of detail in this question; I will try to lay out all the details now.

The dataset is movie_metadata.csv. I can't seem to attach the file here.

Once I have the function, I am supposed to apply it to all the rows until I have one list containing all the 2-dimensional tuples. Then I have to convert the list of tuples into a DataFrame. Ideally, I want to create a new dataset named 'genre_score' that has two columns: genre and imdb_score. Each row will hold only one genre and the IMDb rating of a movie from that genre. Then I have to calculate the mean IMDb rating per genre and make the following graph.

I can probably figure something out with everything else except the function. Writing the function is the struggle for me.

Can you post the data as text? It cannot be copied from a picture. — jezrael, Apr 11 '20 at 11:21

jezrael · Accepted Answer · 2020-04-12T05:11:08.597

Use a list comprehension that flattens the values split by |:

import numpy as np
import pandas as pd

df = pd.DataFrame({'genres':['Action|Adventure|Fantasy|Sci-Fi','Action|Adventure|Fantasy'],
                   'imdb_score':[7.9,7.1]})
print (df)
                            genres  imdb_score
0  Action|Adventure|Fantasy|Sci-Fi         7.9
1         Action|Adventure|Fantasy         7.1

row = 0
L = [(x, i) for g,i in df.loc[[row], ['genres','imdb_score']].values for x in g.split('|')]
print (L)
[('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)]
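
Since the question asks for a function that takes one row, the same comprehension can be wrapped in one. This is only a minimal sketch, assuming the column names genres and imdb_score from the sample above; the names row_to_genre_scores and genre_score are illustrative and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                   'imdb_score': [7.9, 7.1]})

def row_to_genre_scores(row):
    # row is a single DataFrame row (a Series); pair each genre with that row's score
    return [(genre, row['imdb_score']) for genre in row['genres'].split('|')]

# One row
first = row_to_genre_scores(df.loc[0])

# All rows, flattened into a single list of tuples
pairs = [pair for _, row in df.iterrows() for pair in row_to_genre_scores(row)]

# List of tuples -> two-column DataFrame
genre_score = pd.DataFrame(pairs, columns=['genre', 'imdb_score'])
print(genre_score)
```

Looping with iterrows is slow on large frames, so this is mainly useful when the assignment explicitly requires a per-row function.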

EDIT: Use Series.str.get_dummies to build indicator columns, replace 0 with missing values, multiply by the scores with DataFrame.mul and take the means; finally, convert the Series to a DataFrame with Series.rename_axis and Series.reset_index:

df1 = (df['genres'].str.get_dummies()
                   .replace(0, np.nan)
                   .mul(df['imdb_score'], axis=0)
                   .mean()
                   .rename_axis('genres')
                   .reset_index(name='imdb_score'))
print (df1)
      genres  imdb_score
0     Action         7.5
1  Adventure         7.5
2    Fantasy         7.5
3     Sci-Fi         7.9
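
To see why the 0 values are replaced before averaging, it may help to compare the result with and without that step. A small sketch on the same sample data (the variable names dummies, naive and correct are only illustrative): without the replacement, a movie that lacks a genre contributes a score of 0 to that genre's mean, so Sci-Fi comes out as 3.95 instead of 7.9.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                   'imdb_score': [7.9, 7.1]})

# 1 where the movie has the genre, 0 otherwise (the default separator is '|')
dummies = df['genres'].str.get_dummies()

# Without the replacement, missing genres are averaged in as a score of 0
naive = dummies.mul(df['imdb_score'], axis=0).mean()

# With 0 -> NaN, mean() skips the movies that do not have the genre
correct = dummies.replace(0, np.nan).mul(df['imdb_score'], axis=0).mean()

print(naive['Sci-Fi'], correct['Sci-Fi'])
```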

Another solution is to use Series.str.split to get lists, then DataFrame.explode, and finally aggregate with mean:

df1 = (df.assign(genres=df['genres'].str.split('|'))
         .explode('genres')
         .groupby('genres', as_index=False)['imdb_score']
         .mean())
print (df1)
      genres  imdb_score
0     Action         7.5
1  Adventure         7.5
2    Fantasy         7.5
3     Sci-Fi         7.9

Sorry to bother you again, but would you mind explaining how to get the list before finding the mean for each genre? I would like to see how many movies are listed for each genre, so that I can eliminate genres with fewer than 10 movies and make the data more accurate. — Carlo Silanu, Apr 12 '20 at 14:33
@CarloSilanu - So you need [this](https://stackoverflow.com/q/29836836)? Like `df1 = (df.assign(genres=df['genres'].str.split('|')).explode('genres'))`, then apply the solution from the link, and then `df1 = df1.groupby('genres', as_index=False)['imdb_score'].mean()`? — jezrael, Apr 13 '20 at 06:44
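
The steps in the comment above could be put together roughly as follows. This is a sketch under assumptions, not the code from the linked question: it counts movies per genre with value_counts and map, and min_movies is a made-up name. The threshold is set to 2 here only so that the two-row sample shows an effect; on the real dataset it would be 10.

```python
import pandas as pd

df = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy'],
                   'imdb_score': [7.9, 7.1]})

min_movies = 2  # hypothetical threshold; use 10 for the full dataset

# One row per (movie, genre) pair
exploded = df.assign(genres=df['genres'].str.split('|')).explode('genres')

# Number of movies per genre, looked up for every exploded row
counts = exploded['genres'].map(exploded['genres'].value_counts())

# Keep only genres with enough movies, then average
filtered = exploded[counts >= min_movies]
df1 = filtered.groupby('genres', as_index=False)['imdb_score'].mean()
print(df1)
```

With the sample data, Sci-Fi appears in only one movie and is dropped, while the other three genres keep a mean of 7.5.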

score 0 · Answer 2 · answered Apr 11 '20 at 11:23

Try this; it returns a (column, value) tuple for every column of the row at position row_num:

array = [ (col,val) for col,val in dataframe.iloc[row_num].items() ]
print(array)

AmirHmZ · answered Apr 11 '20 at 11:23

score 0 · Answer 3 · answered Apr 11 '20 at 11:24

You can use a dictionary inside a dictionary, keyed by row and then by column:

dataset = {'R1': {'C1': 'V1', 'C2': 'V2', 'C3': 'V3'},
           'R2': {'C1': 'V1', 'C2': 'V2', 'C3': 'V3'},
           'R3': {'C1': 'V1', 'C2': 'V2', 'C3': 'V3'}}

Chitkaran Singh · answered Apr 11 '20 at 11:24

score 0 · Answer 4 · answered Apr 11 '20 at 11:33

You can write your function like this:

def myFunction(row):
    # Convert the 1-based row number to a 0-based list index
    row -= 1
    # Your list
    mylist = [
        # first row
        [
            ('genres', 'Action|Adventure|Fantasy|Sci-Fi'),
            ('num_user_for_reviews', 3054.0),
        ],
        # second row
        [
            ('genres', 'Action|Adventure|Fantasy'),
            ('num_user_for_reviews', 1238.0),
        ],
    ]
    return mylist[row]

Then call the function with the row you want:

# returns the first row
myFunction(1)
