106

How to get merged data frame from two data frames having common column value such that only those rows make merged data frame having common value in a particular column.

I have 5000 rows of df1 as format : -

    director_name   actor_1_name    actor_2_name    actor_3_name    movie_title
0   James Cameron   CCH Pounder Joel David Moore    Wes Studi     Avatar
1   Gore Verbinski  Johnny Depp Orlando Bloom   Jack Davenport   Pirates 
    of the Caribbean: At World's End
2   Sam Mendes   Christoph Waltz    Rory Kinnear    Stephanie Sigman Spectre

and 10000 rows of df2 as

movieId                   genres                        movie_title
    1       Adventure|Animation|Children|Comedy|Fantasy   Toy Story
    2       Adventure|Children|Fantasy                    Jumanji
    3       Comedy|Romance                             Grumpier Old Men
    4       Comedy|Drama|Romance                      Waiting to Exhale

A common column 'movie_title' have common values and based on them, I want to get all rows where 'movie_title' is same. Other rows to be deleted.

Any help/suggestion would be appreciated.

Note: I already tried

pd.merge(dfinal, df1, on='movie_title')

and output comes like one row

director_name   actor_1_name    actor_2_name    actor_3_name    movie_title movieId title   genres

and on how ="outer"/"left", "right", I tried all and didn't get any row after dropping NaN although many common coloumn do exist.

Hari_pb
  • 7,088
  • 3
  • 45
  • 53

3 Answers3

124

You can use pd.merge:

import pandas as pd
pd.merge(df1, df2, on="movie_title")

Only rows are kept for which common keys are found in both data frames. In case you want to keep all rows from the left data frame and only add values from df2 where a matching key is available, you can use how="left":

pd.merge(df1, df2, on="movie_title", how="left")
BLimitless
  • 2,060
  • 5
  • 17
  • 32
pansen
  • 6,433
  • 4
  • 19
  • 32
  • It doesn't work comes like only one row director_name actor_1_name actor_2_name actor_3_name movie_title movieId title genres – Hari_pb Apr 08 '17 at 16:41
  • 3
    @Harry_pb, that means you have only one matching `movie_title`. Merge will work properly only if the strings are __exactly__ the same. – MaxU - stand with Ukraine Apr 08 '17 at 16:49
  • 2
    @Harry_pb, how did you resolve this problem at last? I'm running into the same issue: left df and right df obviously have the same common column, but the merged df is always empty! – Shiang Hoo Feb 21 '19 at 13:21
  • 3
    I figured it out just after the previous comment... the common column in my 2 DFs looks the same, just numbers, but in fact, one DF stores the numbers in int64 but the other one stores them in object!! Driven me crazy totally!! – Shiang Hoo Feb 21 '19 at 13:25
84

We can merge two Data frames in several ways. Most common way in python is using merge operation in Pandas.

import pandas
dfinal = df1.merge(df2, on="movie_title", how = 'inner')

For merging based on columns of different dataframe, you may specify left and right common column names specially in case of ambiguity of two different names of same column, lets say - 'movie_title' as 'movie_name'.

dfinal = df1.merge(df2, how='inner', left_on='movie_title', right_on='movie_name')

If you want to be even more specific, you may read the documentation of pandas merge operation.

IamAshay
  • 1,377
  • 1
  • 7
  • 16
Hari_pb
  • 7,088
  • 3
  • 45
  • 53
10

If you want to merge two DataFrames and you want a merged DataFrame in which only common values from both data frames will appear then do inner merge.

import pandas as pd

merged_Frame = pd.merge(df1, df2, on = id, how='inner')
sokeefe
  • 627
  • 12
  • 33
gaurav singh
  • 109
  • 1
  • 4