I learned the join methods in SQL, and I know that an inner join returns only the rows that match in both of the tables being joined.
I assumed the concept is the same in Python (pandas), but I have trouble understanding the following code.
crsp1=pd.merge(crsp, crsp_maxme, how='inner', on=['jdate','permco','me'])
crsp1=crsp1.drop(['me'], axis=1)
crsp2=pd.merge(crsp1, crsp_summe, how='inner', on=['jdate','permco'])
If I understood correctly, the first line merges the tables crsp and crsp_maxme with an inner join on the columns 'jdate', 'permco', and 'me', so the table crsp1 would have 3 columns. The second line drops the 'me' column from crsp1. The last line merges the adjusted crsp1 with crsp_summe, again with an inner join, on 'jdate' and 'permco', which would leave the merged table crsp2 with only 2 columns.
However, the explanation accompanying the code (from line 2) says that the second and third lines drop the 'me' column from crsp1 and then replace it with 'me' from the crsp_summe table, which I have trouble understanding.
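To make the question reproducible, here is a minimal sketch with made-up toy tables. The 'permno' column, all values, and the way crsp_maxme and crsp_summe are built (a per-group maximum and sum of 'me') are assumptions for illustration only, not the real data or the original author's code. It runs the same three lines and prints the columns of each intermediate table.

```python
import pandas as pd

# Toy stand-in for crsp; 'permno' and all values are hypothetical.
crsp = pd.DataFrame({
    'permno': [1, 2, 3],
    'permco': [10, 10, 20],
    'jdate': ['2020-01-31'] * 3,
    'me': [100.0, 50.0, 70.0],
})

# Assumed construction: largest 'me' per (jdate, permco).
crsp_maxme = crsp.groupby(['jdate', 'permco'])['me'].max().reset_index()
# Assumed construction: total 'me' per (jdate, permco).
crsp_summe = crsp.groupby(['jdate', 'permco'])['me'].sum().reset_index()

# The three lines from the question, unchanged in behavior.
crsp1 = pd.merge(crsp, crsp_maxme, how='inner', on=['jdate', 'permco', 'me'])
crsp1 = crsp1.drop(['me'], axis=1)
crsp2 = pd.merge(crsp1, crsp_summe, how='inner', on=['jdate', 'permco'])

print(list(crsp1.columns))  # columns left after the drop
print(list(crsp2.columns))  # columns after the second merge
print(crsp2)
```

Comparing the printed column lists with the counts I described above should show whether columns that are not listed in `on` survive the merge, and where the 'me' column in crsp2 comes from.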
Could anyone clarify these lines for me?
PS: I thought it wasn't necessary to explain what the tables crsp, crsp_summe, and crsp_maxme contain, since the question is only about how the inner join combines them. Please excuse the lack of background info.