
I have two pandas dataframes with some columns in common. These columns are of type category but unfortunately the category codes don't match for the two dataframes. For example I have:

>>> df1
   artist        song
0  The Killers   Mr Brightside
1  David Guetta  Memories
2  Estelle       Come Over
3  The Killers   Human


>>> df2
   artist        date
0  The Killers   2010
1  David Guetta  2012
2  Estelle       2005
3  The Killers   2006

But:

>>> df1['artist'].cat.codes
0           55
1           78
2           93
3           55

Whereas:

>>> df2['artist'].cat.codes
0           99
1           12
2           23
3           99

What I would like is for my second dataframe df2 to take the same category codes as the first one df1 without changing the category values. Is there any way to do this?

(Edit)

Here is a screenshot of my two dataframes. Essentially I want song_tags to have the same category codes for artist_name and track_name as the songs dataframe. song_tags is created by merging songs with another tag dataframe (which contains song data and their tags, without the user information); the result is then saved and loaded through pickle. It may also be relevant that I had to cast artist_name and track_name in song_tags from type object to type category.

[Screenshot: Dataframes]
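A minimal sketch of why the codes can differ after such a cast, assuming the two columns do not contain exactly the same set of values. The frames below are hypothetical miniatures (the extra artist "Adele" is invented for illustration), not the real songs and song_tags data. Casting an object column to category builds the category list from that column's own unique values, and each code is simply the position of the value in that list.

```python
import pandas as pd

# Hypothetical miniature frames; "Adele" stands in for any value that
# appears in one frame but not in the other.
songs = pd.DataFrame(
    {"artist": ["The Killers", "David Guetta", "Estelle", "The Killers", "Adele"]}
).astype({"artist": "category"})
song_tags = pd.DataFrame(
    {"artist": ["The Killers", "David Guetta", "Estelle", "The Killers"]}
).astype({"artist": "category"})

# Each frame gets its own category list, built from its own values.
print(list(songs["artist"].cat.categories))
print(list(song_tags["artist"].cat.categories))

# A code is the position of the value in that frame's category list,
# so the same artist can map to different codes in the two frames.
print(songs["artist"].cat.codes.tolist())
print(song_tags["artist"].cat.codes.tolist())
```

In this toy case "The Killers" ends up with a different code in each frame even though the string values are identical.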

I think essentially my question is: how to modify category codes of an existing dataframe column?

beepboop
  • Does this answer your question? https://stackoverflow.com/questions/61408095/use-same-category-labeling-criteria-on-two-different-dataframes – sandertjuh May 16 '21 at 09:48
  • It would be helpful if you could show all columns of both dataframes; you can add that to your original question. You can do that either with `df1.columns` and `df2.columns`, or by giving a snapshot of your data with `df1.head()` and `df2.head()`. – Deepak May 16 '21 at 09:48
  • @sandertjuh I tried this but I didn't manage to make it work with my dataframes. It seems that the answer only uses one dataframe so I'm not sure how this applied to my case. – beepboop May 16 '21 at 09:59
  • @Deepak I can do that, but the dataframes were more examples to illustrate my problem. I'm not sure I understand why showing all columns would help with this issue. – beepboop May 16 '21 at 09:59
  • @beepboop I am just trying to understand how you are getting those values from `df2['artist'].cat.codes`; more importantly, I am interested in the `artist` column of your dataframe. – Deepak May 16 '21 at 13:01
  • @Deepak I think the difference may be linked to the fact that for df2 I cast `artist` from an `object` column to a `category` column. I think essentially my question is: can we modify the category codes of an existing dataframe column, and if so, how? Following the link above, I can create a mapping from artist to the desired category code, but I can't find how to apply it to the current category codes. – beepboop May 16 '21 at 13:13

0 Answers