I'm new to data science and am learning different techniques I can apply with Python. At the moment, I'm trying them out with Spotify's API on my own playlists.
The goal is to find the most dissimilar features between two different playlists.
My question is what is the best way to identify the most dissimilar features between these two playlists?
I started off by getting all the tracks in each playlist and their respective features. I then computed the mean of each of the features.
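Here is a minimal sketch of the averaging step. It assumes the track features have already been collected into one pandas DataFrame with one row per track and a hypothetical `playlist` column. The numbers are toy values for illustration, not real Spotify data.

```python
import pandas as pd

# Toy track-level data (illustrative numbers, NOT real Spotify output).
# Assumed layout: one row per track, a hypothetical "playlist" column,
# and one column per audio feature.
tracks = pd.DataFrame({
    "playlist":     ["playlist1", "playlist1", "playlist2", "playlist2"],
    "danceability": [0.70, 0.60, 0.55, 0.65],
    "energy":       [0.80, 0.40, 0.50, 0.44],
})

# Mean of every feature per playlist, transposed so that features are rows
# and playlists are columns (the same layout as the table below).
means = tracks.groupby("playlist").mean().T
print(means)
```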
Here is the DataFrame I ended up with. Each value is the mean of that feature across all tracks in the respective playlist:
                   playlist1   playlist2
-----------------------------------------
danceability     | 0.667509    0.592140
energy           | 0.598873    0.468020
acousticness     | 0.114511    0.398372
valence          | 0.376920    0.287250
instrumentalness | 0.005238    0.227783
speechiness      | 0.243587    0.088612
I did some digging and found two common procedures:
1. Euclidean Distance
2. Cosine Similarity
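For reference, here is how the two measures could be computed on the mean vectors from the first table, using NumPy. One thing worth noting: each measure collapses the whole feature vector into a single number describing how far apart the two playlists are overall. Neither one, by itself, says which individual feature differs most. The variable names are illustrative.

```python
import numpy as np

# Mean feature vectors copied from the first table, in the order
# danceability, energy, acousticness, valence, instrumentalness, speechiness.
p1 = np.array([0.667509, 0.598873, 0.114511, 0.376920, 0.005238, 0.243587])
p2 = np.array([0.592140, 0.468020, 0.398372, 0.287250, 0.227783, 0.088612])

# Euclidean distance: straight-line distance between the two vectors.
# Larger means more different; it depends on magnitude as well as direction.
euclidean = np.linalg.norm(p1 - p2)

# Cosine similarity: cosine of the angle between the vectors (1 = same direction).
# It ignores overall magnitude and only compares the "shape" of the profiles.
cosine = p1 @ p2 / (np.linalg.norm(p1) * np.linalg.norm(p2))

print(f"Euclidean distance: {euclidean:.4f}")
print(f"Cosine similarity:  {cosine:.4f}")
```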
For some reason, I couldn't wrap my head around which one to use, so I instead computed the absolute difference for each feature. Simple subtraction made sense to me intuitively: the feature with the greatest difference would be the 'most dissimilar'.
Using this approach, I concluded that energy and acousticness are the most dissimilar features:
                   playlist1   playlist2   absoluteDifference
--------------------------------------------------------------
energy           | 0.871310    0.468020    0.403290
acousticness     | 0.041479    0.398372    0.356893
valence          | 0.501890    0.287250    0.214640
instrumentalness | 0.049012    0.227783    0.178771
danceability     | 0.531071    0.592140    0.061069
speechiness      | 0.109587    0.088612    0.020975
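The absolute-difference ranking can be reproduced in pandas. This minimal sketch uses the numbers from the table just above; `df` and `top_two` are illustrative names.

```python
import pandas as pd

# Feature means per playlist, copied from the table above.
df = pd.DataFrame(
    {
        "playlist1": [0.871310, 0.041479, 0.501890, 0.049012, 0.531071, 0.109587],
        "playlist2": [0.468020, 0.398372, 0.287250, 0.227783, 0.592140, 0.088612],
    },
    index=["energy", "acousticness", "valence", "instrumentalness",
           "danceability", "speechiness"],
)

# Per-feature absolute difference, sorted from most to least different.
df["absoluteDifference"] = (df["playlist1"] - df["playlist2"]).abs()
df = df.sort_values("absoluteDifference", ascending=False)

# The two features with the largest gap (the ones proposed as KNN axes).
top_two = df.index[:2].tolist()
print(df)
print(top_two)
```

Sorting on `absoluteDifference` gives the same order as the table, with energy and acousticness at the top.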
Is my intuition correct, and when would one use the techniques mentioned above? Would either of them apply in a situation like this?
Eventually, I want to take the top two dissimilar features and use them as the axes for KNN. My intuition is that if I can identify the most dissimilar features of two playlists, I'll have cleaner, better-defined features for each playlist and can more accurately predict which playlist a song ought to belong to.
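Here is a minimal sketch of the KNN idea with energy and acousticness as the two axes. It assumes track-level (not averaged) feature values are available for each playlist. The training points below are toy numbers, not real Spotify data, and `knn_predict` is a hypothetical helper written from scratch to show the mechanics. In practice, scikit-learn's `KNeighborsClassifier` does the same job.

```python
import numpy as np
from collections import Counter

# Toy track-level data (illustrative numbers, NOT real Spotify output).
# Columns: energy, acousticness -- the two features proposed as KNN axes.
X_train = np.array([
    [0.90, 0.05], [0.85, 0.10], [0.80, 0.08],   # hypothetical playlist1 tracks
    [0.45, 0.40], [0.40, 0.45], [0.50, 0.35],   # hypothetical playlist2 tracks
])
y_train = np.array(["playlist1"] * 3 + ["playlist2"] * 3)

def knn_predict(x, X, y, k=3):
    """Label x by majority vote among its k nearest training points
    (Euclidean distance in the 2-D feature space)."""
    distances = np.linalg.norm(X - x, axis=1)
    nearest = np.argsort(distances)[:k]
    return Counter(y[nearest]).most_common(1)[0][0]

new_song = np.array([0.82, 0.12])  # hypothetical unseen track
print(knn_predict(new_song, X_train, y_train))
```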