I am currently doing a project where my team and I have to pick a dataset and apply some machine learning methods on it (SLR, MLR etc.), hence for me, I am doing logical regression. My dataset is related to the top hit songs on Spotify from 2010-2019, and I want to see how the duration and danceability of a song affects its popularity. Given that the popularity values is numerical, I have converted the popularity value of each song to binary values. Hence, the popularity value of a song will change to "0" if it is below 65, and "1" if it is above the value of 65. I then decided to plot a 2d logistic regression plot for two dimensions. The end result is that both the "0" and "1" values are all gathered in the same area, where they are supposed to be separated from each other and there should be a decision boundary at .5 showing. I just want to know what does this show about the relationship between the popularity of the songs and their duration and danceability respectively. Is this supposed to be normal or did i make a mistake?
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.linear_model import LogisticRegression
df = pd.read_csv('top10s [SubtitleTools.com] (2).csv')
BPM = df.bpm
BPM = np.array(BPM)
Energy = df.nrgy
Energy = np.array(Energy)
dB = df.dB
dB = np.array(dB)
Live = df.live
Live = np.array(Live)
Valence = df.val
Valence = np.array(Valence)
Acous = df.acous
Acous = np.array(Acous)
Speech = df.spch
Speech = np.array(Speech)
def LogReg0732():
Dur = df.dur
Dur = np.array(Dur)
Dance = df.dnce
Dance = np.array(Dance)
Pop = df.popu
df.loc[df['popu'] <= 65, 'popu'] = 0
df.loc[df['popu'] > 65, 'popu'] = 1
Pop = np.array(Pop)
X = Dur
X = np.stack((X, Dance))
y = Pop
clf = LogisticRegression().fit(X.T, y)
print("Coef ", clf.intercept_, clf.coef_)
xx, yy = np.mgrid[np.min(Dur):np.max(Dur), np.min(Dance):np.max(Dance)]
gridxy = np.c_[xx.ravel(), yy.ravel()]
probs = clf.predict_proba(gridxy)[:,1].reshape(xx.shape)
f, ax = plt.subplots(figsize=(20,8))
contour = ax.contourf(xx, yy, probs, 25, cmap="BrBG", vmin=0, vmax=1)
ax_c = f.colorbar(contour)
ax_c.set_ticks([0, 1/4, 1/2, 3/4, 1])
idx = np.where(y==1); idx = np.reshape(idx,np.shape(idx)[1])
y1 = X[:,idx]
idx = np.where(y==0); idx = np.reshape(idx,np.shape(idx)[1])
y0 = X[:,idx]
ax.scatter(y1[0,:], y1[1,:], c='green')
ax.scatter(y0[0,:], y0[1,:], c='blue')
plt.xlabel('Danceability')
plt.ylabel('Duration')
plt.savefig('LR1.svg')
plt.show()
LogReg0732()