
I'm trying to use logistic regression on the popularity of hit songs on Spotify from 2010-2019 based on their durations and durability, with the data read from a .csv file. Since the popularity value of each song is numerical, I have converted each one to a binary value, 0 or 1: if the popularity of a hit song is 70 or less, I replace it with 0, and if it is greater than 70, I replace it with 1.

The current sigmoid curve seems to be plotted as its "log" right now, so it shows up as a straight line. In the context of this code, I am still not sure how to plot a proper sigmoid curve instead of just the straight line. Is there anything I need to add to my code to show both a solid sigmoid curve and the log of the curve in the same graph? I would deeply appreciate help with this final step.

 %matplotlib inline
 import numpy as np
 import matplotlib.pyplot as plt 
 import pandas as pd

 df = pd.read_csv('top10s [SubtitleTools.com] (2).csv')

 BPM = df.bpm
 BPM = np.array(BPM)
 Energy = df.nrgy
 Energy = np.array(Energy)
 Dance = df.dnce
 Dance = np.array(Dance)
 dB = df.dB
 dB = np.array(dB)
 Live = df.live
 Live = np.array(Live)
 Valence = df.val
 Valence = np.array(Valence)
 Acous = df.acous
 Acous = np.array(Acous)
 Speech = df.spch
 Speech = np.array(Speech)

 df.loc[df['popu'] <= 70, 'popu'] = 0

 df.loc[df['popu'] > 70, 'popu'] = 1

 def Logistic_Regression(X, y, iterations, alpha):
   ones = np.ones((X.shape[0], ))
   X = np.vstack((ones, X))
   X = X.T
   b = np.zeros(X.shape[1])

   for i in range(iterations):
     z = np.dot(X, b)
     p_hat = sigmoid(z)
     gradient = np.dot(X.T, (y - p_hat))/y.size
     b = b + alpha * gradient
     if (i % 1000 == 0):
       print('LL, i ', log_likelihood(X, y, b), i)
   return b

 def sigmoid(z):
   return 1 / (1 + np.exp(-z))

 def log_likelihood(X, y, b):
   z = np.dot(X, b)
   LL = np.sum(y*z - np.log(1 + np.exp(z)))
   return LL

 def LR1():
   Dur = df.dur
   Dur = np.array(Dur)
   Pop = df.popu

   Pop = [int(i) for i in Pop]; Pop = np.array(Pop)


   plt.figure(figsize=(10,8))
   colormap = np.array(['r', 'b'])
   plt.scatter(Dur, Pop, c = colormap[Pop], alpha = .4)
   b = Logistic_Regression(Dur, Pop, iterations = 8000, alpha = 0.00005)
   print('Done')

   p_hat = sigmoid(np.dot(Dur, b[1]) + b[0])
   idxDur = np.argsort(Dur)
   plt.plot(Dur[idxDur], p_hat[idxDur])
   plt.show()

 LR1()

My dataset:

CSV File

My Current Graph

What I want to have:

Shape of sigmoid I want

Brendan

2 Answers


At first glance, your Logistic_Regression initialization seems wrong.

I think you packed X as [X, 1] and then try to learn W = [weight, bias], which should be [1, 0] to start with.

Note that the 1 here is a vector [1, 1, 1, ...] with length equal to the feature vector length.
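Whichever order is used, the coefficients follow the column order of the design matrix. In the question's code the ones are stacked first (`np.vstack((ones, X))`), so `b[0]` is the intercept and `b[1]` is the slope. Below is a minimal sketch of that layout; `add_intercept` is a hypothetical helper name and the durations are made-up values, not taken from the dataset.

```python
import numpy as np

def add_intercept(x):
    """Return a design matrix whose first column is ones and second is x."""
    x = np.asarray(x, dtype=float)
    ones = np.ones(x.shape[0])
    return np.vstack((ones, x)).T  # shape (n_samples, 2)

# Hypothetical durations in seconds, for illustration only.
dur = np.array([180.0, 210.0, 240.0])
X = add_intercept(dur)

# With zero-initialized coefficients, every predicted probability starts at 0.5.
b = np.zeros(X.shape[1])
p_hat = 1.0 / (1.0 + np.exp(-X @ b))
```

With this layout the prediction for a duration `d` is `sigmoid(b[0] + b[1] * d)`, which matches how the question's `LR1` computes `p_hat`.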

Qianyi Zhang

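Try something like this:

 # Evaluate the fitted curve on an evenly spaced grid of durations
 x_range = np.linspace(Dur.min(), Dur.max(), 100)
 # b[0] is the intercept and b[1] the slope; sigmoid takes a single argument
 p_hat = sigmoid(x_range * b[1] + b[0])
 plt.plot(x_range, p_hat)
 plt.show()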
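One possible reason the curve still looks like a straight line is that, over the observed durations, `b[0] + b[1] * x` only covers a narrow interval in which the sigmoid is nearly linear. The sketch below is an assumption-laden illustration, not a drop-in fix: it uses synthetic data in place of the Spotify file, a hypothetical helper `fit_logistic`, and a learning rate chosen for the standardized feature. It standardizes the durations before gradient ascent and evaluates the curve on a grid wider than the data so the S shape becomes visible.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(x, y, iterations=5000, alpha=0.1):
    """Gradient ascent on the mean log-likelihood; returns [intercept, slope]."""
    X = np.column_stack((np.ones_like(x), x))
    b = np.zeros(2)
    for _ in range(iterations):
        p_hat = sigmoid(X @ b)
        b += alpha * X.T @ (y - p_hat) / y.size
    return b

# Synthetic stand-in for the real data (assumption, for illustration only).
rng = np.random.default_rng(0)
dur = rng.uniform(150, 300, size=500)
true_p = sigmoid((dur - 225) / 15)
pop = (rng.uniform(size=dur.size) < true_p).astype(int)

# Standardize the feature so a moderate learning rate converges.
mu, sd = dur.mean(), dur.std()
b = fit_logistic((dur - mu) / sd, pop)

# Evaluate on a grid that extends beyond the data, using the same scaling.
x_range = np.linspace(dur.min() - 100, dur.max() + 100, 300)
p_curve = sigmoid(b[0] + b[1] * (x_range - mu) / sd)

# To draw it after the scatter plot: plt.plot(x_range, p_curve)
```

Whether the real data produces a visible S shape depends on how strongly duration actually separates the two classes; if the fitted slope is close to zero, the curve will stay nearly flat over any reasonable range.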

theletz