The following piece of Python code works well for performing gradient descent:

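import numpy as np

def gradientDescent(x, y, theta, alpha, m, numIterations):
    xTrans = x.transpose()
    for i in range(0, numIterations):
        # Predictions of the linear model for the current theta
        hypothesis = np.dot(x, theta)
        loss = hypothesis - y
        # Halved mean squared error, printed to monitor convergence
        cost = np.sum(loss ** 2) / (2 * m)
        print("Iteration %d | Cost: %f" % (i, cost))
        # Gradient of the cost with respect to theta, averaged over m samples
        gradient = np.dot(xTrans, loss) / m
        theta = theta - alpha * gradient
    return theta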

Here, x is the m × n feature matrix (m = number of samples, n = number of features).

However, if my features are non-numerical (say, director and genre for 2 movies), my feature matrix may look like:

[['Peter Jackson', 'Action'],
 ['Sergio Leone', 'Comedy']]

In such a case, how can I map these features to numerical values and apply gradient descent?

Saurabh Verma

1 Answer

You can map your features to numerical values of your choice and then apply gradient descent the usual way.

In Python you can use pandas to do that easily:

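import pandas as pd
# X is the list of [director, genre] rows; the names must be passed as columns, not as the index
df = pd.DataFrame(X, columns=['director', 'genre'])
df.director = df.director.map({'Peter Jackson': 0, 'Sergio Leone': 1})
df.genre = df.genre.map({'Action': 0, 'Comedy': 1})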

As you can see, writing these mappings by hand quickly becomes tedious as the number of distinct values grows, so it may be better to generate them programmatically.
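One possible way to build the mappings dynamically is sketched below using `pd.factorize`, which assigns an integer code to each distinct value in order of first appearance. The toy data comes from the question; the `mappings` dictionary is a name introduced here for illustration only, so that the codes can be translated back to labels later.

```python
import pandas as pd

# Toy feature matrix from the question (director, genre).
X = [['Peter Jackson', 'Action'],
     ['Sergio Leone', 'Comedy']]
df = pd.DataFrame(X, columns=['director', 'genre'])

# Hypothetical helper dict: column name -> list of labels, indexed by code.
mappings = {}
for col in df.columns:
    # factorize returns integer codes plus the distinct values it found
    codes, uniques = pd.factorize(df[col])
    df[col] = codes
    mappings[col] = list(uniques)

print(df)
print(mappings)
```

Note that integer codes like these imply an ordering between categories, which is only harmless when a feature has two possible values.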

MathiasDesch
  • 1
    You took a very simple example. When a categorical variable has 3 possible values, you cannot (shouldn't) code them as "0", "1", "2". – lejlot Nov 10 '15 at 14:26
  • @lejlot Can you please suggest the correct method for this kind of problem? – Saurabh Verma Nov 10 '15 at 14:40
  • 1
    Mathias's approach is fine; his example might simply be misleading for the case with multiple values. The typical mapping is "one-hot encoding": for a feature with M possible values you add M new dimensions to the representation, so genre ∈ ['action', 'comedy', 'drama'] becomes 3 new dimensions, and if your movie is a drama it gets 001, if a comedy 010, and so on. – lejlot Nov 10 '15 at 14:42
  • I agree that this is a really simple solution and is not really suitable for higher dimensions. – MathiasDesch Nov 12 '15 at 11:16
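The one-hot encoding described in the comments can be sketched with `pd.get_dummies`. This is a minimal illustration, not the commenters' own code: the third movie row and the 'Drama' genre are made-up values added here so that the feature has three categories, and `theta` is just a zero initialisation of the right size.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the third row is invented to give genre three values.
movies = pd.DataFrame({
    'director': ['Peter Jackson', 'Sergio Leone', 'Peter Jackson'],
    'genre': ['Action', 'Comedy', 'Drama'],
})

# One new column per distinct value; each row has a single 1 per original feature.
encoded = pd.get_dummies(movies, columns=['director', 'genre'], dtype=float)

# Numeric m x n matrix that can be passed as x to gradientDescent.
x = encoded.to_numpy()
theta = np.zeros(x.shape[1])  # one weight per one-hot column

print(encoded.columns.tolist())
print(x)
```

With this representation no category is numerically "larger" than another, which avoids the ordering problem of coding three values as 0, 1, 2.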