
How can I get the AUC (area under the curve) value from this ROC function?

Here is the code I used:

import numpy as np

def roc(data_set):
    # Row 1 of data_set holds the scores, row 2 the labels (1 = normal, 2 = abnormal).
    n_thresholds = 1000
    normal = 0
    data_set_size = data_set.shape[1]
    # One column per threshold (not per sample), so every column gets filled.
    roc_rate = np.zeros((2, n_thresholds))
    for i in range(data_set_size):
        if data_set[2][i] == 1:
            normal += 1
    abnormal = data_set_size - normal
    max_dis = data_set[1].max()
    for j in range(n_thresholds):
        threshold = max_dis / n_thresholds * j
        normal1 = 0
        abnormal1 = 0
        for k in range(data_set_size):
            if data_set[1][k] > threshold and data_set[2][k] == 1:
                normal1 += 1
            if data_set[1][k] > threshold and data_set[2][k] == 2:
                abnormal1 += 1
        # float() avoids integer division on Python 2.
        roc_rate[0][j] = normal1 / float(normal)  # true positive rate
        roc_rate[1][j] = abnormal1 / float(abnormal)  # false positive rate
    return roc_rate

Also, is the accuracy of the model the ratio AUC / total area?

Thanks in advance.

제갈묘

2 Answers


It's great that you're working this out from first principles. I have always been a bit curious about the ROC curve, but until now I have just used ready-made functions.

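Assuming from the code you have given that row 0 is the true positive rate and row 1 the false positive rate, put the false positive rate on the x-axis:

x = roc_rate[1]  # false positive rate
y = roc_rate[0]  # true positive rate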

Then integrate it numerically:

import scipy.integrate
scipy.integrate.simps(y, x)  # renamed to simpson in newer SciPy releases

Maybe I haven't properly understood your code; it would be easier with samples of its input and output. But if you play with this numerical integration, you will get the area.
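As a rough sketch of that integration, here is a trapezoidal-rule version that takes the false positive rate as x and the true positive rate as y. The helper name `auc_trapezoid` and the small `roc_rate` array are made up for illustration; they are not output from the question's data.

```python
import numpy as np

def auc_trapezoid(fpr, tpr):
    """Area under a ROC curve by the trapezoidal rule."""
    fpr = np.asarray(fpr, dtype=float)
    tpr = np.asarray(tpr, dtype=float)
    # Sort by FPR, breaking ties by TPR, so the curve runs left to right.
    order = np.lexsort((tpr, fpr))
    x, y = fpr[order], tpr[order]
    # Each segment adds width * average height; zero-width steps add nothing.
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# Hypothetical roc() output: row 0 = true positive rate, row 1 = false positive rate,
# listed from the lowest threshold to the highest.
roc_rate = np.array([[1.0, 1.0, 0.5, 0.5, 0.0],
                     [1.0, 0.5, 0.5, 0.0, 0.0]])
auc = auc_trapezoid(roc_rate[1], roc_rate[0])
print(auc)
```

The result is only a full AUC if the points run all the way from (0, 0) to (1, 1); if the threshold sweep stops short of either corner, the area will be underestimated.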

Yes, I think so: the ROC curve fits on a 1 by 1 axis, so the total area is 1 and the AUC is already that ratio (strictly speaking, it measures how well the model ranks the two classes rather than its accuracy at one threshold). If the model has no classification power at all, the 'curve' is just the diagonal with an area of 1/2 under it, which is what you get by guessing at random. The perfect model has a ROC curve that is an upside-down L, taking in the whole area of 1, and real-world models are curves that arc between the diagonal and the upside-down L.
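A quick way to convince yourself of the two limiting cases is to integrate them directly. This is a minimal check with hand-written points; the function name `area` is just an illustrative helper.

```python
import numpy as np

def area(x, y):
    # Trapezoidal rule: sum of width * average height over each segment.
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

# No classification power: the diagonal from (0, 0) to (1, 1).
chance = area([0.0, 1.0], [0.0, 1.0])

# Perfect model: straight up to (0, 1), then across to (1, 1).
perfect = area([0.0, 0.0, 1.0], [0.0, 1.0, 1.0])

print(chance, perfect)
```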

cardamom

You need to check that the data you are putting into it makes sense, but also have a look [here](https://stackoverflow.com/questions/13497891/python-getting-around-division-by-zero) and [here](https://stackoverflow.com/questions/20161899/scipy-curve-fit-error-divide-by-zero-encountered). – cardamom Jul 12 '17 at 10:11

When I run scipy.integrate.simps(y, x), I get warnings like this:

/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:324: RuntimeWarning: divide by zero encountered in true_divide
  h0divh1 = h0 / h1
/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:324: RuntimeWarning: invalid value encountered in true_divide
  h0divh1 = h0 / h1
/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:326: RuntimeWarning: divide by zero encountered in true_divide
  y[slice1]*hsum*hsum/hprod +
/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:326: RuntimeWarning: invalid value encountered in true_divide
  y[slice1]*hsum*hsum/hprod +
/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:326: RuntimeWarning: invalid value encountered in add
  y[slice1]*hsum*hsum/hprod +
/usr/local/lib/python2.7/site-packages/scipy/integrate/quadrature.py:327: RuntimeWarning: invalid value encountered in add
  y[slice2]*(2-h0divh1))

nan

How can I fix it?
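One plausible reading of these warnings, assuming the x values contain repeats (a threshold sweep tends to produce many identical rates): the `h0 / h1` line in the traceback divides neighbouring x spacings by each other, so any zero-width step gives a division by zero and the result turns into nan. The sketch below uses made-up arrays to check for that condition and shows that a trapezoidal sum, which only multiplies by the spacing, stays finite.

```python
import numpy as np

# Hypothetical rates with repeated x values, sorted left to right.
x = np.array([0.0, 0.0, 0.5, 0.5, 1.0])
y = np.array([0.0, 0.5, 0.5, 1.0, 1.0])

h = np.diff(x)                    # spacing between neighbouring x values
zero_gaps = int(np.sum(h == 0))   # number of zero-width steps

# Dividing one spacing by the next, as in the h0 / h1 line of the traceback,
# yields inf or nan wherever a spacing is zero.
with np.errstate(divide="ignore", invalid="ignore"):
    ratios = h[:-1] / h[1:]
has_bad_ratio = bool(np.any(~np.isfinite(ratios)))

# The trapezoidal sum never divides by the spacing, so repeats are harmless.
area = float(np.sum(h * (y[1:] + y[:-1]) / 2.0))
print(zero_gaps, has_bad_ratio, area)
```

If `zero_gaps` is nonzero for your own x array, that would be consistent with the nan you are seeing.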

제갈묘