This is my first time using scikit-learn's metrics, and I want to graph a ROC curve using the library.
The ROC curve reports AUC = 1.00, which I know is incorrect. Here is my code:
    from sklearn.metrics import roc_curve, auc
    import pylab as pl

    def show_roc(test_target, predicted_probs):
        # set number 1
        actual = [1, -1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1]
        prediction_probas = [0.374, 0.145, 0.263, 0.129, 0.215, 0.538, 0.24, 0.183, 0.402, 0.2,
                             0.281, 0.277, 0.222, 0.204, 0.193, 0.171, 0.401, 0.204, 0.213, 0.182]
        fpr, tpr, thresholds = roc_curve(actual, prediction_probas)
        roc_auc = auc(fpr, tpr)

        # plot ROC curve
        pl.clf()
        pl.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
        pl.plot([0, 1], [0, 1], 'k--')
        pl.xlim([-0.1, 1.2])
        pl.ylim([-0.1, 1.2])
        pl.xlabel('False Positive Rate')
        pl.ylabel('True Positive Rate')
        pl.title('Receiver Operating Characteristic example')
        pl.legend(loc="lower right")
        pl.show()
For the first set, here is the graph: http://i.stack.imgur.com/pa93c.png
The probabilities are low, even for the positives, so I don't understand why it displays a perfect ROC curve for these inputs.
    # set number 2
    actual = [1, 1, 1, 0, 0, 0]
    prediction_probas = [0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
    fpr, tpr, thresholds = roc_curve(actual, prediction_probas)
    roc_auc = auc(fpr, tpr)

    # plot ROC curve
    pl.clf()
    pl.plot(fpr, tpr, label='ROC curve (area = %0.2f)' % roc_auc)
    pl.plot([0, 1], [0, 1], 'k--')
    pl.xlim([-0.1, 1.2])
    pl.ylim([-0.1, 1.2])
    pl.xlabel('False Positive Rate')
    pl.ylabel('True Positive Rate')
    pl.title('Receiver Operating Characteristic example')
    pl.legend(loc="lower right")
    pl.show()
For the second set, here is the graph output:

This one seems more reasonable, and I included it for comparison.
I have read through the scikit-learn documentation for pretty much a day and I am stumped.
You are getting a perfect curve because your labels (aka `actual`) line up perfectly with your prediction scores (aka `prediction_probas`). Even though the scores for the positives are low, there is still a distinguishable boundary between the 1s and the -1s: every 1 has a higher score than every -1 (the lowest positive score, 0.374, is above the highest negative score, 0.281), so there exists a threshold that classifies all of them correctly. That is exactly what an AUC of 1.00 means.
Try changing one of the higher-scored 1s to a -1, or one of the -1s to a 1, and look at the resulting curve.
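To see this numerically: AUC can be read as the probability that a randomly chosen positive outscores a randomly chosen negative. Here is a minimal sketch in plain Python (no sklearn needed) that checks this for your first data set; `pairwise_auc` is a hypothetical helper written for illustration, not part of scikit-learn:

```python
# Pairwise-comparison view of AUC: the fraction of (positive, negative)
# pairs where the positive's score beats the negative's (ties count half).
def pairwise_auc(actual, scores, pos_label=1):
    pos = [s for a, s in zip(actual, scores) if a == pos_label]
    neg = [s for a, s in zip(actual, scores) if a != pos_label]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Data from set number 1 in the question.
actual = [1, -1, -1, -1, -1, 1, -1, -1, 1, -1, -1, -1, -1, -1, -1, -1, 1, -1, -1, -1]
scores = [0.374, 0.145, 0.263, 0.129, 0.215, 0.538, 0.24, 0.183, 0.402, 0.2,
          0.281, 0.277, 0.222, 0.204, 0.193, 0.171, 0.401, 0.204, 0.213, 0.182]

print(pairwise_auc(actual, scores))   # 1.0 -- every 1 outscores every -1

# Relabel the highest-scored positive (0.538 at index 5) as a -1:
flipped = list(actual)
flipped[5] = -1
print(pairwise_auc(flipped, scores))  # drops below 1.0 (48/51, about 0.94)
```

The key point is that ROC/AUC only cares about the ranking of the scores, not their absolute magnitudes, which is why uniformly low probabilities can still produce a perfect curve.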