python - How to compute precision, recall, accuracy and f1-score for the multiclass case with scikit learn?


I'm working on a sentiment analysis problem. The data looks like this:

    label  instances
        5       1190
        4        838
        3        239
        1        204
        2        127

So my data is unbalanced, since 1190 instances are labeled 5. For the classification I'm using scikit-learn's SVC. The problem is that I do not know how to balance my data in the right way in order to accurately compute the precision, recall, accuracy and f1-score for the multiclass case. So I tried the following approaches:

First:

    wclf = SVC(kernel='linear', C=1, class_weight={1: 10})
    wclf.fit(X, y)
    weighted_prediction = wclf.predict(X_test)

    print 'accuracy:', accuracy_score(y_test, weighted_prediction)
    print 'f1 score:', f1_score(y_test, weighted_prediction, average='weighted')
    print 'recall:', recall_score(y_test, weighted_prediction,
                                  average='weighted')
    print 'precision:', precision_score(y_test, weighted_prediction,
                                        average='weighted')
    print '\n classification report:\n', classification_report(y_test, weighted_prediction)
    print '\n confusion matrix:\n', confusion_matrix(y_test, weighted_prediction)

Second:

    auto_wclf = SVC(kernel='linear', C=1, class_weight='auto')
    auto_wclf.fit(X, y)
    auto_weighted_prediction = auto_wclf.predict(X_test)

    print 'accuracy:', accuracy_score(y_test, auto_weighted_prediction)
    print 'f1 score:', f1_score(y_test, auto_weighted_prediction,
                                average='weighted')
    print 'recall:', recall_score(y_test, auto_weighted_prediction,
                                  average='weighted')
    print 'precision:', precision_score(y_test, auto_weighted_prediction,
                                        average='weighted')
    print '\n classification report:\n', classification_report(y_test, auto_weighted_prediction)
    print '\n confusion matrix:\n', confusion_matrix(y_test, auto_weighted_prediction)

Third:

    clf = SVC(kernel='linear', C=1)
    clf.fit(X, y)
    prediction = clf.predict(X_test)

    from sklearn.metrics import precision_score, \
        recall_score, confusion_matrix, classification_report, \
        accuracy_score, f1_score

    print 'accuracy:', accuracy_score(y_test, prediction)
    print 'f1 score:', f1_score(y_test, prediction)
    print 'recall:', recall_score(y_test, prediction)
    print 'precision:', precision_score(y_test, prediction)
    print '\n classification report:\n', classification_report(y_test, prediction)
    print '\n confusion matrix:\n', confusion_matrix(y_test, prediction)

Output:

    f1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
      sample_weight=sample_weight)
    /usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
      sample_weight=sample_weight)
    /usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1082: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1".
      sample_weight=sample_weight)
     0.930416613529

However, I'm getting warnings like this:

    /usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:1172: DeprecationWarning: The default `weighted` averaging is deprecated, and from version 0.18, use of precision, recall or F-score with multiclass or multilabel data or pos_label=None will result in an exception. Please set an explicit value for `average`, one of (None, 'micro', 'macro', 'weighted', 'samples'). In cross validation use, for instance, scoring="f1_weighted" instead of scoring="f1"

How can I deal correctly with my unbalanced data in order to compute the classifier's metrics in the right way?

I think there is a lot of confusion about which weights are used for what. I am not sure I know precisely what bothers you, so I am going to cover different topics, bear with me ;).

Class weights

The weights from the class_weight parameter are used to train the classifier. They are not used in the calculation of any of the metrics you are using: with different class weights, the numbers will be different simply because the classifier is different.

Basically in every scikit-learn classifier, the class weights are used to tell your model how important a class is. That means that during training, the classifier will make extra efforts to properly classify the classes with high weights.
How it does that is algorithm-specific. If you want details about how it works for SVC and the doc does not make sense to you, feel free to mention it.
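To make the class weights less abstract, here is a minimal sketch that computes the weights scikit-learn would derive with `class_weight='balanced'`, an option that newer releases offer in place of the `'auto'` value used in the question. The label counts come from the question; the label array itself is rebuilt from those counts, so it is only a stand-in for the real data.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.utils.class_weight import compute_class_weight

# Label counts taken from the question; the array is a stand-in for the real y.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}
y = np.concatenate([np.full(n, label) for label, n in counts.items()])

classes = np.unique(y)  # sorted labels: 1, 2, 3, 4, 5
# 'balanced' gives each class the weight n_samples / (n_classes * class_count).
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
class_weight = dict(zip(classes.tolist(), weights.tolist()))

for label in classes.tolist():
    print(label, round(class_weight[label], 3))

# The weights only affect how the classifier is trained; the metrics
# computed afterwards never see them.
clf = SVC(kernel="linear", C=1, class_weight=class_weight)
```

Rare classes end up with the largest weights, so mistakes on them cost more during training.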

The metrics

Once you have a classifier, you want to know how well it is performing. Here you can use the metrics you mentioned: accuracy, recall_score, f1_score...

Usually when the class distribution is unbalanced, accuracy is considered a poor choice as it gives high scores to models which just predict the most frequent class.
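As a rough illustration with the label counts from the question (Python 3, no libraries needed), consider a hypothetical baseline that always predicts the most frequent label. It learns nothing, yet its accuracy is far above what its per-class behavior deserves:

```python
# Label counts taken from the question.
counts = {5: 1190, 4: 838, 3: 239, 1: 204, 2: 127}
total = sum(counts.values())

# Hypothetical baseline: always predict the most frequent label.
majority = max(counts, key=counts.get)

# Every instance of the majority class is "correct", everything else is wrong.
accuracy = counts[majority] / total

# Recall is 1.0 for the majority class and 0.0 for every other class.
recalls = [1.0 if label == majority else 0.0 for label in counts]
macro_recall = sum(recalls) / len(recalls)

print("majority label:", majority)
print("accuracy: %.3f" % accuracy)
print("macro recall: %.3f" % macro_recall)
```

The more skewed the distribution, the higher the accuracy of this useless baseline, while its macro-averaged recall stays at one over the number of classes.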

I will not detail all these metrics, but note that, with the exception of accuracy, they are naturally applied at the class level: as you can see in this print of a classification report, they are defined for each class. They rely on concepts such as true positives or false negatives, which require defining which class is the positive one.

                 precision    recall  f1-score   support

              0       0.65      1.00      0.79        17
              1       0.57      0.75      0.65        16
              2       0.33      0.06      0.10        17

    avg / total       0.52      0.60      0.51        50
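To see where the per-class numbers come from, here is a small NumPy sketch that treats each class in turn as the positive one. The confusion matrix is an assumption: it is one matrix that reproduces the rounded report above, not the matrix the report was actually computed from.

```python
import numpy as np

# Rows are true classes, columns are predicted classes.
# Assumption: reconstructed from the rounded report, not the original data.
cm = np.array([[17, 0, 0],
               [2, 12, 2],
               [7, 9, 1]])

tp = np.diag(cm)          # correctly predicted as class k
fp = cm.sum(axis=0) - tp  # predicted as class k, but actually another class
fn = cm.sum(axis=1) - tp  # actually class k, but predicted as another class

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
support = cm.sum(axis=1)  # number of true instances per class

for k in range(len(cm)):
    print("%d  %.2f  %.2f  %.2f  %d" % (k, precision[k], recall[k], f1[k], support[k]))
```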

The warning

    f1 score:/usr/local/lib/python2.7/site-packages/sklearn/metrics/classification.py:676: DeprecationWarning: The
    default `weighted` averaging is deprecated, and from version 0.18,
    use of precision, recall or F-score with multiclass or multilabel data
    or pos_label=None will result in an exception. Please set an explicit
    value for `average`, one of (None, 'micro', 'macro', 'weighted',
    'samples'). In cross validation use, for instance,
    scoring="f1_weighted" instead of scoring="f1".

You get this warning because you are using the f1-score, recall and precision without defining how they should be computed! The question could be rephrased: from the above classification report, how do you output one global number for the f1-score? You could:

  1. Take the average of the f1-score for each class: it's called macro averaging. With the nearly equal supports above, it matches the avg / total row.
  2. Compute the f1-score using the global count of true positives / false negatives, etc. (you sum the number of true positives / false negatives for each class). Aka micro averaging.
  3. Compute a weighted average of the f1-score. Using 'weighted' in scikit-learn will weigh the f1-score by the support of the class: the more elements a class has, the more important the f1-score for this class in the computation.

These are 3 of the options in scikit-learn; the warning is there to say you have to pick one. So you have to specify an average argument for the score method.

Which one you choose is up to how you want to measure the performance of the classifier: for instance macro-averaging does not take class imbalance into account, and the f1-score of class 1 will be just as important as the f1-score of class 5. If you use weighted averaging, however, you'll give more importance to class 5.
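The three averaging options can be compared side by side on a tiny example. The labels below are made up for illustration, with class 0 deliberately more frequent than the others:

```python
from sklearn.metrics import f1_score

# Made-up labels: class 0 has 4 instances, class 1 has 2, class 2 has 1.
y_true = [0, 0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 0, 1, 1, 2, 2]

# average=None returns one f1-score per class instead of a single number.
per_class = f1_score(y_true, y_pred, average=None)

macro = f1_score(y_true, y_pred, average="macro")        # plain mean over classes
micro = f1_score(y_true, y_pred, average="micro")        # from global TP/FP/FN counts
weighted = f1_score(y_true, y_pred, average="weighted")  # mean weighted by support

print("per class:", per_class)
print("macro:", macro)
print("micro:", micro)
print("weighted:", weighted)
```

In this single-label multiclass setting the micro average equals plain accuracy (5 of the 7 predictions are correct), while macro and weighted differ because the best-scoring class is also the largest one.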

The whole argument specification in these metrics is not super-clear in scikit-learn right now; it will get better in version 0.18 according to the docs. They are removing some non-obvious standard behavior and they are issuing warnings so that developers notice it.

Computing scores

Last thing I want to mention (feel free to skip it if you're aware of it) is that scores are only meaningful if they are computed on data that the classifier has never seen. This is extremely important, as any score you get on data that was used in fitting the classifier is completely irrelevant.

Here's a way to do it using StratifiedShuffleSplit, which gives you random splits of your data (after shuffling) that preserve the label distribution.

    from sklearn.datasets import make_classification
    # sklearn.cross_validation is the pre-0.18 module; newer releases use sklearn.model_selection.
    from sklearn.cross_validation import StratifiedShuffleSplit
    from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score, classification_report, confusion_matrix
    from sklearn.svm import SVC

    # We use a utility to generate artificial classification data.
    X, y = make_classification(n_samples=100, n_informative=10, n_classes=3)
    # `svc` was not defined in the snippet; a linear SVC like the one in the question is assumed.
    svc = SVC(kernel='linear', C=1)
    sss = StratifiedShuffleSplit(y, n_iter=1, test_size=0.5, random_state=0)
    for train_idx, test_idx in sss:
        X_train, X_test, y_train, y_test = X[train_idx], X[test_idx], y[train_idx], y[test_idx]
        svc.fit(X_train, y_train)
        y_pred = svc.predict(X_test)
        print(f1_score(y_test, y_pred, average="macro"))
        print(precision_score(y_test, y_pred, average="macro"))
        print(recall_score(y_test, y_pred, average="macro"))
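If you are on scikit-learn 0.18 or later, the same idea can be sketched with `sklearn.model_selection`, where the splitter takes `n_splits` and the data is passed to `.split()`. The linear SVC, the fixed `random_state` and the 5-fold setting are choices made for this sketch, not something the question prescribes. The last lines also show the `scoring="f1_macro"` form that the warning alludes to for cross validation.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import SVC

# Artificial 3-class data; random_state is fixed only to make the run repeatable.
X, y = make_classification(n_samples=100, n_informative=10, n_classes=3,
                           random_state=0)
svc = SVC(kernel="linear", C=1)

# One stratified 50/50 split: fit on one half, score on the unseen half.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.5, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    svc.fit(X[train_idx], y[train_idx])
    y_pred = svc.predict(X[test_idx])
    held_out_f1 = f1_score(y[test_idx], y_pred, average="macro")
    print(held_out_f1)

# Cross validation with an explicit averaging choice in the scoring string.
cv_scores = cross_val_score(svc, X, y, scoring="f1_macro", cv=5)
print(cv_scores.mean())
```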

Hope this helps.

