The Fourth Week

The goal for the week was to begin implementing the metrics, in the order of priority established in the design. Each implementation was committed to a branch of a GitHub repository forked from the HPCC Systems ML_Core repository.

The simpler metrics, such as Hamming loss and F-score, were completed first; their implementations were folded into the existing Accuracy and Accuracy By Class functions. For the more complicated metrics, such as AUC and the silhouette coefficient, prototypes were drawn up first to test the code and its results before the metrics were integrated into the library and committed to GitHub. However, certain clarifications were needed to complete the implementations of AUC and the silhouette coefficient.
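The actual implementations live in the ECL branches mentioned above; as an illustration of what these two simpler metrics compute, here is a minimal Python sketch of Hamming loss over multi-label rows and a binary F-score. The function names and data layout are illustrative, not ML_Core's.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots predicted incorrectly,
    pooled over all samples (0 is perfect, 1 is entirely wrong)."""
    total = sum(len(row) for row in y_true)
    wrong = sum(1 for t_row, p_row in zip(y_true, y_pred)
                for t, p in zip(t_row, p_row) if t != p)
    return wrong / total

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

For example, with one wrong slot out of four, the Hamming loss is 0.25; a classifier with perfect precision but recall 2/3 gets an F-score of 0.8.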

The implementation of the silhouette coefficient in the ML_Core bundle requires that the type definitions the function needs also be present in ML_Core. Currently, however, these type definitions live in the KMeans bundle, and they may have to be moved to ML_Core.
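Independent of where the type definitions end up, the metric itself is straightforward: each sample's silhouette is (b - a) / max(a, b), where a is its mean distance to its own cluster and b is its smallest mean distance to any other cluster. A minimal Python sketch (the points/labels layout is illustrative, not ML_Core's record structure):

```python
import math

def silhouette_scores(points, labels):
    """Per-sample silhouette coefficients in [-1, 1]; higher means the
    sample sits well inside its own cluster."""
    clusters = set(labels)
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of p's own cluster
        same = [q for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        a = sum(math.dist(p, q) for q in same) / len(same) if same else 0.0
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in clusters if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores
```

On two well-separated clusters, every score is close to 1; overlapping clusters pull the scores toward 0 or below.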

The AUC (Area Under the ROC Curve) score is a metric used to evaluate binary predictions. To accommodate multi-label classification, however, the predictions can be arranged in a one-vs-all fashion and an AUC score calculated for each label. This approach requires that the function be given, for every result, the probability of occurrence of each label in the multi-label system, which many classifiers do not provide.
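The one-vs-all arrangement can be sketched as follows in Python, assuming the per-label probabilities are available as a row of class probabilities per sample (an assumption the paragraph above notes many classifiers do not satisfy). Binary AUC is computed here via its rank interpretation: the probability that a random positive is scored above a random negative, with ties counting as half.

```python
def auc(y_true, scores):
    """Binary AUC via the Mann-Whitney formulation: the fraction of
    positive/negative pairs where the positive is ranked higher."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def one_vs_all_auc(labels, probs, n_classes):
    """Per-class AUC: class k is treated as the positive, everything
    else as negative; probs[i][k] is the probability of class k for
    sample i."""
    return [auc([1 if y == k else 0 for y in labels],
                [p[k] for p in probs])
            for k in range(n_classes)]
```

A classifier that always assigns the highest probability to the true class scores 1.0 for every label; a coin-flip scorer hovers around 0.5.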

The goal for next week is to continue implementing the evaluation metrics once the issues above are clarified.
