Performance evaluation

Each algorithm will be evaluated on its accuracy in classifying the images into one of the four histology classes (normal, villous atrophy, crypt hypertrophy, villous atrophy and crypt hypertrophy). To obtain a representative score, two performance metrics will be evaluated:

- a leave-one-out cross-validation must be performed at the site level on the training set, which we call Leave-One-Site-Out Cross-Validation (LOSOCV). The algorithm is evaluated on each site in turn, with the images of all the other sites used for training. This rules out any bias caused by images from the same imaging site appearing in both the training and the test set.

- the accuracy in classifying the images in the test set must be evaluated. Only the training data may be used to estimate algorithm-related parameters. Hence, for algorithms that estimate parameters, the parameters must be estimated for each patient separately, as if the test data were not available. Estimating algorithm parameters on the complete set of images before cross-validation is not allowed.
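
The LOSOCV procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the organisers' evaluation code: the function names (`leave_one_site_out_splits`, `losocv_accuracy`), the majority-class placeholder classifier, and the toy site labels are all assumptions made for the example. The point it shows is that every parameter is fitted inside the loop, on images from the other sites only.

```python
from collections import Counter


def leave_one_site_out_splits(sites):
    """Yield (held_out_site, train_indices, test_indices) for each site."""
    for held_out in sorted(set(sites)):
        train_idx = [i for i, s in enumerate(sites) if s != held_out]
        test_idx = [i for i, s in enumerate(sites) if s == held_out]
        yield held_out, train_idx, test_idx


def losocv_accuracy(features, labels, sites, fit, predict):
    """Return {site: accuracy}, fitting only on images from the other sites."""
    per_site = {}
    for held_out, train_idx, test_idx in leave_one_site_out_splits(sites):
        # All parameter estimation happens here, on training images only.
        model = fit([features[i] for i in train_idx],
                    [labels[i] for i in train_idx])
        predictions = [predict(model, features[i]) for i in test_idx]
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        per_site[held_out] = correct / len(test_idx)
    return per_site


# Hypothetical placeholder classifier: always predicts the most common
# training label. A real submission would replace fit/predict.
def fit_majority(train_features, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, feature):
    return model


# Toy data (made up): six images from three sites.
sites = ["A", "A", "B", "B", "C", "C"]
labels = ["normal", "normal", "normal", "villous atrophy", "normal", "normal"]
features = [0, 0, 0, 0, 0, 0]  # stand-in for image features

scores = losocv_accuracy(features, labels, sites, fit_majority, predict_majority)
print(scores)
```

Fitting anything on the full image set before calling `losocv_accuracy` (for example a normalisation or feature-selection step) would leak information from the held-out site and is exactly what the rule above forbids.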