Evaluation
Metrics
We evaluate images on two factors: pixels for similarity and segmentation for usability.
All nuclei are evaluated together and all membranes together, for a total of four metrics: N_SSIM_nucleus, N_SSIM_membrane, N_IOU_nucleus, N_IOU_membrane.
To do that we are using:
- Normalized Structural Similarity Index Measure (N_SSIM) of predicted and ground-truth 3D images.
We calculate the SSIM of the predicted images against the ground truth.
However, because the images to be predicted are very similar to the ground-truth images, the raw scores are tightly bunched; we therefore normalize them with respect to a reference SSIM score computed between the input image and the ground-truth image:
n_ssim = (prediction_ssim - reference_ssim) / (1 - reference_ssim)
This yields the N_SSIM_nucleus and N_SSIM_membrane metrics.
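The normalized SSIM above can be sketched as follows. This is a minimal sketch, not the organizers' exact pipeline: the function name `normalized_ssim`, the use of scikit-image's `structural_similarity`, and the `data_range` choice are all assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def normalized_ssim(prediction, ground_truth, reference):
    """Normalize the prediction SSIM by a reference SSIM (input vs. ground truth).

    Sketch only: the evaluation pipeline may use different SSIM settings.
    """
    # data_range is required by skimage for float images; taking it from the
    # ground truth is an assumption for this illustration.
    data_range = float(ground_truth.max() - ground_truth.min())
    prediction_ssim = structural_similarity(prediction, ground_truth, data_range=data_range)
    reference_ssim = structural_similarity(reference, ground_truth, data_range=data_range)
    # n_ssim = (prediction_ssim - reference_ssim) / (1 - reference_ssim)
    return (prediction_ssim - reference_ssim) / (1 - reference_ssim)
```

With this definition, a prediction identical to the ground truth scores 1, and a prediction no better than the input image scores 0.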
- Segmentation: normalized IOU of segmentations produced with Cellpose
We produced verified segmentations of the ground-truth and input images. The IOU between the input and ground-truth verified segmentations gives the best_iou.
We also computed the IOU between the input image's verified segmentation and a Cellpose segmentation of the ground-truth image, giving the reference_iou.
During evaluation, the prediction image is segmented with Cellpose, and the IOU between this predicted segmentation and the ground-truth verified segmentation gives the prediction_iou:
n_iou = (prediction_iou - reference_iou) / (best_iou - reference_iou)
We use the following Cellpose models:
- nuclei for nuclei
- cyto3 for membranes
(The images are made isotropic by linear interpolation, and the diameter is estimated on the ground-truth images.)
This yields the N_IOU_nucleus and N_IOU_membrane metrics.
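For illustration, here is a minimal sketch of the IOU and its normalization. The IOU below is computed on binarized foreground masks; the actual evaluation may match Cellpose label instances differently, so treat the `iou` definition as an assumption.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over union of the foregrounds of two (label or binary) masks.

    Sketch only: label-wise instance matching, if used, would differ from this.
    """
    a = mask_a > 0
    b = mask_b > 0
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return np.logical_and(a, b).sum() / union

def normalized_iou(prediction_iou, reference_iou, best_iou):
    """n_iou = (prediction_iou - reference_iou) / (best_iou - reference_iou)."""
    return (prediction_iou - reference_iou) / (best_iou - reference_iou)
```

A prediction whose segmentation matches the verified ground-truth segmentation as well as the verified input segmentation does (best_iou) scores 1; one no better than the reference Cellpose baseline scores 0.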
Image normalization
We apply percentile normalization to the images before computing the metrics, using a minimum percentile of 2 and a maximum percentile of 99.8.
It is inspired by the CSBDeep Python package, which provides a toolbox for content-aware fluorescence microscopy image restoration (CARE). https://github.com/CSBDeep/CSBDeep/blob/master/csbdeep/data/prepare.py#L5
One advantage of this normalization is that applying it twice to the same image (e.g. once during training and again during evaluation) leaves the result unchanged, up to negligible numerical error. This ensures that algorithms are neither favored nor penalized depending on whether they use this normalization during training.
```python
import numpy as np

def percentile_normalization(image, pmin=2, pmax=99.8, axis=None):
    """
    Compute a percentile normalization for the given image.

    Parameters:
    - image (array): array (2D or 3D) of the image file.
    - pmin (int or float): the minimal percentage for the percentiles to compute.
      Values must be between 0 and 100 inclusive.
    - pmax (int or float): the maximal percentage for the percentiles to compute.
      Values must be between 0 and 100 inclusive.
    - axis: axis or axes along which the percentiles are computed.
      The default (None) computes them over the flattened array.

    Returns:
    - Normalized image (np.ndarray): an array containing the normalized image.
    """
    if not (np.isscalar(pmin) and np.isscalar(pmax) and 0 <= pmin < pmax <= 100):
        raise ValueError("Invalid values for pmin and pmax")

    low_percentile = np.percentile(image, pmin, axis=axis, keepdims=True)
    high_percentile = np.percentile(image, pmax, axis=axis, keepdims=True)

    if np.all(low_percentile == high_percentile):
        print(f"Same min {low_percentile} and high {high_percentile}, image may be empty")
        return image

    return (image - low_percentile) / (high_percentile - low_percentile)
```
Note: if needed, this function is provided in the file tool.py of the Docker template.
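The idempotence property mentioned above can be checked quickly. This sketch re-implements a stripped-down version of the normalization inline so it runs standalone; the test image is random data, not challenge data.

```python
import numpy as np

def percentile_normalize(image, pmin=2, pmax=99.8):
    """Stripped-down percentile normalization for this check."""
    low = np.percentile(image, pmin, keepdims=True)
    high = np.percentile(image, pmax, keepdims=True)
    return (image - low) / (high - low)

rng = np.random.default_rng(42)
image = rng.random((64, 64)) * 1000  # arbitrary intensity scale

once = percentile_normalize(image)
twice = percentile_normalize(once)

# Normalizing an already-normalized image is (nearly) a no-op, because the
# percentiles of an affinely rescaled array are the rescaled percentiles.
print(np.abs(twice - once).max())  # close to zero
```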
Ranking
For each team or participant, there will be:
- Four individual rankings: one for each metric.
- One final overall rank: the mean of the individual ranks for each team.
All teams and participants will be evaluated regardless of whether their code and model weights are made available, so as not to limit participation by companies that do not wish to distribute their code for intellectual-property or commercial reasons.