LRP for semantic segmentation
Layerwise relevance propagation
Layerwise relevance propagation allows to obtain a pixelwise decomposition of a~model decision and is applicable for most stateoftheart architectures for image classification and neural reinforcement learning^{1}. LRP uses a notion of relevance \(R\) which is equal to the model output for the output neurons and can be propagated to lower layers.
LRP for image classification
LRP can be employed to explain classification decisions for a given class \(i\) by relevance propagation from the corresponding model output $y^i$ according to:
\[y^i = R = \ldots = \sum_{d \in L^l} R_d^{(l)} = \ldots = \sum_{d \in L^1} R_d^{(1)} = \sum M^i \ldots (1)\]where \(l\) refers to the layer index, \(L^l\) to all neurons of layer \(l\), and \(R_d^{(l)}\) to a relevance of neuron \(d\) in layer \(l\). Typically, the relevances are propagated to the input layer (\(l=1\)) yielding a relevance map \(M^i\), which enables visualization of input regions influencing the model decision.
LRP for semantic segmentation
In order to apply LRP to semantic segmentation models, we cast the segmentation problem as a voxelwise classification. This means that in order to explain a decision of a segmentation model for a given output region \(A\), we propose to compute the input relevance maps according to Eq.1 for each considered output location \(a \in A\). Then the relevance map \(M^i\) explaining the model decision for class \(i\) in the region \(A\) can be calculated as:
\[M^i_A = \sum_{a \in A} \frac{M_a^i}{\sum M_a^i}\]We normalize \(M_a^i\) by its sum to ensure that each output location \(a\) equally contributes to the final relevance map \(M^i\).
Relevance analysis of liver tumor segmentation CNN
We train a 3D unet^{2} with a 6channel input and 2channel output to segment liver tumors in : T2, non contrast enhanced T1 (plainT1), and four dynamic contrast enhanced (DCE) T1 images acquired 20s (T120s), 60s (T160s), 120s (T1120s), and 15min (T115min) after contrast agent administration (GdEOBDTPA). All sequences were motion corrected using the T115min image as reference^{3}.
MRI sequence relevance
Normalized relevance distribution across input MRI sequences for 20 test patients denoted by different colors.
Pixellevel explanations
Foreground relevance maps computed for a true positive.
Foreground relevance maps computed for a false negative.
Discussion
 CNN used information from all MRI sequences.
 T115min sequence, which was used to create training labels was not the most important one.
 Similar relevance attribution for plain T1, T120s, T160s, and T1120s.
 Pixellevel explanations are hard to interpret.
References

S. Bach et al., On pixelwise explanations for nonlinear classifier decisions by layerwise relevance propagation. PLOS ONE, 2015. ↩

Ö. Çiçek et al. 3d unet: learning dense volumetric segmentation from sparse annotation. MICCAI, 2016. ↩

J. Strehlow et al. Landmarkbased evaluation of a deformable motion correction for dcemri of the liver. IJCARS, 2018. ↩